cocci.inria.fr archive mirror
* [Cocci] coccinelle computes patch very slowly
@ 2018-06-25 15:44 Michele Martone
       [not found] ` <ff255650-b3d6-c1dc-53ef-8e6e715c5077@users.sourceforge.net>
  2018-06-26 11:03 ` Julia Lawall
  0 siblings, 2 replies; 8+ messages in thread
From: Michele Martone @ 2018-06-25 15:44 UTC (permalink / raw)
  To: cocci

Dear Coccinelle Team,

While patching source files a few thousand lines long, I
noticed prohibitively long patch computation times (spatch seemed to hang).

This effectively made spatch unusable.

I attach a minimal program and patch that reproduce the problem.

It seems that the presence of uninitialized variables and/or
a loop body can slow down spatch computation extremely.

I will be grateful for any support!
Michele
-------------- next part --------------
/*
// A semantic patch introducing casting on function return:
@@
type T;
identifier I;
symbol f;
@@

T I;
...

-I = 
+I = (T) 
 f(...);
*/
// Unfortunately, the above semantic patch takes 44s to run `spatch --sp-file <patch above> <snippet below>` !
// By using initializing declarators, or deleting the loop, it takes a fraction of a second.
// Tested with a coccinelle version 04d390a4414e626a0ca83f65f1ec08390c378cd1 (Sat May 26 07:43:36 2018 +0200).
// spatch version 1.0.6-00440-ga4532f08 compiled with OCaml version 4.08.0+dev0-2018-04-09
void * f(){return 0;}
void g(void)
{
  int v1=0, v2=0;
  int *a1, *a2;
  int presence, of, these, slows, down, patch, computing, extremely, y1, y2, y3, y4, y5;//each new uninitialized variable can cost seconds!
  //int presence=0, of=0, these=0, slows=0, down=0, patch=0, computing=0, extremely=0, y1=0, y2=0, y3=0, y4=0, y5=0;//initialized ones are fine
  //undefined_t status;// presence of this (if uncommented) slows down a lot
  a1 = f();
  for(v1 = 0; v1 < v2; v1++) { } // presence of this loop here, too! delete it to have a huge speedup ?!
  a2 = f();
}


* [Cocci] coccinelle computes patch very slowly
       [not found]   ` <20180626102638.GD14796@localhost>
@ 2018-06-26 10:32     ` Michele Martone
  2018-06-26 10:56       ` Julia Lawall
  0 siblings, 1 reply; 8+ messages in thread
From: Michele Martone @ 2018-06-26 10:32 UTC (permalink / raw)
  To: cocci

On 20180626 at 11:20, SF Markus Elfring wrote:
> > While patching source files a few thousand lines long,
> > I noticed prohibitively long patch computation times (spatch seemed to hang).
> 
> Do you get better run-time characteristics if you refactor the SmPL script
> example as follows to add the type cast?
> 
> // A semantic patch introducing casting on function return:
> @@
> identifier F, V;
> type T;
> @@
>  T V;
>  ...
>  V =
> +    (T)
>      F(...);

Hi Markus,

using your patch
(having F as any identifier, rather than a specific token)
does not change anything time-wise, unfortunately :-(

p.s.: My general problem is actually more complicated than this,
  in that the variable declarations can be anywhere, while this
  patch requires the declaration and the call to F to be in the same
  scope.

Michele


* [Cocci] coccinelle computes patch very slowly
  2018-06-26 10:32     ` Michele Martone
@ 2018-06-26 10:56       ` Julia Lawall
  0 siblings, 0 replies; 8+ messages in thread
From: Julia Lawall @ 2018-06-26 10:56 UTC (permalink / raw)
  To: cocci



On Tue, 26 Jun 2018, Michele Martone wrote:

> On 20180626 at 11:20, SF Markus Elfring wrote:
> > > While patching source files a few thousand lines long,
> > > I noticed prohibitively long patch computation times (spatch seemed to hang).
> >
> > Do you get better run-time characteristics if you refactor the SmPL script
> > example as follows to add the type cast?
> >
> > // A semantic patch introducing casting on function return:
> > @@
> > identifier F, V;
> > type T;
> > @@
> >  T V;
> >  ...
> >  V =
> > +    (T)
> >      F(...);
>
> Hi Markus,
>
> using your patch
> (having F as any identifier, rather than a specific token)
> does not change anything time-wise, unfortunately :-(
>
> p.s.: My general problem is actually more complicated than this,
>   in that the variable declarations can be anywhere, while this
>   patch requires the declaration and the call to F to be in the same
>   scope.

The call to F can be in a nested scope.

julia


* [Cocci] coccinelle computes patch very slowly
  2018-06-25 15:44 [Cocci] coccinelle computes patch very slowly Michele Martone
       [not found] ` <ff255650-b3d6-c1dc-53ef-8e6e715c5077@users.sourceforge.net>
@ 2018-06-26 11:03 ` Julia Lawall
  2018-06-26 15:27   ` Michele Martone
  1 sibling, 1 reply; 8+ messages in thread
From: Julia Lawall @ 2018-06-26 11:03 UTC (permalink / raw)
  To: cocci



On Mon, 25 Jun 2018, Michele Martone wrote:

> Dear Coccinelle Team,
>
> While patching source files a few thousand lines long, I
> noticed prohibitively long patch computation times (spatch seemed to hang).
>
> This effectively made spatch unusable.
>
> I attach a minimal program and patch that reproduce the problem.
>
> It seems that the presence of uninitialized variables and/or
> a loop body can slow down spatch computation extremely.
>
> I will be grateful for any support!

Loops can cause the matching process to become very expensive.

I was about to propose various solutions to get around the loop problem,
but I think you don't care.  You just want to know the type of I.  Hence:

@@
type T;
idexpression T I;
identifier f;
@@

I =
+ (T)
  f(...)

julia


* [Cocci] coccinelle computes patch very slowly
  2018-06-26 11:03 ` Julia Lawall
@ 2018-06-26 15:27   ` Michele Martone
  2018-06-26 15:40     ` Julia Lawall
  0 siblings, 1 reply; 8+ messages in thread
From: Michele Martone @ 2018-06-26 15:27 UTC (permalink / raw)
  To: cocci

On 20180626 at 13:03, Julia Lawall wrote:

Hi Julia,

> Loops can cause the matching process to become very expensive.

With your suggestion below (thanks!) I was able to work around
the problem (see comments)!

However, I hope that this behaviour is unintended.

I mean: for practical purposes, the presence of a loop, or a growing number
of uninitialized variables, leading to superlinear patch computation
times is a showstopper: it can IMHO seriously scare users away...
I hope it is an algorithmic limitation that can be overcome.

> I was about to propose various solutions to get around the loop problem,
> but I think you don't care.
I do care; I am open to further techniques, as they might very likely
come in handy soon ;-).
So please do send them.

> You just want to know the type of I.  Hence:
> 
> @@
> type T;
> idexpression T I;
> identifier f;
> @@
> 
> I =
> + (T)
>   f(...)

Now the POC code gets patched in a fraction of a second.
And a 1.2KLOC source with 17 occurrences of 'f' in just <1s.
So my practical problem here is solved: thanks!

I have an extra question.
I observed that applying:
 spatch --sp-file <patch above> <120 files totalling 140 KLOC>
takes more than 4 minutes and consumes more than 5.5 GB of memory.
When I reran spatch on each file separately, concatenating the patches,
it finished in ~40s, computing exactly the same patch.

Given such a simple patch, is this expected?
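
The per-file run described above can be sketched as a shell loop. The
semantic patch name cast.cocci and the directory name src are placeholders,
not names from this thread. spatch prints the generated diff on standard
output, so the per-file results can simply be appended to one file.

```shell
# Placeholder names (assumptions for illustration only).
sp_file=cast.cocci
src_dir=src

# Temporary file that collects the concatenated per-file patches.
combined=$(mktemp)

for f in "$src_dir"/*.c; do
    # Skip the literal pattern when the glob matches nothing.
    [ -f "$f" ] || continue
    # One spatch process per file; its diff goes to the combined patch.
    spatch --sp-file "$sp_file" "$f" >> "$combined"
done

echo "combined patch written to $combined"
```

Each spatch process only holds one file in memory this way, at the price of
starting spatch once per file.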

Cheers,
Michele


* [Cocci] coccinelle computes patch very slowly
  2018-06-26 15:27   ` Michele Martone
@ 2018-06-26 15:40     ` Julia Lawall
  2018-06-27 22:23       ` Michele Martone
  0 siblings, 1 reply; 8+ messages in thread
From: Julia Lawall @ 2018-06-26 15:40 UTC (permalink / raw)
  To: cocci



On Tue, 26 Jun 2018, Michele Martone wrote:

> On 20180626 at 13:03, Julia Lawall wrote:
>
> Hi Julia,
>
> > Loops can cause the matching process to become very expensive.
>
> With your suggestion below (thanks!) I was able to work around
> the problem (see comments)!
>
> However, I hope that this behaviour is unintended.
>
> I mean: for practical purposes, the presence of a loop, or a growing number
> of uninitialized variables, leading to superlinear patch computation
> times is a showstopper: it can IMHO seriously scare users away...
> I hope it is an algorithmic limitation that can be overcome.

It is an algorithmic limitation, but I don't think it can be overcome in
general.  Coccinelle follows control-flow paths.  When A ... B matches a
case where there is a loop between A and B, then there is an infinite loop
that blocks going from A to B.  The solution is to use what is called weak
until (Coccinelle matching is based on computation tree logic).  This
involves negating everything and somehow reaching the fixed point in terms
of the code outside the loop, rather than the code inside the loop.  This
easily explodes.  A solution could be --no-loops, since you don't care
about situations that start at the end of a loop and jump around to the
top in your case.  You can also use a timeout, to get an overall result
quickly at the risk of missing some results.  You can also reduce the
problem by having one rule that matches a declaration, and another rule
that makes the transformation for each declared variable individually;
currently paths for different variables are interfering with each other
and further inflating the cost.  You can also check whether there exists a
path from the declaration to a variable use, by putting exists in the
header, to be able to ignore variables that are simply never used.
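
As a rough sketch of how the --no-loops suggestion could be tried on the
reported test case: the script below writes the semantic patch and the C
snippet from the original report (comments removed) to temporary files and
runs spatch on them only if it is installed. The file names cast.cocci and
snippet.c are made up for illustration.

```shell
workdir=$(mktemp -d)

# The semantic patch from the original report.
cat > "$workdir/cast.cocci" <<'EOF'
@@
type T;
identifier I;
symbol f;
@@

T I;
...

-I =
+I = (T)
 f(...);
EOF

# The reported C snippet, with its comments removed.
cat > "$workdir/snippet.c" <<'EOF'
void * f(){return 0;}
void g(void)
{
  int v1=0, v2=0;
  int *a1, *a2;
  int presence, of, these, slows, down, patch, computing, extremely, y1, y2, y3, y4, y5;
  a1 = f();
  for(v1 = 0; v1 < v2; v1++) { }
  a2 = f();
}
EOF

# Run with --no-loops only when spatch is actually available.
if command -v spatch >/dev/null 2>&1; then
    spatch --sp-file "$workdir/cast.cocci" --no-loops "$workdir/snippet.c"
else
    echo "spatch not found; test files are in $workdir"
fi
```

Running the same command once with and once without --no-loops gives a
direct comparison of the run times on this snippet.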

> > I was about to propose various solutions to get around the loop problem,
> > but I think you don't care.
> I do care; I am open to further techniques, as they might very likely
> come in handy soon ;-).
> So please do send them.
>
> > You just want to know the type of I.  Hence:
> >
> > @@
> > type T;
> > idexpression T I;
> > identifier f;
> > @@
> >
> > I =
> > + (T)
> >   f(...)
>
> Now the POC code gets patched in a fraction of a second.
> And a 1.2KLOC source with 17 occurrences of 'f' in just <1s.
> So my practical problem here is solved: thanks!
>
> I have an extra question.
> I observed that applying:
>  spatch --sp-file <patch above> <120 files totalling 140 KLOC>
> takes more than 4 minutes and consumes more than 5.5 GB of memory.
> When I reran spatch on each file separately, concatenating the patches,
> it finished in ~40s, computing exactly the same patch.
>
> Given such a simple patch, is this expected?

When you put many files on the command line, it works on them all at once,
in order to be able to take into account references from one to the other.
Since there are 120 of them, that will be a lot of live memory which will
stress the OCaml garbage collector.  When you run on the files
individually the garbage collector is happy - the OCaml GC works well when
it only has to collect the minor heap.

Normally, when you have many files to work on, you give the name of the
enclosing directory and Coccinelle works on all of them.  Then it works
one file at a time.  If that is not suitable for your use case, then you
can make a file with the names of the files you are interested in,
separated by blank lines, and give the argument --file-groups filename.
Then it will only work on the listed files, again one at a time.
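
A minimal sketch of preparing such a file, following the description above
(file names separated by blank lines). The source file names and the
semantic patch name cast.cocci are hypothetical, and the spatch command is
only printed, not run.

```shell
# File that will hold the list passed to --file-groups.
groups_file=$(mktemp)

# Hypothetical file names; each name is followed by a blank line.
for f in src/a.c src/b.c src/c.c; do
    printf '%s\n\n' "$f"
done > "$groups_file"

# Show the generated list and the command that would use it.
cat "$groups_file"
echo "spatch --sp-file cast.cocci --file-groups $groups_file"
```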

julia


* [Cocci] coccinelle computes patch very slowly
  2018-06-26 15:40     ` Julia Lawall
@ 2018-06-27 22:23       ` Michele Martone
  2018-06-27 22:52         ` Julia Lawall
  0 siblings, 1 reply; 8+ messages in thread
From: Michele Martone @ 2018-06-27 22:23 UTC (permalink / raw)
  To: cocci

On 20180626 at 17:40, Julia Lawall wrote:
> On Tue, 26 Jun 2018, Michele Martone wrote:
> > On 20180626 at 13:03, Julia Lawall wrote:
> >
> > Hi Julia,
> >
> > > Loops can cause the matching process to become very expensive.
> >
> > With your suggestion below (thanks!) I was able to work around
> > the problem (see comments)!
> >
> > However, I hope that this behaviour is unintended.
> >
> > I mean: for practical purposes, the presence of a loop, or a growing number
> > of uninitialized variables, leading to superlinear patch computation
> > times is a showstopper: it can IMHO seriously scare users away...
> > I hope it is an algorithmic limitation that can be overcome.
> 
> It is an algorithmic limitation, but I don't think it can be overcome in
> general.  Coccinelle follows control-flow paths.  When A ... B matches a
> case where there is a loop between A and B, then there is an infinite loop
> that blocks going from A to B.  The solution is to use what is called weak
> until (Coccinelle matching is based on computation tree logic).  This
> involves negating everything and somehow reaching the fixed point in terms
> of the code outside the loop, rather than the code inside the loop.  This
> easily explodes.  A solution could be --no-loops, since you don't care
> about situations that start at the end of a loop and jump around to the
> top in your case.  You can also use a timeout, to get an overall result
> quickly at the risk of missing some results.  You can also reduce the
> problem by having one rule that matches a declaration, and another rule
> that makes the transformation for each declared variable individually;
> currently paths for different variables are interfering with each other
> and further inflating the cost.  You can also check whether there exists a
> path from the declaration to a variable use, by putting exists in the
> header, to be able to ignore variables that are simply never used.
Hmmm I see, these are the innards... thanks!
But as far as C is concerned, the presence of a loop construct
between a variable declaration and its use after the loop should not
be a problem with respect to the type and visibility of that variable.
So in principle one might follow this logic in the future,
at least for C, if desired, no?

> > > I was about to propose various solutions to get around the loop problem,
> > > but I think you don't care.
> > I do care; I am open to further techniques, as they might very likely
> > come in handy soon ;-).
> > So please do send them.
> >
> > > You just want to know the type of I.  Hence:
> > >
> > > @@
> > > type T;
> > > idexpression T I;
> > > identifier f;
> > > @@
> > >
> > > I =
> > > + (T)
> > >   f(...)
> >
> > Now the POC code gets patched in a fraction of a second.
> > And a 1.2KLOC source with 17 occurrences of 'f' in just <1s.
> > So my practical problem here is solved: thanks!
> >
> > I have an extra question.
> > I observed that applying:
> >  spatch --sp-file <patch above> <120 files totalling 140 KLOC>
> > takes more than 4 minutes and consumes more than 5.5 GB of memory.
> > When I reran spatch on each file separately, concatenating the patches,
> > it finished in ~40s, computing exactly the same patch.
> >
> > Given such a simple patch, is this expected?
> 
> When you put many files on the command line, it works on them all at once,
> in order to be able to take into account references from one to the other.
> Since there are 120 of them, that will be a lot of live memory which will
> stress the OCaml garbage collector.  When you run on the files
> individually the garbage collector is happy - the OCaml GC works well when
> it only has to collect the minor heap.
Ok, I see..

> Normally, when you have many files to work on, you give the name of the
> enclosing directory and Coccinelle works on all of them.  Then it works
> one file at a time.  If that is not suitable for your use case, then you
> can make a file with the names of the files you are interested in,
> separated by blank lines, and give the argument --file-groups filename.
> Then it will only work on the listed files, again one at a time.
Sounds like a very useful option!

I did not find it in `man spatch'.
That is a pity.

I have two suggestions for `man spatch':
 
 1.
  SYNOPSIS
  spatch --sp-file <SP> <files> [-o <outfile> ] [--iso-file <iso> ] [ options ]
  +spatch --help 
  +spatch --longhelp 

  (otherwise --longhelp is buried in the man page and difficult to see!)
 
  2. Keep the `man spatch' page in sync with the actual options.
     I'm sure for OCaml wizards like you this should be pretty easy to do
     automatically ;-)

Thanks,
Michele


* [Cocci] coccinelle computes patch very slowly
  2018-06-27 22:23       ` Michele Martone
@ 2018-06-27 22:52         ` Julia Lawall
  0 siblings, 0 replies; 8+ messages in thread
From: Julia Lawall @ 2018-06-27 22:52 UTC (permalink / raw)
  To: cocci



On Thu, 28 Jun 2018, Michele Martone wrote:

> On 20180626 at 17:40, Julia Lawall wrote:
> > On Tue, 26 Jun 2018, Michele Martone wrote:
> > > On 20180626 at 13:03, Julia Lawall wrote:
> > >
> > > Hi Julia,
> > >
> > > > Loops can cause the matching process to become very expensive.
> > >
> > > With your suggestion below (thanks!) I was able to work around
> > > the problem (see comments)!
> > >
> > > However, I hope that this behaviour is unintended.
> > >
> > > I mean: for practical purposes, the presence of a loop, or a growing number
> > > of uninitialized variables, leading to superlinear patch computation
> > > times is a showstopper: it can IMHO seriously scare users away...
> > > I hope it is an algorithmic limitation that can be overcome.
> >
> > It is an algorithmic limitation, but I don't think it can be overcome in
> > general.  Coccinelle follows control-flow paths.  When A ... B matches a
> > case where there is a loop between A and B, then there is an infinite loop
> > that blocks going from A to B.  The solution is to use what is called weak
> > until (Coccinelle matching is based on computation tree logic).  This
> > involves negating everything and somehow reaching the fixed point in terms
> > of the code outside the loop, rather than the code inside the loop.  This
> > easily explodes.  A solution could be --no-loops, since you don't care
> > about situations that start at the end of a loop and jump around to the
> > top in your case.  You can also use a timeout, to get an overall result
> > quickly at the risk of missing some results.  You can also reduce the
> > problem by having one rule that matches a declaration, and another rule
> > that makes the transformation for each declared variable individually;
> > currently paths for different variables are interfering with each other
> > and further inflating the cost.  You can also check whether there exists a
> > path from the declaration to a variable use, by putting exists in the
> > header, to be able to ignore variables that are simply never used.
> Hmmm I see, these are the innards... thanks!
> But as far as C is concerned, the presence of a loop construct
> between a variable declaration and its use after the loop should not
> be a problem with respect to the type and visibility of that variable.
> So in principle one might follow this logic in the future,
> at least for C, if desired, no?

Coccinelle is not going to have a special case for this.  You are free to
use the --no-loops option.

julia


end of thread, other threads:[~2018-06-27 22:52 UTC | newest]

Thread overview: 8+ messages
2018-06-25 15:44 [Cocci] coccinelle computes patch very slowly Michele Martone
     [not found] ` <ff255650-b3d6-c1dc-53ef-8e6e715c5077@users.sourceforge.net>
     [not found]   ` <20180626102638.GD14796@localhost>
2018-06-26 10:32     ` Michele Martone
2018-06-26 10:56       ` Julia Lawall
2018-06-26 11:03 ` Julia Lawall
2018-06-26 15:27   ` Michele Martone
2018-06-26 15:40     ` Julia Lawall
2018-06-27 22:23       ` Michele Martone
2018-06-27 22:52         ` Julia Lawall
