* [cocci] Semantic diff?
@ 2022-03-19 17:48 Eric Wheeler
  2022-03-19 17:56 ` Julia Lawall
  0 siblings, 1 reply; 2+ messages in thread
From: Eric Wheeler @ 2022-03-19 17:48 UTC
  To: cocci

Hello all,

Traditional diff would show the following two files as very different, but
Coccinelle understands the syntax, so it might be able to create a smarter
diff.

t1.c:
	f(
	){
	}

t2.c:
	f(){}

How does Coccinelle do diffs internally?  Does it parse the whole syntax 
tree and then walk the old and new trees to show the difference when it 
writes patches to stdout?

If so, then perhaps implementing `smpl-diff` would be trivial.  Just load 
two files and compare them with the existing internal diff logic.

Here is the application:

NEC2 was originally written in Fortran and there have been two different
ports to C from the original Fortran (xnec2c and necpp).

The variable and function names are similar (usually exactly identical). 
However, the authors chose different data structures for global values.  
Still, the program flow is almost always the same.

Is there a way to diff two C implementations to see if there are any 
actual differences, not just differences in naming convention?

It seems that this could be possible, since Coccinelle has structural
awareness and understands datatypes.

Then bugs in one program (or the other) caused by author error while
porting could be detected through such a static analysis. A human could
then compare the C implementations to the original Fortran to see which
one is correct, or whether the syntactically different representations
are computationally equivalent.

For example, these two samples compute the same thing but comments, 
floating point notation, and the storage of variables like icon1 and ind1 
differ:

necpp:
      if( -icon1[iprx] != jx )
        ind1=2;
      else
      {
        xi= fabsl( cabj* cab[iprx]+ sabj* sab[iprx]+ salpj* salp[iprx]);
        if( (xi < 0.999999) || (fabsl(bi[iprx]/b-1.) > 1.e-6) )
          ind1=2;
        else
          ind1=0;
      }

xnec2c:
      if( -data.icon1[iprx] != jx )
        dataj.ind1=2;
      else
      {
        xi= fabs( dataj.cabj* data.cab[iprx]+ dataj.sabj*
            data.sab[iprx]+ dataj.salpj* data.salp[iprx]);
        if( (xi < 0.999999) ||
            (fabs(data.bi[iprx]/dataj.b-1.0) > 1.0e-6) )
          dataj.ind1=2;
        else
          dataj.ind1=0;
      } /* if( -data.icon1[iprx] != jx ) */
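
As a rough sanity check of the "compute the same thing" claim, the inner
test of the two excerpts can be isolated and run side by side. This is
only a sketch under stated assumptions: the function names, the `double`
parameter types, and the sample inputs are all hypothetical, since the
excerpts do not show the declarations. Note that `fabsl` operates on
`long double` while `fabs` operates on `double`, so if either project
stores these values in a wider type, the two spellings are not
guaranteed to agree. That is exactly the kind of difference a semantic
diff would have to either flag or normalize.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Inner test as spelled in the necpp excerpt: fabsl, "1." and "1.e-6".
 * Parameter types are an assumption; the excerpt does not show them. */
static bool inner_test_necpp(double xi, double bi, double b)
{
    return (xi < 0.999999) || (fabsl(bi / b - 1.) > 1.e-6);
}

/* Inner test as spelled in the xnec2c excerpt: fabs, "1.0" and "1.0e-6". */
static bool inner_test_xnec2c(double xi, double bi, double b)
{
    return (xi < 0.999999) || (fabs(bi / b - 1.0) > 1.0e-6);
}

/* Returns true when both spellings agree on a few made-up inputs. */
bool inner_tests_agree(void)
{
    static const double samples[][3] = {
        /* xi,  bi,   b */
        { 1.0,  2.0,  2.0 },
        { 0.5,  2.0,  2.0 },
        { 1.0,  2.0,  2.1 },
        { 1.0, -3.0,  3.0 },
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        const double *s = samples[i];
        if (inner_test_necpp(s[0], s[1], s[2]) !=
            inner_test_xnec2c(s[0], s[1], s[2]))
            return false;
    }
    return true;
}
```

Calling `inner_tests_agree()` from a small `main` covers only these
sample points. It says nothing about the rest of the two programs, which
is why a structural comparison would still be useful.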


--
Eric Wheeler


* Re: [cocci] Semantic diff?
  2022-03-19 17:48 [cocci] Semantic diff? Eric Wheeler
@ 2022-03-19 17:56 ` Julia Lawall
  0 siblings, 0 replies; 2+ messages in thread
From: Julia Lawall @ 2022-03-19 17:56 UTC
  To: Eric Wheeler; +Cc: cocci



On Sat, 19 Mar 2022, Eric Wheeler wrote:

> Hello all,
>
> Traditional diff would show the following two files as very different, but
> Coccinelle understands the syntax, so it might be able to create a smarter
> diff.
>
> t1.c:
> 	f(
> 	){
> 	}
>
> t2.c:
> 	f(){}
>
> How does Coccinelle do diffs internally?  Does it parse the whole syntax
> tree and then walk the old and new trees to show the difference when it
> writes patches to stdout?


It looks to see if there are differences in the tokens, and then if there
are any, it runs standard diff.
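
To illustrate the idea of comparing tokens rather than lines, here is a
minimal C sketch. It is not Coccinelle's code (Coccinelle is written in
OCaml and uses a real C lexer); the function names are hypothetical and
the tokenizer is deliberately crude. It treats whitespace and `/* */`
comments as separators, which is an assumption made here, and it splits
multi-character operators into single characters and does not
understand string literals or `//` comments.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Skip whitespace and block comments starting at p. */
static const char *skip_blank(const char *p)
{
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p)
                p += 2;         /* step over the closing marker */
        } else {
            return p;
        }
    }
}

/* Length of the token at p: an identifier/number run, or a single
 * punctuation character, or 0 at end of input. */
static size_t token_len(const char *p)
{
    size_t n = 0;

    if (isalnum((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)p[n]) || p[n] == '_')
            n++;
        return n;
    }
    return *p ? 1 : 0;
}

/* True when a and b contain the same token sequence. */
bool same_tokens(const char *a, const char *b)
{
    for (;;) {
        a = skip_blank(a);
        b = skip_blank(b);
        size_t la = token_len(a), lb = token_len(b);
        if (la != lb || memcmp(a, b, la) != 0)
            return false;
        if (la == 0)
            return true;        /* both inputs exhausted together */
        a += la;
        b += lb;
    }
}
```

With this sketch, `same_tokens("f(\n){\n}\n", "f(){}")` reports the t1.c
and t2.c bodies from the question as equal, while a renamed function
would be reported as different.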

Maybe you want a rule like:

@@
parameter list pl;
statement list sl;
@@

f(
- pl
+ pl
  ) {
-sl
+sl
}

Then run spatch with the option: --force-diff

This makes spatch pretty-print the file in the resulting patch.  Then you
can use normal diff (or maybe something like ediff in Emacs) to see the
differences, without being bothered by newline issues.

julia


