* [cocci] Semantic diff?
From: Eric Wheeler @ 2022-03-19 17:48 UTC
To: cocci
Hello all,
Traditional diff would show the following two files as very different, but
Coccinelle understands the syntax, so it might be able to create a smarter
diff.
t1.c:
f(
){
}
t2.c:
f(){}
How does Coccinelle do diffs internally? Does it parse the whole syntax
tree and then walk the old and new trees to show the difference when it
writes patches to stdout?
If so, then perhaps implementing `smpl-diff` would be trivial. Just load
two files and compare them with the existing internal diff logic.
Here is the application:
NEC2 was originally written in Fortran and there have been two different
ports to C from the original Fortran (xnec2c and necpp).
The variable and function names are similar (usually exactly identical).
However, the authors chose different data structures for global values.
Still, the program flow is almost always the same.
Is there a way to diff two C implementations to see if there are any
actual differences, not just differences in naming convention?
It seems that this could be possible, since Coccinelle has structural
awareness and understands datatypes.
Bugs in one program (or the other) caused by author error while porting
could then be detected through such a static analysis. A human could then
compare the C implementations to the original Fortran to see which one is
correct, or whether the syntactically different representations are
computationally equivalent.
For example, these two samples compute the same thing but comments,
floating point notation, and the storage of variables like icon1 and ind1
differ:
necpp:
  if( -icon1[iprx] != jx )
    ind1=2;
  else
  {
    xi= fabsl( cabj* cab[iprx]+ sabj* sab[iprx]+ salpj* salp[iprx]);
    if( (xi < 0.999999) || (fabsl(bi[iprx]/b-1.) > 1.e-6) )
      ind1=2;
    else
      ind1=0;
  }
xnec2c:
  if( -data.icon1[iprx] != jx )
    dataj.ind1=2;
  else
  {
    xi= fabs( dataj.cabj* data.cab[iprx]+ dataj.sabj*
        data.sab[iprx]+ dataj.salpj* data.salp[iprx]);
    if( (xi < 0.999999) ||
        (fabs(data.bi[iprx]/dataj.b-1.0) > 1.0e-6) )
      dataj.ind1=2;
    else
      dataj.ind1=0;
  } /* if( -data.icon1[iprx] != jx ) */
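To make the intended comparison concrete, here is a minimal sketch of a normalizer that would make the two samples above comparable with a plain string compare. It is only an illustration and not anything Coccinelle provides: the names `normalize`, `emit` and `PREFIXES` are hypothetical, the prefix list assumes `data.` and `dataj.` are the only storage differences, and the tokenizer is deliberately crude (no string literals, no `//` comments, operators split into single characters). It strips whitespace and `/* */` comments, drops the assumed struct prefixes, and prints floating-point literals in one canonical form.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: these struct prefixes are the only storage difference. */
static const char *const PREFIXES[] = { "dataj.", "data." };

/* Append one token to out, separated from the previous token by a space.
 * Tokens that would overflow the buffer are silently dropped. */
static void emit(char *out, size_t cap, size_t *len, const char *tok, size_t n)
{
    if (*len + n + 2 > cap)        /* separator + token + NUL must fit */
        return;
    if (*len > 0)
        out[(*len)++] = ' ';
    memcpy(out + *len, tok, n);
    *len += n;
    out[*len] = '\0';
}

/* Write a canonical, space-separated token string for src into out. */
void normalize(const char *src, char *out, size_t cap)
{
    const char *s = src;
    size_t len = 0;

    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (s[0] == '/' && s[1] == '*') {
            /* Skip a block comment (to end of input if unterminated). */
            const char *end = strstr(s + 2, "*/");
            s = end ? end + 2 : s + strlen(s);
        } else if (isdigit((unsigned char)*s) ||
                   (*s == '.' && isdigit((unsigned char)s[1]))) {
            /* Numeric literal: reprint by value so 1. and 1.0 agree. */
            char buf[32];
            char *end;
            double v = strtod(s, &end);
            int n = snprintf(buf, sizeof buf, "%.17g", v);
            emit(out, cap, &len, buf, (size_t)n);
            s = end;
        } else if (isalpha((unsigned char)*s) || *s == '_') {
            /* Identifier: drop an assumed struct prefix, keep the name. */
            const char *start;
            size_t i;
            for (i = 0; i < sizeof PREFIXES / sizeof PREFIXES[0]; i++) {
                size_t pl = strlen(PREFIXES[i]);
                if (strncmp(s, PREFIXES[i], pl) == 0) {
                    s += pl;
                    break;
                }
            }
            start = s;
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            if (s > start)
                emit(out, cap, &len, start, (size_t)(s - start));
        } else {
            emit(out, cap, &len, s, 1);   /* any other character */
            s++;
        }
    }
}
```

With this sketch the two inner conditions normalize to the same string, while `fabsl` versus `fabs` still shows up as a difference, which is the kind of thing a human would then check against the Fortran.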
--
Eric Wheeler
* Re: [cocci] Semantic diff?
From: Julia Lawall @ 2022-03-19 17:56 UTC
To: Eric Wheeler; +Cc: cocci
On Sat, 19 Mar 2022, Eric Wheeler wrote:
> Hello all,
>
> Traditional diff would show the following two files as very different, but
> Coccinelle understands the syntax, so it might be able to create a smarter
> diff.
>
> t1.c:
> f(
> ){
> }
>
> t2.c:
> f(){}
>
> How does Coccinelle do diffs internally? Does it parse the whole syntax
> tree and then walk the old and new trees to show the difference when it
> writes patches to stdout?
It looks to see if there are differences in the tokens, and then if there
are any, it runs standard diff.
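As a rough illustration of what a token-level check looks like, here is a minimal sketch in C. It is not Coccinelle's code (Coccinelle is written in OCaml and has a real C lexer); `same_tokens` and `next_token` are hypothetical names, and the tokenizer is simplified: identifiers and numbers are runs of letters, digits and underscores, and every other character counts as its own token, so comments and string literals are not handled and `! =` would wrongly compare equal to `!=`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Skip whitespace, then return the length of the next token and set
 * *start to its first character. Returns 0 at end of input. */
static size_t next_token(const char **p, const char **start)
{
    const char *s = *p;

    while (*s && isspace((unsigned char)*s))
        s++;
    *start = s;
    if (*s == '\0') {
        *p = s;
        return 0;
    }
    if (isalnum((unsigned char)*s) || *s == '_') {
        /* Word token: maximal run of [A-Za-z0-9_]. */
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
    } else {
        s++;                      /* single-character token */
    }
    *p = s;
    return (size_t)(s - *start);
}

/* True if a and b contain the same token sequence, ignoring whitespace. */
bool same_tokens(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        const char *ta, *tb;
        size_t la = next_token(&a, &ta);
        size_t lb = next_token(&b, &tb);

        if (la != lb || memcmp(ta, tb, la) != 0)
            return false;         /* tokens differ, or one input ended */
        if (la == 0)
            return true;          /* both inputs ended together */
    }
}
```

On the t1.c and t2.c contents above, `same_tokens` returns true, so in a scheme like this no textual diff would be needed at all.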
Maybe you want a rule like:
@@
parameter list pl;
statement list sl;
@@
f(
- pl
+ pl
) {
-sl
+sl
}
Then run spatch with the option: --force-diff
The resulting patch will use spatch to pretty print the file. Then you
can use normal diff (or maybe something like ediff in emacs) to see the
differences, without being bothered with newline issues.
julia
>
> If so, then perhaps implementing `smpl-diff` would be trivial. Just load
> two files and compare them with the existing internal diff logic.
>
> Here is the application:
>
> NEC2 was originally written in Fortran and there have been two different
> ports to C from the original Fortran (xnec2c and necpp).
>
> The variable and function names are similar (usually exactly identical).
> However, the authors chose different data structures for global values.
> Still, the program flow is almost always the same.
>
> Is there a way to diff two C implementations to see if there are any
> actual differences, not just differences in naming convention?
>
> It seems that this could be possible, since Coccinelle has structural
> awareness and understands datatypes.
>
> Bugs in one program (or the other) caused by author error while porting
> could then be detected through such a static analysis. A human could then
> compare the C implementations to the original Fortran to see which one is
> correct, or whether the syntactically different representations are
> computationally equivalent.
>
> For example, these two samples compute the same thing but comments,
> floating point notation, and the storage of variables like icon1 and ind1
> differ:
>
> necpp:
>   if( -icon1[iprx] != jx )
>     ind1=2;
>   else
>   {
>     xi= fabsl( cabj* cab[iprx]+ sabj* sab[iprx]+ salpj* salp[iprx]);
>     if( (xi < 0.999999) || (fabsl(bi[iprx]/b-1.) > 1.e-6) )
>       ind1=2;
>     else
>       ind1=0;
>   }
>
> xnec2c:
>   if( -data.icon1[iprx] != jx )
>     dataj.ind1=2;
>   else
>   {
>     xi= fabs( dataj.cabj* data.cab[iprx]+ dataj.sabj*
>         data.sab[iprx]+ dataj.salpj* data.salp[iprx]);
>     if( (xi < 0.999999) ||
>         (fabs(data.bi[iprx]/dataj.b-1.0) > 1.0e-6) )
>       dataj.ind1=2;
>     else
>       dataj.ind1=0;
>   } /* if( -data.icon1[iprx] != jx ) */
>
>
> --
> Eric Wheeler
>