* [Cocci] parsing of C code
@ 2017-08-22 21:15 Julia Lawall
  2017-08-22 21:39 ` Derek M Jones
  2017-08-24  9:50 ` SF Markus Elfring
  0 siblings, 2 replies; 17+ messages in thread
From: Julia Lawall @ 2017-08-22 21:15 UTC (permalink / raw)
  To: cocci

Hello,

I have tried to improve the parsing of C code recently.  The main changes,
currently available on github, are as follows:

1.  More aggressive inclusion of header files, combined with caching of
header files.  Now if there is only one occurrence of a header file with a
given name in the provided include paths, it will take that one, even if
there is no obvious connection between the location of the .c file and the
location of the header file.  This compensates for the lack of parsing of
Makefiles to extract -I options.  More header files will likely now be
included, particularly with options like --all-includes or
--recursive-includes.  But caching of previously parsed header files has
been reinstated, which improves performance.  This had been removed
because it wasn't doing nested includes, even if the --recursive-includes
option was provided, but that issue has been addressed.

2. If there is a parse error within the arguments of a function call, the
arguments are ignored, but not the entire enclosing function definition,
as was done previously.  For the Linux kernel, this seemed to allow
thousands of extra lines of code to be parsed and matched by Coccinelle.
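
To make the header lookup rule in item 1 concrete, here is a minimal Python sketch.  It is not Coccinelle's code (which is written in OCaml); the names index_headers, resolve_header and parse_header_cached are made up for illustration, and the cache is assumed to be keyed by path only.

```python
import os

def index_headers(include_paths):
    """Map each header base name to the set of full paths where it occurs."""
    index = {}
    for root in include_paths:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.endswith(".h"):
                    index.setdefault(name, set()).add(os.path.join(dirpath, name))
    return index

def resolve_header(name, index):
    """Return the only candidate for `name`, or None if it is absent or ambiguous."""
    candidates = index.get(os.path.basename(name), set())
    return next(iter(candidates)) if len(candidates) == 1 else None

_parsed = {}  # hypothetical cache: header path -> previously parsed result

def parse_header_cached(path, parse):
    """Call `parse(path)` at most once per path and reuse the result afterwards."""
    if path not in _parsed:
        _parsed[path] = parse(path)
    return _parsed[path]
```

A header name that occurs twice in the include paths resolves to None here, so only unambiguous names are picked up without further location hints.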

I have only tested this on the Linux kernel.  If you are using some other
software, you can run the following semantic patch on your software using
your current and the new versions of Coccinelle:

@r@
identifier f;
position p;
@@

f@p(...) { ... }

@script:ocaml@
f << r.f;
p << r.p;
@@

Printf.printf "%s:%d: %s\n" (List.hd p).file (List.hd p).line f;
flush stdout

If some functions are missing in the output when using the new version, as
compared to the output when using the old version, and if these functions
are things you might want to process in some way, then let me know about
the problem.
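
One possible way to compare the two outputs is sketched below in Python.  It assumes each run's output was saved to a file of "file:line: name" lines, as printed by the script rule above; the helper names and the file names old.out and new.out are hypothetical.

```python
def read_functions(lines):
    """Collect (file, function) pairs from 'file:line: name' output lines."""
    found = set()
    for line in lines:
        parts = line.rstrip("\n").split(":", 2)
        if len(parts) == 3:
            path, _lineno, name = parts
            found.add((path, name.strip()))
    return found

def missing_in_new(old_lines, new_lines):
    """Return the functions reported by the old version but not by the new one."""
    return sorted(read_functions(old_lines) - read_functions(new_lines))
```

For example, missing_in_new(open("old.out"), open("new.out")) would list the functions that only the old version found.  Line numbers are ignored on purpose, so a function counts as found as long as its file and name match.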

thanks,
julia

* [Cocci] parsing of C code
  2017-08-22 21:15 [Cocci] parsing of C code Julia Lawall
@ 2017-08-22 21:39 ` Derek M Jones
  2017-08-23  6:43   ` Julia Lawall
  2017-08-23  9:09   ` Julia Lawall
  2017-08-24  9:50 ` SF Markus Elfring
  1 sibling, 2 replies; 17+ messages in thread
From: Derek M Jones @ 2017-08-22 21:39 UTC (permalink / raw)
  To: cocci

Julia,

> 1.  More aggressive inclusion of header files, combined with caching of
> header files.  Now if there is only one occurrence of a header file with a

Caching can be dangerous because macros may be defined differently
for different includes of the same header.  An option to switch off
caching could come in handy.
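
To illustrate the concern, here is a toy Python sketch of a tool that does evaluate conditionals.  This is an assumption for the sake of the example, not a description of what Coccinelle does; FEATURE_X, feature_level and the helper names are invented.

```python
def visible_lines(header_text, defined):
    """Keep the lines selected by non-nested #ifdef/#else/#endif for the macro set `defined`."""
    keep, out = True, []
    for line in header_text.splitlines():
        words = line.split()
        if words and words[0] == "#ifdef":
            keep = words[1] in defined
        elif words and words[0] == "#else":
            keep = not keep
        elif words and words[0] == "#endif":
            keep = True
        elif keep:
            out.append(line)
    return out

HEADER = """#ifdef FEATURE_X
int feature_level(void);
#else
#define feature_level() 0
#endif"""

def cache_key(path, defined):
    """A cache key that also records which macros were defined at the include site."""
    return (path, frozenset(defined))
```

The same header text yields a function declaration with FEATURE_X defined and a macro without it, so a cache keyed by path alone would hand back the wrong variant for one of the two includes.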

> given name in the provided include paths, it will take that one, even if
> there is no obvious connection between the location of the .c file and the
> location of the header file.  This compensates for the lack of parsing of
> Makefiles to extract -I options.  More header files will likely now be

There is no need to parse Makefiles.  Simply create a script, say coc99,
and configure it as the compiler that make uses.  coc99 parses its
arguments to extract whatever information cocci needs and passes
everything to the 'real' C compiler as-is.
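
A rough sketch of such a wrapper in Python, under the assumption that only the -I options are of interest.  The log file name coc99-includes.log and the default compiler name cc are placeholders, not anything Coccinelle or make prescribes.

```python
import subprocess

def extract_include_dirs(argv):
    """Return the directories named by -I options, in both '-Idir' and '-I dir' forms."""
    dirs, i = [], 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-I" and i + 1 < len(argv):
            dirs.append(argv[i + 1])
            i += 1  # skip the directory operand
        elif arg.startswith("-I") and len(arg) > 2:
            dirs.append(arg[2:])
        i += 1
    return dirs

def main(argv, real_cc="cc", log_path="coc99-includes.log"):
    """Record the include directories, then run the real compiler on the same arguments."""
    with open(log_path, "a") as log:
        for d in extract_include_dirs(argv):
            log.write(d + "\n")
    return subprocess.call([real_cc] + argv)
```

Installed as an executable that calls main(sys.argv[1:]) and selected with something like make CC=coc99, it leaves the build untouched while collecting the directories that could later be given to spatch as -I options.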

How is the O'Reilly book coming along ;-)

-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com

* [Cocci] parsing of C code
  2017-08-22 21:39 ` Derek M Jones
@ 2017-08-23  6:43   ` Julia Lawall
  2017-08-23 15:48     ` Derek M Jones
  2017-08-24  9:55     ` SF Markus Elfring
  2017-08-23  9:09   ` Julia Lawall
  1 sibling, 2 replies; 17+ messages in thread
From: Julia Lawall @ 2017-08-23  6:43 UTC (permalink / raw)
  To: cocci



On Tue, 22 Aug 2017, Derek M Jones wrote:

> Julia,
>
> > 1.  More aggressive inclusion of header files, combined with caching of
> > header files.  Now if there is only one occurrence of a header file with a
>
> Caching can be dangerous because macros may be defined differently
> for different includes of the same header.  An option to switch off
> caching could come in handy.

I believe that macros are only applied if they are defined in the same
file.  At this point, I'm not 100% certain of that.  But when I ran the
provided test script on the Linux kernel, I lost very few functions as
compared to the current --recursive-includes implementation.  The lost
functions were due to the recovery strategy, not to macro issues.
The Linux kernel has some declarations like

type fn(a1, a2)

with no trailing ; followed by a function definition.  This gives a parse
error on a1.  In the previous version, Coccinelle was able to recover and
parse the function definition.  In the new version, I comment out the
arguments and then another error is encountered that is after the ).  This
causes the recovery process to skip over the subsequent function.  The
recovery code is rather subtle to avoid infinite loops, and so far my
attempts to improve it have led to more of a mess than anything else.

The problem could be solved by not trying to patch up errors in
parentheses when they occur at top level.  When they occur inside a
function, they should not impact the recovery process.

Basically the recovery process is focused on finding a { in column 0 and
then a } in column 0, and then it goes on after that.  But it can go
backwards sometimes, because a parsing problem can cause a parsing attempt
to read too far, into the next function.
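
The column-0 heuristic can be sketched roughly as follows in Python.  This is a simplification for illustration only: the real recovery code is more subtle, and the function name top_level_blocks is made up.

```python
def top_level_blocks(source):
    """Return (start, end) 1-based line ranges from a column-0 '{' to the next column-0 '}'."""
    blocks, start = [], None
    for lineno, line in enumerate(source.splitlines(), 1):
        if start is None and line.startswith("{"):
            start = lineno  # opening brace of a top-level body
        elif start is not None and line.startswith("}"):
            blocks.append((start, lineno))  # matching close found, resume after it
            start = None
    return blocks
```

Nested braces are indented and so do not disturb the scan, which is why the approach works for code formatted in the Linux kernel style.  It says nothing about the backwards movement mentioned above, which comes from the parser having read past the end of the block.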

>
> > given name in the provided include paths, it will take that one, even if
> > there is no obvious connection between the location of the .c file and the
> > location of the header file.  This compensates for the lack of parsing of
> > Makefiles to extract -I options.  More header files will likely now be
>
> There is no need to parse Makefiles.  Simply create a script, say coc99,
> and configure it as the compiler that make uses.  coc99 parses its
> arguments to extract whatever information cocci needs and passes
> everything to the 'real' C compiler as-is.

At least for the Linux kernel, you can't just run one make and get all the
files to be compiled.  Some files are indeed very hard to compile.  For a
more uniform project this could work, though.

> How is the O'Reilly book coming along ;-)

Still looking for an author :)

julia

>
> --
> Derek M. Jones           Software analysis
> tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com
>

* [Cocci] parsing of C code
  2017-08-22 21:39 ` Derek M Jones
  2017-08-23  6:43   ` Julia Lawall
@ 2017-08-23  9:09   ` Julia Lawall
  2017-08-24 10:18     ` SF Markus Elfring
  1 sibling, 1 reply; 17+ messages in thread
From: Julia Lawall @ 2017-08-23  9:09 UTC (permalink / raw)
  To: cocci



On Tue, 22 Aug 2017, Derek M Jones wrote:

> Julia,
>
> > 1.  More aggressive inclusion of header files, combined with caching of
> > header files.  Now if there is only one occurrence of a header file with a
>
> Caching can be dangerous because macros may be defined differently
> for different includes of the same header.  An option to switch off
> caching could come in handy.

I can add such an option.

Actually, I noticed that unfolding macros can sometimes hurt more than it
helps.  When Coccinelle decides to unfold macros it unfolds them for all
the rest of the code, but it only unfolds one level of macro.  Sometimes
the one-level unfolding is weirder than the original and the parser error
that triggered unfolding is not related to macros anyway.  So then the
whole rest of the code has a weird semi-unfolded macro in it for no
purpose, when it could have compiled normally.  At least for the Linux
kernel, though, this seems like a rare occurrence.

julia

* [Cocci] parsing of C code
  2017-08-23  6:43   ` Julia Lawall
@ 2017-08-23 15:48     ` Derek M Jones
  2017-08-23 16:07       ` Julia Lawall
  2017-08-24  9:55     ` SF Markus Elfring
  1 sibling, 1 reply; 17+ messages in thread
From: Derek M Jones @ 2017-08-23 15:48 UTC (permalink / raw)
  To: cocci

Julia,

> The Linux kernel has some declarations like
> 
> type fn(a1, a2)
>
> with no trailing ; followed by a function definition.  This gives a parse
> error on a1.  In the previous version, Coccinelle was able to recover and

The K&R style.  Perfectly legal C.

> Basically the recovery process is focused on finding a { in column 0 and
> then a } in column 0, and then it goes on after that.  But it can go
> backwards sometimes, because a parsing problem can cause a parsing attempt
> to read too far, into the next function.

Skip to ; is a remarkably effective syntax error recovery strategy.


-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com

* [Cocci] parsing of C code
  2017-08-23 15:48     ` Derek M Jones
@ 2017-08-23 16:07       ` Julia Lawall
  2017-08-23 16:54         ` Derek M Jones
  0 siblings, 1 reply; 17+ messages in thread
From: Julia Lawall @ 2017-08-23 16:07 UTC (permalink / raw)
  To: cocci



On Wed, 23 Aug 2017, Derek M Jones wrote:

> Julia,
>
> > The Linux kernel has some declarations like
> >
> > type fn(a1, a2)
> >
> > with no trailing ; followed by a function definition.  This gives a parse
> > error on a1.  In the previous version, Coccinelle was able to recover and
>
> The K&R style.  Perfectly legal C.

I don't think that is what it is.  The declarations are eg:

static inline __printf(2, 3)
void _dev_info(const struct device *dev, const char *fmt, ...)
{}

I guess that the whole first line is part of the declaration of _dev_info,
but Coccinelle can't cope with the __printf(2, 3).

>
> > Basically the recovery process is focused on finding a { in column 0 and
> > then a } in column 0, and then it goes on after that.  But it can go
> > backwards sometimes, because a parsing problem can cause a parsing attempt
> > to read too far, into the next function.
>
> Skip to ; is a remarkably effective syntax error recovery strategy.

The parser needs to restart at a top-level declaration.  It's a yacc-based
parser.  We can't recover within the parsing process.  Perhaps it would be
possible to remove what was between two ;s around the line and column with
the error, but it seems like there could be a risk of making things worse.

julia

>
>
> --
> Derek M. Jones           Software analysis
> tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com
>

* [Cocci] parsing of C code
  2017-08-23 16:07       ` Julia Lawall
@ 2017-08-23 16:54         ` Derek M Jones
  2017-08-23 16:58           ` Julia Lawall
  0 siblings, 1 reply; 17+ messages in thread
From: Derek M Jones @ 2017-08-23 16:54 UTC (permalink / raw)
  To: cocci

Julia,

> I don't think that is what it is.  The declarations are eg:
> 
> static inline __printf(2, 3)
> void _dev_info(const struct device *dev, const char *fmt, ...)
> {}
> 
> I guess that the whole first line is part of the declaration of _dev_info,
> but Coccinelle can't cope with the __printf(2, 3).

https://stackoverflow.com/questions/17825588/what-does-this-generic-function-do

>> Skip to ; is a remarkably effective syntax error recovery strategy.
> 
> The parser needs to restart at a top-level declaration.  It's a yacc-based
> parser.  We can't recover within the parsing process.  Perhaps it would be
> possible to remove what was between two ;s around the line and column with
> the error, but it seems like there could be a risk of making things worse.

If it's yacc based you can recover wherever you like.  Knowing how to
do it is something of a black art.

Bison supports a technique that does not require wearing a pointy hat:

stmt_list: error ';' |
            stmt_list error ';' ;

where error represents the something-went-wrong token.
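
As a hand-written analogue of that rule (not what yacc or Bison generates), the following Python sketch parses a toy language of NAME = NUMBER ; statements and, on an error, discards tokens up to and including the next ';'.  The grammar and the function name are invented for the example.

```python
def parse_statements(tokens):
    """Parse `NAME = NUMBER ;` statements; on an error, skip to just past the next ';'."""
    parsed, errors, i = [], [], 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            parsed.append((stmt[0], int(stmt[2])))
            i += 4
        else:
            errors.append(i)  # index of the token where the bad statement starts
            while i < len(tokens) and tokens[i] != ";":
                i += 1
            i += 1  # step over the ';' itself so parsing resumes at the next statement
    return parsed, errors
```

The statements on either side of a broken one are still recognised, which is the appeal of the strategy.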


-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com

* [Cocci] parsing of C code
  2017-08-23 16:54         ` Derek M Jones
@ 2017-08-23 16:58           ` Julia Lawall
  2017-08-24 10:06             ` SF Markus Elfring
  0 siblings, 1 reply; 17+ messages in thread
From: Julia Lawall @ 2017-08-23 16:58 UTC (permalink / raw)
  To: cocci



On Wed, 23 Aug 2017, Derek M Jones wrote:

> Julia,
>
> > I don't think that is what it is.  The declarations are eg:
> >
> > static inline __printf(2, 3)
> > void _dev_info(const struct device *dev, const char *fmt, ...)
> > {}
> >
> > I guess that the whole first line is part of the declaration of _dev_info,
> > but Coccinelle can't cope with the __printf(2, 3).
>
> https://stackoverflow.com/questions/17825588/what-does-this-generic-function-do
>
> > > Skip to ; is a remarkably effective syntax error recovery strategy.
> >
> > The parser needs to restart at a top-level declaration.  It's a yacc-based
> > parser.  We can't recover within the parsing process.  Perhaps it would be
> > possible to remove what was between two ;s around the line and column with
> > the error, but it seems like there could be a risk of making things worse.
>
> If it's yacc based you can recover wherever you like.  Knowing how to
> do it is something of a black art.

Well, ocamlyacc, to be precise.

>
> Bison supports a technique that does not require wearing a pointy hat:
>
> stmt_list: error ';' |
>            stmt_list error ';' ;
>
> where error represents the something-went-wrong token.

This does look like a nice feature.

julia

* [Cocci] parsing of C code
  2017-08-22 21:15 [Cocci] parsing of C code Julia Lawall
  2017-08-22 21:39 ` Derek M Jones
@ 2017-08-24  9:50 ` SF Markus Elfring
  1 sibling, 0 replies; 17+ messages in thread
From: SF Markus Elfring @ 2017-08-24  9:50 UTC (permalink / raw)
  To: cocci

> I have tried to improve the parsing of C code recently.

This information is useful.


> 1.  More aggressive inclusion of header files,

Why do you need to become "aggressive" there when the corresponding
data processing should be just correct?


> combined with caching of header files.

What do you think about making this approach configurable with special parameters?


> flush stdout

Can such a function call be omitted here?


> If some functions are missing in the output when using the new version,
> as compared to the output when using the old version, and if these functions
> are things you might want to process in some way, then let me know about
> the problem.

I have become more curious about the run-time characteristics (as you already know)
of this software evolution. I ran into some execution challenges
when applying special source code search patterns.
How would these grow with the inclusion of additional
source (or header) files for analyses that need them?

Regards,
Markus

* [Cocci] parsing of C code
  2017-08-23  6:43   ` Julia Lawall
  2017-08-23 15:48     ` Derek M Jones
@ 2017-08-24  9:55     ` SF Markus Elfring
  1 sibling, 0 replies; 17+ messages in thread
From: SF Markus Elfring @ 2017-08-24  9:55 UTC (permalink / raw)
  To: cocci

> At least for the Linux kernel, you can't just run one make and get all the
> files to be compiled.  Some files are indeed very hard to compile.

Could you point out any specific source code examples
that you find a bit too challenging so far?

Regards,
Markus

* [Cocci] parsing of C code
  2017-08-23 16:58           ` Julia Lawall
@ 2017-08-24 10:06             ` SF Markus Elfring
  0 siblings, 0 replies; 17+ messages in thread
From: SF Markus Elfring @ 2017-08-24 10:06 UTC (permalink / raw)
  To: cocci

>> If it's yacc based you can recover where ever you like.  Knowing how to
>> do it is something of a black art.

> Well, ocamlyacc, to be precise.

Could the software "Menhir" help with the needed data processing?
http://gallium.inria.fr/%7Efpottier/menhir/

Regards,
Markus

* [Cocci] parsing of C code
  2017-08-23  9:09   ` Julia Lawall
@ 2017-08-24 10:18     ` SF Markus Elfring
  2017-08-24 10:22       ` Julia Lawall
  0 siblings, 1 reply; 17+ messages in thread
From: SF Markus Elfring @ 2017-08-24 10:18 UTC (permalink / raw)
  To: cocci

> Actually, I noticed that unfolding macros can sometimes hurt more than it helps.

Would you like to discuss (or explain) involved implementation details
and configuration parameters any more?

Regards,
Markus

* [Cocci] parsing of C code
  2017-08-24 10:18     ` SF Markus Elfring
@ 2017-08-24 10:22       ` Julia Lawall
  2017-08-24 11:10         ` SF Markus Elfring
  2017-08-24 15:23         ` SF Markus Elfring
  0 siblings, 2 replies; 17+ messages in thread
From: Julia Lawall @ 2017-08-24 10:22 UTC (permalink / raw)
  To: cocci



On Thu, 24 Aug 2017, SF Markus Elfring wrote:

> > Actually, I noticed that unfolding macros can sometimes hurt more than it helps.
>
> Would you like to discuss (or explain) involved implementation details
> and configuration parameters any more?

#define FOO(x) BAR(x,+,y)

int foo() __xxx(yyy) {
  FOO(12);
}

int xyz() {
  FOO(15);
}

Parsing of foo fails due to the attribute __xxx(yyy) that Coccinelle is
not able to cope with.  Coccinelle hopes that expanding macros will solve
the problem.  Now the code becomes:

#define FOO(x) BAR(x,+,y)

int foo() __xxx(yyy) {
  BAR(12,+,y);
}

int xyz() {
  BAR(15,+,y);
}

Now foo is still unparsable, but xyz is unparsable as well.  It would have
been better off with FOO(15);
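
The effect can be reproduced with a small one-level expander, sketched here in Python.  It is only an approximation for illustration: it handles simple function-like macros on a single line, and the name expand_once is made up.

```python
import re

DEFINE = re.compile(r"#define\s+(\w+)\(([^)]*)\)\s+(.*)")

def expand_once(source):
    """Expand each function-like macro call one level, leaving #define lines alone."""
    macros = {}
    for m in DEFINE.finditer(source):
        params = [p.strip() for p in m.group(2).split(",") if p.strip()]
        macros[m.group(1)] = (params, m.group(3).strip())
    if not macros:
        return source

    def substitute(call):
        params, body = macros[call.group(1)]
        args = [a.strip() for a in call.group(2).split(",")]
        if len(args) != len(params):
            return call.group(0)  # arity mismatch: leave the call untouched
        mapping = dict(zip(params, args))
        # Replace parameter names in the body; the result is not rescanned.
        return re.sub(r"\b\w+\b", lambda w: mapping.get(w.group(0), w.group(0)), body)

    call_re = re.compile(r"\b(%s)\(([^()]*)\)" % "|".join(map(re.escape, macros)))
    return "\n".join(
        line if line.lstrip().startswith("#define") else call_re.sub(substitute, line)
        for line in source.splitlines()
    )
```

Applied to the code above, it turns FOO(15); into BAR(15,+,y); and stops there, even if BAR were itself defined as a macro, because the substituted text is never scanned again.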

julia

* [Cocci] parsing of C code
  2017-08-24 10:22       ` Julia Lawall
@ 2017-08-24 11:10         ` SF Markus Elfring
  2017-08-24 15:23         ` SF Markus Elfring
  1 sibling, 0 replies; 17+ messages in thread
From: SF Markus Elfring @ 2017-08-24 11:10 UTC (permalink / raw)
  To: cocci

> Parsing of foo fails due to the attribute __xxx(yyy) that Coccinelle is
> not able to cope with.

Why does the parsing software struggle with such input data so far?


> Coccinelle hopes that expanding macros will solve the problem.

Why do you need to "hope" for something if the software could be designed
in such a way that it also takes special care of these source code places?

Do you eventually stumble on target conflicts because of language processing
difficulties which could be explained from the Chomsky–Schützenberger hierarchy?

Regards,
Markus

* [Cocci] parsing of C code
  2017-08-24 10:22       ` Julia Lawall
  2017-08-24 11:10         ` SF Markus Elfring
@ 2017-08-24 15:23         ` SF Markus Elfring
  2017-08-24 15:41           ` Derek M Jones
  1 sibling, 1 reply; 17+ messages in thread
From: SF Markus Elfring @ 2017-08-24 15:23 UTC (permalink / raw)
  To: cocci

> Parsing of foo fails due to the attribute __xxx(yyy) that Coccinelle is
> not able to cope with.

* Do you find information relevant from answers to a question like
  "Context-free grammars versus context-sensitive grammars?"?
  https://stackoverflow.com/questions/8236422/context-free-grammars-versus-context-sensitive-grammars#answer-8250104

* Do the "attributes" which you would like to support trigger a need
  to work with context-dependent grammars?

* Did you ask any other developers (or software designers) for
  possible solutions around the mentioned aspect?

* Will the software situation improve any more also for the
  programming language "OCaml" (besides tools like "Menhir")?

Regards,
Markus

* [Cocci] parsing of C code
  2017-08-24 15:23         ` SF Markus Elfring
@ 2017-08-24 15:41           ` Derek M Jones
  2017-08-24 16:14             ` SF Markus Elfring
  0 siblings, 1 reply; 17+ messages in thread
From: Derek M Jones @ 2017-08-24 15:41 UTC (permalink / raw)
  To: cocci

On 24/08/2017 16:23, SF Markus Elfring wrote:

More importantly; does Julia like red jelly beans more than blue jelly
beans?

> * Do you find information relevant from answers to a question like
>    "Context-free grammars versus context-sensitive grammars?"?
>    https://stackoverflow.com/questions/8236422/context-free-grammars-versus-context-sensitive-grammars#answer-8250104
> 
> * Do the "attributes" which you would like to support trigger a need
>    to work with context-dependent grammars?
> 
> * Did you ask any other developers (or software designers) for
>    possible solutions around the mentioned aspect?
> 
> * Will the software situation improve any more also for the
>    programming language "OCaml" (besides tools like "Menhir")?
> 
> Regards,
> Markus
> 

-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com

* [Cocci] parsing of C code
  2017-08-24 15:41           ` Derek M Jones
@ 2017-08-24 16:14             ` SF Markus Elfring
  0 siblings, 0 replies; 17+ messages in thread
From: SF Markus Elfring @ 2017-08-24 16:14 UTC (permalink / raw)
  To: cocci

> More importantly; does Julia like red jelly beans more than blue jelly beans?

Would you like to discuss favourite sweets more than to clarify
further improvements in parsing technology also for application
together with the Coccinelle software?   ;-)

Regards,
Markus
