From: Jaskaran Singh <jaskaran.singh@collabora.com>
To: cocci@systeme.lip6.fr
Subject: Re: [Cocci] [PATCH v2 0/3] parsing_c: Optimize recursive header file parsing
Date: Thu, 10 Sep 2020 11:49:41 +0530 [thread overview]
Message-ID: <642305cb62e2920e1a02e5c6f623fcc073df240d.camel@collabora.com> (raw)
In-Reply-To: <20200910061703.2397-1-jaskaran.singh@collabora.com>
On Thu, 2020-09-10 at 11:47 +0530, Jaskaran Singh wrote:
> This patch series aims to optimize performance for recursively
> parsing
> header files in Coccinelle.
>
> Coccinelle's C parsing subsystem has an option called --recursive-
> includes
> to recursively parse header files. This is used for type
> inference/annotation.
>
> Previously, using --recursive-includes on the entire Linux kernel
> source
> code would take far too long. On my computer with the following
> specs,
> - Processor: AMD Ryzen 5 3550H
> - RAM: 8 GB
> it would take close to 7 hours to complete. The optimization that
> this
> patch series implements reduces that time to 1 hour.
>
> The following is a high-level description of what has been
> implemented:
> - As header files are recursively parsed, they are scanned for the
> following:
> - fields of structs/unions/enums
> - typedefs
> - function prototypes
> - global variables
> The names of the above are stored in a "name cache", i.e. a
> hashtable to
> map the name to the files it is declared in.
> - A dependency graph is built to determine dependencies between all
> the
> files in the codebase.
> - In the type annotation phase of the C subsystem, if a function
> call,
> struct/union field or identifier is encountered, the type of which
> is
> not known to the annoter, the name cache is checked for the name.
> - The name cache gives a list of files that the name is
> declared/defined
> in. These files are cross checked with the dependency graph to
> determine if any of these are reachable by the file that the
> annoter is
> working on.
> - If a reachable header file is found, that file is parsed and the
> type
> associated to the name is returned.
>
> Different approaches that were attempted to alleviate this issue, and
> the
> problems with each are as follows:
> - Caching the most recently used files: A LRU cache to store ASTs of
> the
> most recently encountered header files. The problem with this
> approach
> is the amount of memory it takes to cache the header file ASTs.
> - Caching the most troublesome files: A pseudo-LFU cache to store
> files
> that cumulatively take the longest to parse, and thus bloat the
> time
> taken. The problem with this approach is the amount of memory it
> takes
> to cache the header file ASTs.
> - Skipping unparsable locations in header files: Skipping top-level
> items
> in a header file that cannot be parsed. This approach does not
> produce
> even close to the amount of optimization needed.
>
> The next step from here would be:
> - Maintain a small but persistent cache of header files in groups of
> directories. Leverage multiprocessing for parsing these header
> files.
> - Leverage multiprocessing to parse header files initially for name
> extraction.
> - Performing some initial matching with the semantic patch to
> determine if
> a C file matches. If matches are found, call the annoter and
> recursively
> parse header files for type annotation.
> - Recursively parse all header files only once and build a large type
> environment. Use the dependency graph to determine reachability.
> This
> has potential memory usage issues though.
>
>
> Changes in v2:
> --------------
> - Change occurences of 'begin' and 'match' on the same line with
> something else
> to the next line for better readability.
>
>
> Makefile | 2
> parsing_c/includes_cache.ml | 286
> +++++++++++++++++++++++++++++++++++++++++++
> parsing_c/includes_cache.mli | 47 +++++++
> parsing_c/parse_c.ml | 27 +++-
> parsing_c/type_annoter_c.ml | 130 ++++++++++++++++---
> 5 files changed, 466 insertions(+), 26 deletions(-)
>
Yikes, the diffstat is the old one. Here's the latest (not very
different)
Makefile | 2
parsing_c/includes_cache.ml | 290
+++++++++++++++++++++++++++++++++++++++++++
parsing_c/includes_cache.mli | 47 ++++++
parsing_c/parse_c.ml | 28 +++-
parsing_c/type_annoter_c.ml | 134 +++++++++++++++++--
5 files changed, 475 insertions(+), 26 deletions(-)
>
_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci
next prev parent reply other threads:[~2020-09-10 6:20 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-10 6:17 [Cocci] [PATCH v2 0/3] parsing_c: Optimize recursive header file parsing Jaskaran Singh
2020-09-10 6:17 ` [Cocci] [PATCH v2 1/3] parsing_c: includes_cache: Implement a name cache Jaskaran Singh
2020-09-10 6:17 ` [Cocci] [PATCH v2 2/3] parsing_c: parse_c: Build name cache and includes dependency graph Jaskaran Singh
2020-09-10 6:17 ` [Cocci] [PATCH v2 3/3] parsing_c: type_annoter_c: Use name cache for type annotation Jaskaran Singh
2020-09-10 6:19 ` Jaskaran Singh [this message]
2020-09-15 19:23 ` [Cocci] [PATCH v2 0/3] parsing_c: Optimize recursive header file parsing Julia Lawall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=642305cb62e2920e1a02e5c6f623fcc073df240d.camel@collabora.com \
--to=jaskaran.singh@collabora.com \
--cc=cocci@systeme.lip6.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).