cocci.inria.fr archive mirror
 help / color / mirror / Atom feed
* [Cocci] [RFC PATCH 0/3] parsing_c: Optimize recursive header file parsing
@ 2020-09-09 18:12 Jaskaran Singh
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 1/3] parsing_c: includes_cache: Implement a name cache Jaskaran Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jaskaran Singh @ 2020-09-09 18:12 UTC (permalink / raw)
  To: cocci

This patch series aims to optimize performance for recursively parsing
header files in Coccinelle.

Coccinelle's C parsing subsystem has an option called --recursive-includes
to recursively parse header files. This is used for type
inference/annotation.

Previously, using --recursive-includes on the entire Linux kernel source
code would take far too long. On my computer with the following specs,
	- Processor: AMD Ryzen 5 3550H
	- RAM: 8 GB
it would take close to 7 hours to complete.  The optimization that this
patch series implements reduces that time to 1 hour.

The following is a high-level description of what has been implemented:
- As header files are recursively parsed, they are scanned for the
  following:
	- fields of structs/unions/enums
	- typedefs
	- function prototypes
	- global variables
  The names of the above are stored in a "name cache", i.e. a hashtable to
  map the name to the files it is declared in.
- A dependency graph is built to determine dependencies between all the
  files in the codebase.
- In the type annotation phase of the C subsystem, if a function call,
  struct/union field or identifier is encountered, the type of which is
  not known to the annoter, the name cache is checked for the name.
- The name cache gives a list of files that the name is declared/defined
  in.  These files are cross checked with the dependency graph to
  determine if any of these are reachable by the file that the annoter is
  working on.
- If a reachable header file is found, that file is parsed and the type
  associated to the name is returned.

Different approaches that were attempted to alleviate this issue, and the
problems with each are as follows:
- Caching the most recently used files: A LRU cache to store ASTs of the
  most recently encountered header files. The problem with this approach
  is the amount of memory it takes to cache the header file ASTs.
- Caching the most troublesome files: A pseudo-LFU cache to store files
  that cumulatively take the longest to parse, and thus bloat the time
  taken. The problem with this approach is the amount of memory it takes
  to cache the header file ASTs.
- Skipping unparsable locations in header files: Skipping top-level items
  in a header file that cannot be parsed. This approach does not produce
  even close to the amount of optimization needed.

The next step from here would be:
- Maintain a small but persistent cache of header files in groups of
  directories. Leverage multiprocessing for parsing these header files.
- Leverage multiprocessing to parse header files initially for name
  extraction.
- Performing some initial matching with the semantic patch to determine if
  a C file matches. If matches are found, call the annoter and recursively
  parse header files for type annotation.
- Recursively parse all header files only once and build a large type
  environment. Use the dependency graph to determine reachability. This
  has potential memory usage issues though.


 Makefile                     |    2 
 parsing_c/includes_cache.ml  |  286 +++++++++++++++++++++++++++++++++++++++++++
 parsing_c/includes_cache.mli |   47 +++++++
 parsing_c/parse_c.ml         |   27 +++-
 parsing_c/type_annoter_c.ml  |  130 ++++++++++++++++---
 5 files changed, 466 insertions(+), 26 deletions(-)


_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cocci] [RFC PATCH 1/3] parsing_c: includes_cache: Implement a name cache
  2020-09-09 18:12 [Cocci] [RFC PATCH 0/3] parsing_c: Optimize recursive header file parsing Jaskaran Singh
@ 2020-09-09 18:13 ` Jaskaran Singh
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 2/3] parsing_c: parse_c: Build name cache and includes dependency graph Jaskaran Singh
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 3/3] parsing_c: type_annoter_c: Use name cache for type annotation Jaskaran Singh
  2 siblings, 0 replies; 6+ messages in thread
From: Jaskaran Singh @ 2020-09-09 18:13 UTC (permalink / raw)
  To: cocci

Implement a name cache and includes dependency graph to optimize
performance for recursive parsing of header files.

The following is a high-level description of what has been implemented:
- As header files are recursively parsed, they are scanned for the
  following:
	- fields of structs/unions/enums
	- typedefs
	- function prototypes
	- global variables
  The names of the above are stored in a "name cache", i.e. a hashtable
  to map the name to the files it is declared in.
- A dependency graph is built to determine dependencies between all the
  files in the codebase.
- In the type annotation phase of the C subsystem, if a function call,
  struct/union field or identifier is encountered, the type of which is
  not known to the annoter, the name cache is checked for the name.
- The name cache gives a list of files that the name is declared/defined
  in.  These files are cross checked with the dependency graph to
  determine if any of these are reachable by the file that the annoter is
  working on.
- If a reachable header file is found, that file is parsed and all of
  the above listed constructs are extracted from it.

Suggested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Jaskaran Singh <jaskaran.singh@collabora.com>
---
 Makefile                     |   2 +-
 parsing_c/includes_cache.ml  | 286 +++++++++++++++++++++++++++++++++++
 parsing_c/includes_cache.mli |  47 ++++++
 3 files changed, 334 insertions(+), 1 deletion(-)
 create mode 100644 parsing_c/includes_cache.ml
 create mode 100644 parsing_c/includes_cache.mli

diff --git a/Makefile b/Makefile
index e25174413..f8d3424c0 100644
--- a/Makefile
+++ b/Makefile
@@ -50,7 +50,7 @@ SOURCES_parsing_cocci := \
 SOURCES_parsing_c := \
 	token_annot.ml flag_parsing_c.ml parsing_stat.ml \
 	token_c.ml ast_c.ml includes.ml control_flow_c.ml \
-	visitor_c.ml lib_parsing_c.ml control_flow_c_build.ml \
+	visitor_c.ml lib_parsing_c.ml includes_cache.ml control_flow_c_build.ml \
 	pretty_print_c.ml semantic_c.ml lexer_parser.ml parser_c.mly \
 	lexer_c.mll parse_string_c.ml token_helpers.ml token_views_c.ml \
 	cpp_token_c.ml parsing_hacks.ml cpp_analysis_c.ml \
diff --git a/parsing_c/includes_cache.ml b/parsing_c/includes_cache.ml
new file mode 100644
index 000000000..2c2cb0235
--- /dev/null
+++ b/parsing_c/includes_cache.ml
@@ -0,0 +1,286 @@
+(*
+ * This file is part of Coccinelle, licensed under the terms of the GPL v2.
+ * See copyright.txt in the Coccinelle source code for more information.
+ * The Coccinelle source code can be obtained at http://coccinelle.lip6.fr
+ *)
+
+open Common
+module Lib = Lib_parsing_c
+
+(*****************************************************************************)
+(* Wrappers *)
+(*****************************************************************************)
+let pr_inc s =
+  if !Flag_parsing_c.verbose_includes
+  then Common.pr2 s
+
+(*****************************************************************************)
+(* Graph types/modules *)
+(*****************************************************************************)
+
+(* Filenames as keys to check paths from file A to file B. *)
+module Key : Set.OrderedType with type t = Common.filename = struct
+  type t = Common.filename
+  let compare = String.compare
+end
+
+module KeySet = Set.Make (Key)
+
+module KeyMap = Map.Make (Key)
+
+module Node : Set.OrderedType with type t = unit = struct
+  type t = unit
+  let compare = compare
+end
+
+module Edge : Set.OrderedType with type t = unit = struct
+  type t = unit
+  let compare = compare
+end
+
+module KeyEdgePair : Set.OrderedType with type t = Key.t * Edge.t =
+struct
+  type t = Key.t * Edge.t
+  let compare = compare
+end
+
+module KeyEdgeSet = Set.Make (KeyEdgePair)
+
+module G = Ograph_simple.Make
+  (Key) (KeySet) (KeyMap) (Node) (Edge) (KeyEdgePair) (KeyEdgeSet)
+
+
+(*****************************************************************************)
+(* Includes dependency graph *)
+(*****************************************************************************)
+
+(* Header file includes dependency graph *)
+let dependency_graph = ref (new G.ograph_mutable)
+
+(* Check if a path exists between one node to another.
+ * Almost a copy of dfs_iter in commons/ograph_extended.ml with minor changes.
+ * Return true if g satisfies predicate f else false
+ *)
+let dfs_exists xi f g =
+  let already = Hashtbl.create 101 in
+  let rec aux_dfs xs =
+    let h xi =
+      if Hashtbl.mem already xi
+      then false
+      else begin
+        Hashtbl.add already xi true;
+        if f xi
+        then true
+        else begin
+          let f' (key, _) keyset = KeySet.add key keyset in
+          let newset =
+            try KeyEdgeSet.fold f' (g#successors xi) KeySet.empty
+            with Not_found -> KeySet.empty in
+          aux_dfs newset
+        end
+      end in
+    KeySet.exists h xs in
+  aux_dfs (KeySet.singleton xi)
+
+let add_to_dependency_graph parent file =
+  let add_node a =
+    if not (KeyMap.mem a !dependency_graph#nodes)
+    then !dependency_graph#add_node a () in
+  let add_arc (a, b) =
+    add_node a;
+    add_node b;
+    !dependency_graph#add_arc (a, b) () in
+  add_arc (parent, file)
+
+let path_exists_dependency_graph filea fileb =
+  dfs_exists filea (fun a -> a = fileb) !dependency_graph
+
+let print_dependency_graph _ =
+  G.print_ograph_generic
+    ~str_of_key:(fun k -> "\"" ^ k ^ "\"")
+    ~str_of_node:(fun k _ -> k)
+    "/tmp/headers.dot"
+    !dependency_graph
+
+
+(*****************************************************************************)
+(* Name cache *)
+(*****************************************************************************)
+
+type cache_exp =
+  | CacheField of string (* Name of the struct/union it is defined in *)
+  | CacheEnumConst
+  | CacheVarFunc
+  | CacheTypedef
+
+(* Not very elegant. Basically a copy-paste of namedef in type_annoter_c. *)
+type cache_return =
+  | RetVarOrFunc of string * Ast_c.exp_type
+  | RetEnumConstant of string * string option
+  | RetTypeDef   of string * Ast_c.fullType
+  | RetStructUnionNameDef of string * (Ast_c.structUnion * Ast_c.structType)
+                          Ast_c.wrap
+
+(* Map name to list of files it is defined in *)
+let name_cache : (string, (filename * cache_exp) list) Hashtbl.t ref =
+  ref (Hashtbl.create 101)
+
+let add_to_name_cache name (file, exp) =
+  let l =
+    try Hashtbl.find !name_cache name
+    with Not_found -> [] in
+  Hashtbl.replace !name_cache name ([(file, exp)] @ l)
+
+(* Visitor to cache names in a given file. *)
+let cache_name_visitor file =
+
+  let cache_struct_fields sname def =
+    let _cache_field field =
+      match (Ast_c.unwrap field) with
+        Ast_c.Simple (name, _)
+      | Ast_c.BitField (name, _, _, _) ->
+          name +>
+            do_option
+              (fun n ->
+                 add_to_name_cache (Ast_c.str_of_name n)
+                 (file, CacheField(sname))) in
+    let _cache_struct_fields field =
+      match field with
+        Ast_c.DeclarationField(Ast_c.FieldDeclList(l)) ->
+          List.iter _cache_field (Ast_c.unwrap l)
+      | _ -> () in
+    List.iter _cache_struct_fields def in
+
+  let cache_enum_constants def =
+    def +>
+      List.iter
+        (fun ec ->
+           add_to_name_cache
+             ((Ast_c.unwrap ec) +> fst +> Ast_c.str_of_name)
+             (file, CacheEnumConst)) in
+
+  { Visitor_c.default_visitor_c with
+    Visitor_c.ktoplevel = fun (k, bigf) p ->
+      match p with
+        Ast_c.Declaration
+          (Ast_c.DeclList (defs, _)) ->
+             let cache_name x =
+               (match (Ast_c.unwrap x) with
+                 {Ast_c.v_namei = Some (name, _);
+                  Ast_c.v_type = typ;
+                  Ast_c.v_storage = strg} ->
+                  (* Cache typedefs/variables/functions *)
+                    let exp_type =
+                      match (fst strg) with
+                        Ast_c.StoTypedef -> CacheTypedef
+                      | _ -> CacheVarFunc in
+                    add_to_name_cache
+                      (Ast_c.str_of_name name) (file, exp_type)
+               | {Ast_c.v_namei = None; Ast_c.v_type = typ} ->
+                    (match (Ast_c.unwrap (snd typ)) with
+                      Ast_c.StructUnion (_, Some n, def) ->
+                        (* Cache field names *)
+                        cache_struct_fields n def
+                    | Ast_c.Enum(_, def) ->
+                        (* Cache enumeration constants *)
+                        cache_enum_constants def
+                    | _ -> k p)) in
+             List.iter cache_name defs
+      | _ -> k p
+  }
+
+
+let get_type_visitor file l =
+  let add_to_ret ret =
+    l := [ret] @ !l in
+  let get_enum_constants sopt def =
+    def +>
+      List.iter
+        (fun ec ->
+           let s = (Ast_c.unwrap ec) +> fst +> Ast_c.str_of_name in
+           add_to_ret (RetEnumConstant(s, sopt))) in
+  { Visitor_c.default_visitor_c with
+    Visitor_c.ktoplevel = fun (k, bigf) p ->
+      match p with
+        Ast_c.Declaration
+          (Ast_c.DeclList (defs, _)) ->
+             let get_name x =
+               (match (Ast_c.unwrap x) with
+                 {Ast_c.v_namei = Some (n, _);
+                  Ast_c.v_type = typ;
+                  Ast_c.v_storage = strg} ->
+                  (* Cache typedefs/variables/functions *)
+                    let f _ =
+                      let s = Ast_c.str_of_name n in
+                      match (fst strg) with
+                        Ast_c.StoTypedef ->
+                          add_to_ret (RetTypeDef(s, typ))
+                      | Ast_c.NoSto
+                      | Ast_c.Sto _ ->
+                          add_to_ret
+                            (RetVarOrFunc(s, (typ, Ast_c.NotLocalVar))) in
+                    f ()
+               | {Ast_c.v_namei = None; Ast_c.v_type = typ} ->
+                   (match (Ast_c.unwrap (snd typ)) with
+                     Ast_c.StructUnion (su, snameopt, def) ->
+                       (match snameopt with
+                         Some s ->
+                           let def' = Lib.al_fields def in
+                           let ii = Ast_c.get_ii_typeC_take_care (snd typ) in
+                           let ii' = Lib.al_ii ii in
+                           add_to_ret
+                             (RetStructUnionNameDef (s, ((su, def'),ii')))
+                       | None -> k p)
+                   | Ast_c.Enum(sopt, def) ->
+                       get_enum_constants sopt def
+                   | _ -> k p)) in
+             List.iter get_name defs
+      | _ -> k p
+  }
+
+let extract_names file ast =
+  ignore (Visitor_c.vk_program (cache_name_visitor file) ast)
+
+(* Use the visitor to get all the types from a file. *)
+let get_types_from_name_cache file name exp_types parse_fn =
+  if !Includes.include_headers_for_types && Includes.is_header file
+  then []
+  else
+    let rec find_file l =
+      match l with
+        [] -> None
+      | (x, y)::xs ->
+          if List.mem y exp_types
+          then begin
+            if path_exists_dependency_graph file x then Some (x, y)
+            else find_file xs
+          end
+          else find_file xs in
+    let file_list =
+      try Hashtbl.find !name_cache name
+      with Not_found -> [] in
+    match (find_file file_list) with
+      None -> []
+    | Some (x, t) ->
+        let ast = parse_fn x in
+        let env = ref [] in
+        ignore
+          (Visitor_c.vk_program
+            (get_type_visitor file env) ast);
+        !env
+
+
+(*****************************************************************************)
+(* Set of parsed files *)
+(*****************************************************************************)
+
+module StringSet = Set.Make (String)
+
+(* Set of files parsed *)
+let seen_files = ref (StringSet.empty)
+
+let has_been_parsed file =
+  StringSet.mem file !seen_files
+
+let add_to_parsed_files file =
+  seen_files := StringSet.add file !seen_files
diff --git a/parsing_c/includes_cache.mli b/parsing_c/includes_cache.mli
new file mode 100644
index 000000000..2384cb2f7
--- /dev/null
+++ b/parsing_c/includes_cache.mli
@@ -0,0 +1,47 @@
+(*
+ * This file is part of Coccinelle, licensed under the terms of the GPL v2.
+ * See copyright.txt in the Coccinelle source code for more information.
+ * The Coccinelle source code can be obtained at http://coccinelle.lip6.fr
+ *)
+
+(*****************************************************************************)
+(* Includes dependency graph *)
+(*****************************************************************************)
+
+val add_to_dependency_graph : Common.filename -> Common.filename -> unit
+
+val print_dependency_graph : unit -> unit
+
+(*****************************************************************************)
+(* Name cache *)
+(*****************************************************************************)
+
+type cache_exp =
+  | CacheField of string (* Name of the struct/union it is defined in *)
+  | CacheEnumConst
+  | CacheVarFunc
+  | CacheTypedef
+
+type cache_return =
+  | RetVarOrFunc of string * Ast_c.exp_type
+  | RetEnumConstant of string * string option
+  | RetTypeDef   of string * Ast_c.fullType
+  | RetStructUnionNameDef of string * (Ast_c.structUnion * Ast_c.structType)
+                          Ast_c.wrap
+
+val extract_names : Common.filename -> Ast_c.program -> unit
+
+val get_types_from_name_cache :
+  Common.filename ->
+    string ->
+    cache_exp list ->
+    (Common.filename -> Ast_c.program) ->
+    cache_return list
+
+(*****************************************************************************)
+(* Set of parsed files *)
+(*****************************************************************************)
+
+val has_been_parsed : Common.filename -> bool
+
+val add_to_parsed_files : Common.filename -> unit
-- 
2.21.3

_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Cocci] [RFC PATCH 2/3] parsing_c: parse_c: Build name cache and includes dependency graph
  2020-09-09 18:12 [Cocci] [RFC PATCH 0/3] parsing_c: Optimize recursive header file parsing Jaskaran Singh
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 1/3] parsing_c: includes_cache: Implement a name cache Jaskaran Singh
@ 2020-09-09 18:13 ` Jaskaran Singh
  2020-09-09 19:41   ` Julia Lawall
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 3/3] parsing_c: type_annoter_c: Use name cache for type annotation Jaskaran Singh
  2 siblings, 1 reply; 6+ messages in thread
From: Jaskaran Singh @ 2020-09-09 18:13 UTC (permalink / raw)
  To: cocci

Build the includes dependency graph and name cache while parsing header
files. Every header file is parsed only once for name caching and, while
parsing these files, an includes dependency graph is built to determine
reachability of one header file from another file.

Signed-off-by: Jaskaran Singh <jaskaran.singh@collabora.com>
---
 parsing_c/parse_c.ml | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/parsing_c/parse_c.ml b/parsing_c/parse_c.ml
index 5574cb11b..3b250720f 100644
--- a/parsing_c/parse_c.ml
+++ b/parsing_c/parse_c.ml
@@ -17,6 +17,7 @@ open Common
 
 module TH = Token_helpers
 module LP = Lexer_parser
+module IC = Includes_cache
 
 module Stat = Parsing_stat
 
@@ -995,15 +996,29 @@ let rec _parse_print_error_heuristic2 saved_typedefs saved_macros
 and handle_include file wrapped_incl k =
     let incl = Ast_c.unwrap wrapped_incl.Ast_c.i_include in
     let parsing_style = Includes.get_parsing_style () in
+    let f = Includes.resolve file parsing_style incl in
     if Includes.should_parse parsing_style file incl
     then
-      match Includes.resolve file parsing_style incl with
+      match f with
       | Some header_filename when Common.lfile_exists header_filename ->
-	  (if !Flag_parsing_c.verbose_includes
-	  then pr2 ("including "^header_filename));
-	  let nonlocal =
-	    match incl with Ast_c.NonLocal _ -> true | _ -> false in
-          ignore (k nonlocal header_filename)
+          if not (IC.has_been_parsed header_filename)
+          then begin
+            IC.add_to_parsed_files header_filename;
+	    (if !Flag_parsing_c.verbose_includes
+	    then pr2 ("including "^header_filename));
+	    let nonlocal =
+	      match incl with Ast_c.NonLocal _ -> true | _ -> false in
+            let res = k nonlocal header_filename in
+            match res with
+              None -> ()
+            | Some x ->
+                let pt = x.parse_trees in
+                let (p, _, _) = pt in
+                with_program2_unit
+                  (IC.extract_names header_filename)
+                  p
+          end;
+          IC.add_to_dependency_graph file header_filename;
       | _ -> ()
 
 and _parse_print_error_heuristic2bis saved_typedefs saved_macros
-- 
2.21.3

_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Cocci] [RFC PATCH 3/3] parsing_c: type_annoter_c: Use name cache for type annotation
  2020-09-09 18:12 [Cocci] [RFC PATCH 0/3] parsing_c: Optimize recursive header file parsing Jaskaran Singh
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 1/3] parsing_c: includes_cache: Implement a name cache Jaskaran Singh
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 2/3] parsing_c: parse_c: Build name cache and includes dependency graph Jaskaran Singh
@ 2020-09-09 18:13 ` Jaskaran Singh
  2 siblings, 0 replies; 6+ messages in thread
From: Jaskaran Singh @ 2020-09-09 18:13 UTC (permalink / raw)
  To: cocci

Use the name cache for type annotation. On encountering the following
which are not stored in the environment, the name cache is looked up and
the relevant header file is parsed for type information:
	- struct field use
	- typedef
	- function call
	- identifier
	- enumeration constant

Signed-off-by: Jaskaran Singh <jaskaran.singh@collabora.com>
---
 parsing_c/type_annoter_c.ml | 130 ++++++++++++++++++++++++++++++------
 1 file changed, 111 insertions(+), 19 deletions(-)

diff --git a/parsing_c/type_annoter_c.ml b/parsing_c/type_annoter_c.ml
index 25cb6c0ee..497332544 100644
--- a/parsing_c/type_annoter_c.ml
+++ b/parsing_c/type_annoter_c.ml
@@ -19,6 +19,7 @@ open Common
 open Ast_c
 
 module Lib = Lib_parsing_c
+module IC = Includes_cache
 
 (*****************************************************************************)
 (* Prelude *)
@@ -186,6 +187,13 @@ type nameenv = {
 
 type environment = nameenv list
 
+let includes_parse_fn file =
+  let choose_includes = Includes.get_parsing_style () in
+  Includes.set_parsing_style Includes.Parse_no_includes;
+  let ret = Parse_c.parse_c_and_cpp false false file in
+  Includes.set_parsing_style choose_includes;
+  List.map fst (fst ret)
+
 (* ------------------------------------------------------------ *)
 (* can be modified by the init_env function below, by
  * the file environment_unix.h
@@ -294,6 +302,39 @@ let member_env_lookup_enum s env =
   | [] -> false
   | env :: _ -> StringMap.mem s env.enum_constant
 
+(* ------------------------------------------------------------ *)
+
+let add_cache_binding_in_scope namedef =
+  let (current, older) = Common.uncons !_scoped_env in
+  let new_frame fr =
+    match namedef with
+      | IC.RetVarOrFunc (s, typ) ->
+	  {fr with
+	   var_or_func = StringMap.add s typ fr.var_or_func}
+      | IC.RetTypeDef   (s, typ) ->
+	  let cv = typ, fr.typedef, fr.level in
+	  let new_typedef_c : typedefs = { defs = StringMap.add s cv fr.typedef.defs } in
+	  {fr with typedef = new_typedef_c}
+      | IC.RetStructUnionNameDef (s, (su, typ)) ->
+	  {fr with
+	   struct_union_name_def = StringMap.add s (su, typ) fr.struct_union_name_def}
+      | IC.RetEnumConstant (s, body) ->
+	  {fr with
+	   enum_constant = StringMap.add s body fr.enum_constant} in
+  (* These are global, so have to reflect them in all the frames. *)
+  _scoped_env := (new_frame current)::(List.map new_frame older)
+
+(* Has side-effects on the environment.
+ * TODO: profile? *)
+let get_type_from_includes_cache file name exp_types on_success on_failure =
+  let file_bindings =
+    IC.get_types_from_name_cache
+      file name exp_types includes_parse_fn in
+  List.iter add_cache_binding_in_scope file_bindings;
+  match file_bindings with
+    [] -> on_failure ()
+  | _ -> on_success ()
+
 
 (*****************************************************************************)
 (* "type-lookup"  *)
@@ -394,7 +435,14 @@ let rec type_unfold_one_step ty env =
 	  then type_unfold_one_step t' env'
           else loop (s::seen) t' env
        with Not_found ->
-          ty
+          let f = Ast_c.file_of_info (Ast_c.info_of_name name) in
+          get_type_from_includes_cache
+            f s [IC.CacheTypedef]
+            (fun () ->
+              let (t', env') = lookup_typedef s !_scoped_env in
+              TypeName (name, Some t') +>
+              Ast_c.rewrap_typeC ty)
+            (fun () -> ty)
       )
 
   | FieldType (t, _, _) -> type_unfold_one_step t env
@@ -474,7 +522,15 @@ let rec typedef_fix ty env =
 	      TypeName (name, Some fixed) +>
 	      Ast_c.rewrap_typeC ty
             with Not_found ->
-              ty))
+              let f = Ast_c.file_of_info (Ast_c.info_of_name name) in
+              get_type_from_includes_cache
+                f s [IC.CacheTypedef]
+                (fun () ->
+                  let (t', env') = lookup_typedef s !_scoped_env in
+                  TypeName (name, Some t') +>
+                  Ast_c.rewrap_typeC ty)
+                (fun () -> ty)
+              ))
 
     | FieldType (t, a, b) ->
 	FieldType (typedef_fix t env, a, b) +> Ast_c.rewrap_typeC ty
@@ -797,8 +853,15 @@ let annotater_expr_visitor_subpart = (fun (k,bigf) expr ->
                     Type_c.noTypeHere
                 )
             | None ->
-                pr2_once ("type_annotater: no type for function ident: " ^ s);
-                Type_c.noTypeHere
+                let f =
+                  Ast_c.file_of_info
+                    (Ast_c.info_of_name ident) in
+                get_type_from_includes_cache
+                  f s [IC.CacheVarFunc]
+                  (fun () -> match lookup_opt_env lookup_var s with
+                    Some (typ, local) -> make_info_fix (typ, local)
+                  | None -> Type_c.noTypeHere)
+                  (fun () -> Type_c.noTypeHere)
             )
         )
 
@@ -848,22 +911,33 @@ let annotater_expr_visitor_subpart = (fun (k,bigf) expr ->
                 | Some _ ->
                     make_info_def (type_of_s "int")
                 | None ->
-                    if not (s =~ "[A-Z_]+") (* if macro then no warning *)
-                    then
-                      if !Flag_parsing_c.check_annotater then
-                        if not (Hashtbl.mem _notyped_var s)
-                        then begin
-                          pr2 ("Type_annoter: no type found for: " ^ s);
-                          Hashtbl.add _notyped_var s true;
-                        end
-                        else ()
-                      else
-                        pr2 ("Type_annoter: no type found for: " ^ s)
-                    ;
-                    Type_c.noTypeHere
-                )
+                    let f = Ast_c.file_of_info (Ast_c.info_of_name ident) in
+                    let failure_fn =
+                      (fun () ->
+                         if not (s =~ "[A-Z_]+") (* if macro then no warning *)
+                         then
+                           if !Flag_parsing_c.check_annotater then
+                             if not (Hashtbl.mem _notyped_var s)
+                             then begin
+                               pr2
+                                 ("Type_annoter: no type found for: " ^ s);
+                               Hashtbl.add _notyped_var s true;
+                             end
+                             else ()
+                           else pr2 ("Type_annoter: no type found for: " ^ s);
+                         Type_c.noTypeHere) in
+                    get_type_from_includes_cache
+                      f s [IC.CacheEnumConst;IC.CacheVarFunc]
+                      (fun () -> match lookup_opt_env lookup_enum s with
+                        Some _ -> make_info_def (type_of_s "int")
+                      | None ->
+                          (match lookup_opt_env lookup_var s with
+                            Some (typ,local) -> make_info_fix (typ,local)
+                          | None -> failure_fn ()))
+                      failure_fn
             )
         )
+    )
 
     (* -------------------------------------------------- *)
     (* C isomorphism on type on array and pointers *)
@@ -954,7 +1028,25 @@ let annotater_expr_visitor_subpart = (fun (k,bigf) expr ->
                         Type_c.noTypeHere
                   )
               | _ ->
-		  Type_c.noTypeHere
+                let s = Ast_c.str_of_name namefld in
+                let f = Ast_c.file_of_info (Ast_c.info_of_name namefld) in
+                let ret_typ =
+                  (match (Ast_c.unwrap (snd t)) with
+                    Ast_c.StructUnionName(su, sname) ->
+                      get_type_from_includes_cache
+                        f s [IC.CacheField sname]
+                      (fun () ->
+                        try
+                          let ((su,fields),ii) =
+                            lookup_structunion (su, sname) !_scoped_env in
+                            (try
+                               make_info_def_fix
+                                 (Type_c.type_field fld (su, fields))
+                             with _ -> Type_c.noTypeHere)
+                        with Not_found -> Type_c.noTypeHere)
+                      (fun () -> Type_c.noTypeHere)
+                  | _ -> Type_c.noTypeHere)
+                in ret_typ
           )
         )
 
-- 
2.21.3

_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Cocci] [RFC PATCH 2/3] parsing_c: parse_c: Build name cache and includes dependency graph
  2020-09-09 18:13 ` [Cocci] [RFC PATCH 2/3] parsing_c: parse_c: Build name cache and includes dependency graph Jaskaran Singh
@ 2020-09-09 19:41   ` Julia Lawall
  2020-09-10  5:17     ` Jaskaran Singh
  0 siblings, 1 reply; 6+ messages in thread
From: Julia Lawall @ 2020-09-09 19:41 UTC (permalink / raw)
  To: Jaskaran Singh; +Cc: cocci



On Wed, 9 Sep 2020, Jaskaran Singh wrote:

> Build the includes dependency graph and name cache while parsing header
> files. Every header file is parsed only once for name caching and, while
> parsing these files, an includes dependency graph is built to determine
> reachability of one header file from another file.

So you really parse the whole file here?  Could you avoid that?  Is it
that you need to parse something to find the end of it?

julia

>
> Signed-off-by: Jaskaran Singh <jaskaran.singh@collabora.com>
> ---
>  parsing_c/parse_c.ml | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/parsing_c/parse_c.ml b/parsing_c/parse_c.ml
> index 5574cb11b..3b250720f 100644
> --- a/parsing_c/parse_c.ml
> +++ b/parsing_c/parse_c.ml
> @@ -17,6 +17,7 @@ open Common
>
>  module TH = Token_helpers
>  module LP = Lexer_parser
> +module IC = Includes_cache
>
>  module Stat = Parsing_stat
>
> @@ -995,15 +996,29 @@ let rec _parse_print_error_heuristic2 saved_typedefs saved_macros
>  and handle_include file wrapped_incl k =
>      let incl = Ast_c.unwrap wrapped_incl.Ast_c.i_include in
>      let parsing_style = Includes.get_parsing_style () in
> +    let f = Includes.resolve file parsing_style incl in
>      if Includes.should_parse parsing_style file incl
>      then
> -      match Includes.resolve file parsing_style incl with
> +      match f with
>        | Some header_filename when Common.lfile_exists header_filename ->
> -	  (if !Flag_parsing_c.verbose_includes
> -	  then pr2 ("including "^header_filename));
> -	  let nonlocal =
> -	    match incl with Ast_c.NonLocal _ -> true | _ -> false in
> -          ignore (k nonlocal header_filename)
> +          if not (IC.has_been_parsed header_filename)
> +          then begin
> +            IC.add_to_parsed_files header_filename;
> +	    (if !Flag_parsing_c.verbose_includes
> +	    then pr2 ("including "^header_filename));
> +	    let nonlocal =
> +	      match incl with Ast_c.NonLocal _ -> true | _ -> false in
> +            let res = k nonlocal header_filename in
> +            match res with
> +              None -> ()
> +            | Some x ->
> +                let pt = x.parse_trees in
> +                let (p, _, _) = pt in
> +                with_program2_unit
> +                  (IC.extract_names header_filename)
> +                  p
> +          end;
> +          IC.add_to_dependency_graph file header_filename;
>        | _ -> ()
>
>  and _parse_print_error_heuristic2bis saved_typedefs saved_macros
> --
> 2.21.3
>
>
_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Cocci] [RFC PATCH 2/3] parsing_c: parse_c: Build name cache and includes dependency graph
  2020-09-09 19:41   ` Julia Lawall
@ 2020-09-10  5:17     ` Jaskaran Singh
  0 siblings, 0 replies; 6+ messages in thread
From: Jaskaran Singh @ 2020-09-10  5:17 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

On Wed, 2020-09-09 at 21:41 +0200, Julia Lawall wrote:
> 
> On Wed, 9 Sep 2020, Jaskaran Singh wrote:
> 
> > Build the includes dependency graph and name cache while parsing
> > header
> > files. Every header file is parsed only once for name caching and,
> > while
> > parsing these files, an includes dependency graph is built to
> > determine
> > reachability of one header file from another file.
> 
> So you really parse the whole file here?  

Yes.

> Could you avoid that? 

Well, we need the AST for name caching. I guess we could lazily parse
these on demand and do the name caching in the type annoter. But I'm
not sure how much of a difference that would make, it'd probably end up
parsing about 80% of those branches anyway.

>  Is it
> that you need to parse something to find the end of it?
> 

We need everything from it, i.e. struct fields, enumeration constants,
function prototypes, etc. as well as the #includes in it.

Cheers,
Jaskaran.

> julia
> 
> > Signed-off-by: Jaskaran Singh <jaskaran.singh@collabora.com>
> > ---
> >  parsing_c/parse_c.ml | 27 +++++++++++++++++++++------
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> > 
> > diff --git a/parsing_c/parse_c.ml b/parsing_c/parse_c.ml
> > index 5574cb11b..3b250720f 100644
> > --- a/parsing_c/parse_c.ml
> > +++ b/parsing_c/parse_c.ml
> > @@ -17,6 +17,7 @@ open Common
> > 
> >  module TH = Token_helpers
> >  module LP = Lexer_parser
> > +module IC = Includes_cache
> > 
> >  module Stat = Parsing_stat
> > 
> > @@ -995,15 +996,29 @@ let rec _parse_print_error_heuristic2
> > saved_typedefs saved_macros
> >  and handle_include file wrapped_incl k =
> >      let incl = Ast_c.unwrap wrapped_incl.Ast_c.i_include in
> >      let parsing_style = Includes.get_parsing_style () in
> > +    let f = Includes.resolve file parsing_style incl in
> >      if Includes.should_parse parsing_style file incl
> >      then
> > -      match Includes.resolve file parsing_style incl with
> > +      match f with
> >        | Some header_filename when Common.lfile_exists
> > header_filename ->
> > -	  (if !Flag_parsing_c.verbose_includes
> > -	  then pr2 ("including "^header_filename));
> > -	  let nonlocal =
> > -	    match incl with Ast_c.NonLocal _ -> true | _ -> false in
> > -          ignore (k nonlocal header_filename)
> > +          if not (IC.has_been_parsed header_filename)
> > +          then begin
> > +            IC.add_to_parsed_files header_filename;
> > +	    (if !Flag_parsing_c.verbose_includes
> > +	    then pr2 ("including "^header_filename));
> > +	    let nonlocal =
> > +	      match incl with Ast_c.NonLocal _ -> true | _ -> false in
> > +            let res = k nonlocal header_filename in
> > +            match res with
> > +              None -> ()
> > +            | Some x ->
> > +                let pt = x.parse_trees in
> > +                let (p, _, _) = pt in
> > +                with_program2_unit
> > +                  (IC.extract_names header_filename)
> > +                  p
> > +          end;
> > +          IC.add_to_dependency_graph file header_filename;
> >        | _ -> ()
> > 
> >  and _parse_print_error_heuristic2bis saved_typedefs saved_macros
> > --
> > 2.21.3
> > 
> > 

_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-10  5:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-09 18:12 [Cocci] [RFC PATCH 0/3] parsing_c: Optimize recursive header file parsing Jaskaran Singh
2020-09-09 18:13 ` [Cocci] [RFC PATCH 1/3] parsing_c: includes_cache: Implement a name cache Jaskaran Singh
2020-09-09 18:13 ` [Cocci] [RFC PATCH 2/3] parsing_c: parse_c: Build name cache and includes dependency graph Jaskaran Singh
2020-09-09 19:41   ` Julia Lawall
2020-09-10  5:17     ` Jaskaran Singh
2020-09-09 18:13 ` [Cocci] [RFC PATCH 3/3] parsing_c: type_annoter_c: Use name cache for type annotation Jaskaran Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).