cocci.inria.fr archive mirror
* [cocci] spatch --jobs N missing matches?
@ 2022-09-26 18:52 Kees Cook
  2022-09-26 21:14 ` Julia Lawall
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2022-09-26 18:52 UTC
  To: cocci; +Cc: Julia Lawall, linux-hardening, Gustavo A. R. Silva

Hi,

I have been unable to figure out what is going wrong with spatch doing
what seems like a simple match, but not finding correct results. Here is
the .cocci file:

----
// "level1" matches a struct ending in a flexible array.
@level1@
identifier inner, flex;
type T;
@@
        struct inner {
                ...
                T flex[];
        };

// "level2" matches a composite flexible array struct (struct ending with "level1")
@level2@
identifier level1.inner;
identifier outer, compflex;
@@
        struct outer {
                ...
                struct inner compflex;
        };

// match memcpy() which has a composite flexible array struct as the destination
@memcpy_compflex_dest depends on level2@
identifier level2.outer, level2.compflex;
struct outer *PTR;
expression SRC, SIZE;
@@

  memcpy(
*       &PTR->compflex
  , SRC, SIZE)

----

I am using spatch on Ubuntu 22.04.1 LTS:

$ spatch --version
spatch version 1.1.1 compiled with OCaml version 4.13.1

But I've also tried this with the latest from git with (worse?) results
(see below).

I'm using the same "include" options as generated by current Linux builds:

$ INCLUDES="-I ./arch/x86/include -I ./arch/x86/include/generated -I ./include -I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi --include ./include/linux/compiler-version.h --include ./include/linux/kconfig.h"

And here are the other common arguments:

$ ARGS="--very-quiet $INCLUDES --cocci-file compflex-simple.cocci"

I'm running this against linux-next (20220923).

The first issue I encountered was that the default options didn't match
a known case:

$ time spatch $ARGS net/sched/cls_u32.c | grep ^---

real    0m0.096s
...

Also it didn't work with "--all-includes":

$ time spatch $ARGS --all-includes net/sched/cls_u32.c | grep ^---

real    0m1.150s
...

In reading the documentation carefully, it seems the desired option is
actually "--recursive-includes", which _does_ work:

$ time spatch $ARGS --recursive-includes net/sched/cls_u32.c | grep ^---
--- net/sched/cls_u32.c

real    0m25.332s
...

This takes _much_ longer to run, though. (25 seconds vs 1 ...)

However, if I run this in parallel (using the options shown in the
kernel's build), it does _not_ find the hit in net/sched/cls_u32.c (?!)

$ time spatch $ARGS --recursive-includes --jobs 36 --chunksize 1 --dir . | grep ^---
6594 files match
EXN: Sys_error("./sound/firewire/fireworks/packets-buffer.h: No such file or directory") in ./sound/firewire/fireworks/fireworks_command.c
EXN: Sys_error("./sound/firewire/bebob/lib.h: No such file or directory") in ./sound/firewire/bebob/bebob_command.c
EXN: Sys_error("./sound/firewire/fireworks/packets-buffer.h: No such file or directory") in ./sound/firewire/fireworks/fireworks_transaction.c
--- ./fs/dlm/requestqueue.c
--- ./drivers/w1/w1_netlink.c
--- ./drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
--- ./drivers/platform/surface/surface_acpi_notify.c
--- ./net/nfc/hci/hcp.c
--- ./drivers/s390/net/qeth_l2_main.c

real    8m50.993s
...

With the latest from git, it additionally misses "fs/dlm/requestqueue.c"
(but it takes half the time):

$ time ~/.local/bin/spatch $ARGS --recursive-includes --jobs 36 --chunksize 1 --dir . | grep ^---
6594 files match
--- ./drivers/w1/w1_netlink.c
--- ./drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
--- ./drivers/platform/surface/surface_acpi_notify.c
--- ./net/nfc/hci/hcp.c
--- ./drivers/s390/net/qeth_l2_main.c

real    4m39.263s
...

If I run it _not_ in parallel it is obviously MUCH slower, but ends up
finding _no_ hits at all!

$ time spatch $ARGS --recursive-includes --dir . | tee /tmp/slow.log | grep ^---
6594 files match
EXN: Sys_error("./sound/firewire/bebob/lib.h: No such file or directory") in ./sound/firewire/bebob/bebob_command.c
EXN: Sys_error("./sound/firewire/fireworks/packets-buffer.h: No such file or directory") in ./sound/firewire/fireworks/fireworks_command.c

real    66m49.620s
...

Do you have any idea what is going on?

-- 
Kees Cook


* Re: [cocci] spatch --jobs N missing matches?
  2022-09-26 18:52 [cocci] spatch --jobs N missing matches? Kees Cook
@ 2022-09-26 21:14 ` Julia Lawall
  2022-09-26 22:09   ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Julia Lawall @ 2022-09-26 21:14 UTC
  To: Kees Cook; +Cc: cocci, linux-hardening, Gustavo A. R. Silva



On Mon, 26 Sep 2022, Kees Cook wrote:

> Hi,
>
> I have been unable to figure out what is going wrong with spatch doing
> what seems like a simple match, but not finding correct results. Here is
> the .cocci file:
>
> ----
> // "level1" matches a struct ending in a flexible array.
> @level1@
> identifier inner, flex;
> type T;
> @@
>         struct inner {
>                 ...
>                 T flex[];
>         };
>
> // "level2" matches a composite flexible array struct (struct ending with "level1")
> @level2@
> identifier level1.inner;
> identifier outer, compflex;
> @@
>         struct outer {
>                 ...
>                 struct inner compflex;
>         };
>
> // match memcpy() which has a composite flexible array struct as the destination
> @memcpy_compflex_dest depends on level2@
> identifier level2.outer, level2.compflex;
> struct outer *PTR;
> expression SRC, SIZE;
> @@
>
>   memcpy(
> *       &PTR->compflex
>   , SRC, SIZE)
>
> ----
>
> I am using spatch on Ubuntu 22.04.1 LTS:
>
> $ spatch --version
> spatch version 1.1.1 compiled with OCaml version 4.13.1
>
> But I've also tried this with the latest from git with (worse?) results
> (see below).
>
> I'm using the same "include" options as generated by current Linux builds:
>
> $ INCLUDES="-I ./arch/x86/include -I ./arch/x86/include/generated -I ./include -I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi --include ./include/linux/compiler-version.h --include ./include/linux/kconfig.h"
>
> And here are the other common arguments:
>
> $ ARGS="--very-quiet $INCLUDES --cocci-file compflex-simple.cocci"
>
> I'm running this against linux-next (20220923).
>
> The first issue I encountered was that the default options didn't match
> a known case:
>
> $ time spatch $ARGS net/sched/cls_u32.c | grep ^---
>
> real    0m0.096s
> ...
>
> Also it didn't work with "--all-includes":
>
> $ time spatch $ARGS --all-includes net/sched/cls_u32.c | grep ^---
>
> real    0m1.150s
> ...
>
> In reading the documentation carefully, it seems the desired option is
> actually "--recursive-includes", which _does_ work:

--recursive-includes is a good choice, and it is understandable that it
should be much slower.  It is parsing all of those recursively included
files.

I likewise see the following puzzling behavior:

i44:~/misc: spatch.opt ~/linux-next/fs/dlm/requestqueue.c flex.cocci -I ~/linux-next/include
init_defs_builtins: /home/julia/main_cocci/standard.h
HANDLING: /home/julia/linux-next/fs/dlm/requestqueue.c
diff =
--- /home/julia/linux-next/fs/dlm/requestqueue.c
+++ /tmp/cocci-output-2281212-d01f89-requestqueue.c
@@ -44,7 +44,6 @@ void dlm_add_requestqueue(struct dlm_ls

 	e->recover_seq = ls->ls_recover_seq & 0xFFFFFFFF;
 	e->nodeid = nodeid;
-	memcpy(&e->request, ms, le16_to_cpu(ms->m_header.h_length));

 	atomic_inc(&ls->ls_requestqueue_cnt);
 	mutex_lock(&ls->ls_requestqueue_mutex);
i44:~/misc: spatch.opt ~/linux-next/fs/dlm flex.cocci -I ~/linux-next/include
init_defs_builtins: /home/julia/main_cocci/standard.h
(ONCE) Expected tokens memcpy
Skipping: /home/julia/linux-next/fs/dlm/recoverd.h
HANDLING: /home/julia/linux-next/fs/dlm/ast.c
Skipping: /home/julia/linux-next/fs/dlm/user.h
HANDLING: /home/julia/linux-next/fs/dlm/config.c
HANDLING: /home/julia/linux-next/fs/dlm/rcom.c
Skipping: /home/julia/linux-next/fs/dlm/lockspace.h
Skipping: /home/julia/linux-next/fs/dlm/dir.h
HANDLING: /home/julia/linux-next/fs/dlm/netlink.c
Skipping: /home/julia/linux-next/fs/dlm/dlm_internal.h
Skipping: /home/julia/linux-next/fs/dlm/memory.c
HANDLING: /home/julia/linux-next/fs/dlm/lowcomms.c
Skipping: /home/julia/linux-next/fs/dlm/util.h
Skipping: /home/julia/linux-next/fs/dlm/recoverd.c
Skipping: /home/julia/linux-next/fs/dlm/lowcomms.h
Skipping: /home/julia/linux-next/fs/dlm/lock.h
Skipping: /home/julia/linux-next/fs/dlm/recover.h
HANDLING: /home/julia/linux-next/fs/dlm/plock.c
HANDLING: /home/julia/linux-next/fs/dlm/lockspace.c
Skipping: /home/julia/linux-next/fs/dlm/main.c
Skipping: /home/julia/linux-next/fs/dlm/member.c
HANDLING: /home/julia/linux-next/fs/dlm/dir.c
Skipping: /home/julia/linux-next/fs/dlm/requestqueue.h
Skipping: /home/julia/linux-next/fs/dlm/debug_fs.c
Skipping: /home/julia/linux-next/fs/dlm/midcomms.h
HANDLING: /home/julia/linux-next/fs/dlm/lock.c
HANDLING: /home/julia/linux-next/fs/dlm/midcomms.c
HANDLING: /home/julia/linux-next/fs/dlm/requestqueue.c
Skipping: /home/julia/linux-next/fs/dlm/lvb_table.h
Skipping: /home/julia/linux-next/fs/dlm/ast.h
HANDLING: /home/julia/linux-next/fs/dlm/user.c
HANDLING: /home/julia/linux-next/fs/dlm/recover.c
Skipping: /home/julia/linux-next/fs/dlm/member.h
Skipping: /home/julia/linux-next/fs/dlm/rcom.h
Skipping: /home/julia/linux-next/fs/dlm/config.h
Skipping: /home/julia/linux-next/fs/dlm/util.c
Skipping: /home/julia/linux-next/fs/dlm/memory.h

So it worked when I tried only one .c file, but not when I tried the whole
directory.  When I put some tracing (--debug --show-trying) it seems to be
considering only the .c file in the full directory case, not the header
file.  I will look into it.

In the short term, you could try using the option --selected-only to get a
list of files considered to be relevant, and then run Coccinelle on each
of them individually.
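
A minimal sketch of that per-file workaround, under stated assumptions:
the candidate list comes from a plain recursive grep for "memcpy" (a
stand-in for --selected-only, whose output format is not assumed here),
and SPATCH, SPATCH_ARGS, candidate_files and scan_tree are hypothetical
names for illustration, not part of Coccinelle. SPATCH_ARGS would also
need to carry the -I/--include options from $INCLUDES.

```shell
#!/bin/sh
# Sketch: run spatch once per file instead of once per directory.
# SPATCH and SPATCH_ARGS are hypothetical knobs, not Coccinelle features.
SPATCH=${SPATCH:-spatch}
if [ -z "${SPATCH_ARGS:-}" ]; then
    SPATCH_ARGS="--very-quiet --cocci-file compflex-simple.cocci --recursive-includes"
fi

# Print every .c file under directory $1 that mentions memcpy.
# This is only a crude prefilter; spatch still decides what really matches.
candidate_files() {
    grep -rl --include='*.c' 'memcpy' "$1" | sort
}

# Run spatch separately on each candidate and keep only the "--- file" lines.
scan_tree() {
    candidate_files "$1" | while IFS= read -r file; do
        # SPATCH_ARGS is deliberately unquoted so it splits into words.
        $SPATCH $SPATCH_ARGS "$file" | grep '^---'
    done
}
```

Calling "scan_tree ." from the top of the kernel tree would then print one
"--- path" line per file in which the semantic patch reports a match. It
trades the speed of a single --dir run for a fresh spatch process per file.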

julia






* Re: [cocci] spatch --jobs N missing matches?
  2022-09-26 21:14 ` Julia Lawall
@ 2022-09-26 22:09   ` Kees Cook
  2022-09-27 20:37     ` Julia Lawall
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2022-09-26 22:09 UTC
  To: Julia Lawall; +Cc: cocci, linux-hardening, Gustavo A. R. Silva

On Mon, Sep 26, 2022 at 11:14:55PM +0200, Julia Lawall wrote:
> I likewise see the following puzzling behavior:

Okay, that's a relief -- I'm usually holding it wrong. :P

> [...]
> So it worked when I tried only one .c file, but not when I tried the whole
> directory.  When I put some tracing (--debug --show-trying) it seems to be
> considering only the .c file in the full directory case, not the header
> file.  I will look into it.

Thank you! If you end up with patches, I'm happy to try them out. I've
got a fresh build of the latest cocci git tree, so I can easily test
changes.

> In the short term, you could try using the option --selected-only to get a
> list of files considered to be relevant, and then run Coccinelle on each
> of them individually.

Ah-ha! I hadn't noticed --selected-only, that's nicer than my
sorta-by-hand version of just using "git grep memcpy | filter...", but
yes, that's what I ran a while ago. I see it just finished after two
hours. :) But it found (what I hope is) the full set!

--- ./drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
--- ./drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
--- ./drivers/platform/surface/surface_acpi_notify.c
--- ./drivers/s390/net/qeth_l2_main.c
--- ./drivers/w1/w1_netlink.c
--- ./fs/dlm/requestqueue.c
--- ./net/sched/cls_u32.c
--- ./net/wireless/nl80211.c

real    111m15.762s
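
To see exactly which files one run missed relative to another, the "---"
lines of two saved logs can be compared. A minimal sketch, assuming each
run's output was captured to a file (as with the "tee /tmp/slow.log" run
earlier in the thread); hits and missing_hits are hypothetical helper
names, not spatch features:

```shell
#!/bin/sh
# Normalise the "--- ./path" lines of log $1 to bare, sorted, unique paths.
hits() {
    grep '^---' "$1" | sed -e 's/^--- //' -e 's|^\./||' | sort -u
}

# Print paths reported in the reference log ($2) but absent from log $1.
missing_hits() {
    a=$(mktemp) && b=$(mktemp) || return 1
    hits "$1" > "$a"
    hits "$2" > "$b"
    # comm -13 keeps only the lines unique to the second (reference) file.
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}
```

For example, "missing_hits parallel.log perfile.log" lists the files that
the per-file run reported and the --jobs run did not.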

Thanks for taking a look at this!

-- 
Kees Cook


* Re: [cocci] spatch --jobs N missing matches?
  2022-09-26 22:09   ` Kees Cook
@ 2022-09-27 20:37     ` Julia Lawall
  2022-09-27 21:09       ` Julia Lawall
  0 siblings, 1 reply; 7+ messages in thread
From: Julia Lawall @ 2022-09-27 20:37 UTC
  To: Kees Cook; +Cc: cocci, linux-hardening, Gustavo A. R. Silva



On Mon, 26 Sep 2022, Kees Cook wrote:

> On Mon, Sep 26, 2022 at 11:14:55PM +0200, Julia Lawall wrote:
> > I likewise see the following puzzling behavior:
>
> Okay, that's a relief -- I'm usually holding it wrong. :P
>
> > [...]
> > So it worked when I tried only one .c file, but not when I tried the whole
> > directory.  When I put some tracing (--debug --show-trying) it seems to be
> > considering only the .c file in the full directory case, not the header
> > file.  I will look into it.
>
> Thank you! If you end up with patches, I'm happy to try them out. I've
> got a fresh build of the latest cocci git tree building, so I can easily
> test changes.

The problem is fixed in the GitHub repository.  Coccinelle was caching
header files, which is not desirable when one actually wants to match the
code in them, and not just get type information.

The change may affect any semantic patch that uses --recursive-includes.

julia



* Re: [cocci] spatch --jobs N missing matches?
  2022-09-27 20:37     ` Julia Lawall
@ 2022-09-27 21:09       ` Julia Lawall
  2022-09-28  0:06         ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Julia Lawall @ 2022-09-27 21:09 UTC
  To: Kees Cook; +Cc: cocci, linux-hardening, Gustavo A. R. Silva

The running time is cut in half for the command line:

spatch ~/linux-next/fs/dlm flex2.cocci -I ~/linux-next/include

Where flex2.cocci is:

------------

#spatch --recursive-includes --include-headers

// match memcpy() which has a composite flexible array struct as the destination
@memcpy_compflex_dest@
identifier outer, compflex;
struct outer *PTR;
expression SRC, SIZE;
@@

  memcpy(
       &PTR->compflex
  , SRC, SIZE)

// "level2" matches a composite flexible array struct (struct ending with "level1")
@level2@
identifier inner;
identifier memcpy_compflex_dest.outer, memcpy_compflex_dest.compflex;
@@
        struct outer {
                ...
                struct inner compflex;
        };

// "level1" matches a struct ending in a flexible array.
@level1@
identifier level2.inner, flex;
type T;
@@
        struct inner {
                ...
                T flex[];
        };

// match memcpy() which has a composite flexible array struct as the destination
@depends on level1@
identifier memcpy_compflex_dest.outer, memcpy_compflex_dest.compflex;
struct outer *memcpy_compflex_dest.PTR;
expression SRC, SIZE;
@@

  memcpy(
*       &PTR->compflex
  , SRC, SIZE)

------------

Actually, there are not that many memcpys in the considered code, and of
those, not many refer to the last element of a structure.  If level2
produces nothing, then level1 should not be applied.

In the original rule order, all of the pairs of a flexible structure and
any structure are considered, regardless of whether any memcpys are
present.

julia


* Re: [cocci] spatch --jobs N missing matches?
  2022-09-27 21:09       ` Julia Lawall
@ 2022-09-28  0:06         ` Kees Cook
  2022-09-28  5:23           ` Julia Lawall
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2022-09-28  0:06 UTC
  To: Julia Lawall; +Cc: cocci, linux-hardening, Gustavo A. R. Silva


On Tue, Sep 27, 2022 at 11:09:35PM +0200, Julia Lawall wrote:
> The problem is fixed in github.  Coccinelle was doing some caching of
> header files, that was not desirable in the case where one actually wants
> to match the code, and not just get type information.

Thank you for the fix! I can confirm things appear to be working
correctly now. (And took 124 minutes to run.)

> [...]
> Actually, there are not that many memcpys in the considered code.  Then
> there are not that many that refer to the last element of a structure.  If
> level2 produces nothing, then level 1 should not be applied.
> 
> In the original rule order, all of the pairs of a flexible structure and
> any structure are considered, regardless of whether any memcpys are
> present.

Ah! Yes, I keep forgetting to start with the narrowest part first. :P

I also forget that I can do a "depends" on something that has no other
matches, but if it's built on prior rules that I use in later rules,
then it limits that rule directly. I haven't quite managed to think
sideways hard enough. :)

-- 
Kees Cook


* Re: [cocci] spatch --jobs N missing matches?
  2022-09-28  0:06         ` Kees Cook
@ 2022-09-28  5:23           ` Julia Lawall
  0 siblings, 0 replies; 7+ messages in thread
From: Julia Lawall @ 2022-09-28  5:23 UTC
  To: Kees Cook; +Cc: cocci, linux-hardening, Gustavo A. R. Silva



On Tue, 27 Sep 2022, Kees Cook wrote:

>
> On Tue, Sep 27, 2022 at 11:09:35PM +0200, Julia Lawall wrote:
> > The problem is fixed in github.  Coccinelle was doing some caching of
> > header files, that was not desirable in the case where one actually wants
> > to match the code, and not just get type information.
>
> Thank you for the fix! I can confirm things appear to be working
> correctly now. (And took 124 minutes to run.)

OK, long, but at least you get the result.

> > [...]
> > Actually, there are not that many memcpys in the considered code.  Then
> > there are not that many that refer to the last element of a structure.  If
> > level2 produces nothing, then level 1 should not be applied.
> >
> > In the original rule order, all of the pairs of a flexible structure and
> > any structure are considered, regardless of whether any memcpys are
> > present.
>
> Ah! Yes, I keep forgetting to start with the narrowest part first. :P
>
> I also forget that I can do a "depends" on something that has no other
> matches, but if it's built on prior rules that I use in later rules,
> then it limits that rule directly. I haven't quite managed to think
> sideways hard enough. :)

Actually, that is the only purpose of "depends on".  Your original rule had
a "depends on level2" that was unnecessary, since the rule could not match
unless the metavariables it inherits from level2 were bound.

julia

