All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cocci] Filter out field names
@ 2016-01-21  9:47 Daniel Wagner
  2016-01-21 11:02 ` Julia Lawall
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Wagner @ 2016-01-21  9:47 UTC (permalink / raw)
  To: cocci

Hi,

I am trying to transform Linux kernel code in order to test for data
races. Let me illustrate this by an example


struct lpctx {
	spinlock_t lock;
	int cnt;
};
static struct lpctx *lpctx;

static void lpctx_good1(struct lpctx *lx)
{
	spin_lock(&lx->lock);

	lx->cnt++;

	spin_unlock(&lx->lock);
}

static void lpctx_warning1(struct lpctx *lx)
{
	lx->cnt++;
}


As you can see lptctx_warning1() is increasing the counter without
holding the lock. The main idea(*) is to annotate all accesses to lx by
adding a WARN_ON(!lock_is_held(&lx->lock.dep_map)).

Thanks to Julia's excellent help on IRC I have currently this somewhat
working cocci script:


@depends on patch@
struct lpctx *x;
identifier f;
statement S;
@@

( S
+	WARN_ON(!lock_is_held(&x->lock.dep_map));
&
	x->f
)


This results in:

@@ -16,10 +16,13 @@ static void lpctx_good1(struct lpctx *lx
        DBG("");

        spin_lock(&lx->lock);
+       WARN_ON(!lock_is_held(&lx->lock.dep_map));

        lx->cnt++;
+       WARN_ON(!lock_is_held(&lx->lock.dep_map));

        spin_unlock(&lx->lock);
+       WARN_ON(!lock_is_held(&lx->lock.dep_map));
 }

 static void lpctx_warning1(struct lpctx *lx)
@@ -27,6 +30,7 @@ static void lpctx_warning1(struct lpctx
        DBG("");

        lx->cnt++;
+       WARN_ON(!lock_is_held(&lx->lock.dep_map));
 }


There are a bunch of problems, e,g. the hard coded type/names or the
WARN_ON() should be put in front of the deferences but that is something
for later to improve. The main problem I face at this point is to filter
out the spin_lock() and spin_unlock() access. All my attempts didn't let
to the expected result. I think the best thing would be to match on the
type. So if 'f' is of type spinlock_t ignore it. Any ideas how this
could be expressed?

Thanks,
Daniel


(*) Talking with Nicholas over a coffee helps a lot. Thanks a lot!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cocci] Filter out field names
  2016-01-21  9:47 [Cocci] Filter out field names Daniel Wagner
@ 2016-01-21 11:02 ` Julia Lawall
  2016-01-22 14:32   ` Daniel Wagner
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Julia Lawall @ 2016-01-21 11:02 UTC (permalink / raw)
  To: cocci

> There are a bunch of problems, e,g. the hard coded type/names or the
> WARN_ON() should be put in front of the dereferences but that is something
> for later to improve. The main problem I face at this point is to filter
> out the spin_lock() and spin_unlock() access. All my attempts didn't let
> to the expected result. I think the best thing would be to match on the
> type. So if 'f' is of type spinlock_t ignore it. Any ideas how this
> could be expressed?

Here is my suggestion (see below for explanations):

@r@
identifier i, virtual.ty;
@@

struct ty *i;

@s@
identifier i,f,virtual.ty;
@@

f (...,struct ty *i,...) { ... }

@locks@
typedef spinlock_t;
spinlock_t *e;
statement S;
position p;
@@

e at S@p

@safe@
position p1;
identifier virtual.ty;
struct ty *x;
identifier f;
@@

 \(spin_lock\|spin_lock_irqsave\)(&x->lock,...)
 ... when != &x->lock
 x ->@p1 f

@depends on r || s@
identifier virtual.ty;
struct ty *x;
identifier f;
statement S;
position p != locks.p;
position p1 != safe.p1;
@@

(
++       WARN_ON(!lock_is_held(&x->lock.dep_map));
S at p
&
        x ->@p1 f
)

--------------------------------------------------------------------------

Same thing with explanations:

@r@
identifier i, virtual.ty;
@@

struct ty *i;

@s@
identifier i,f,virtual.ty;
@@

f (...,struct ty *i,...) { ... }

The above two rules look for files that contain declarations of your type
of interest.  If you are sure that the only files you care about modifying
contain this code, then including these rules will reduce running time very
substantially, because Coccinelle will only work on files that contain
these declarations.  Otherwise, it will work on all files, which will be
very slow, because your pattern is very generic.  Perhaps something to do
in a second step.


@locks@
typedef spinlock_t;
spinlock_t *e;
statement S;
position p;
@@

e at S@p

This finds all expressions that have type spinlock_t *, and stores their
position in p.  We will use this later to avoid transforming spin_lock
calls.  It would be nice if this could have been integrated into the later
rule, but that triggers some parsing problem.

@safe@
position p1;
identifier virtual.ty;
struct ty *x;
identifier f;
@@

 \(spin_lock\|spin_lock_irqsave\)(&x->lock,...)
 ... when != &x->lock
 x ->@p1 f

This is an optimization.  There is no point to add the WARN_ON if there was
a call to spin_lock on your lock just before.  You may want to add more
locking functions here.


@depends on r || s@
identifier virtual.ty;
struct ty *x;
identifier f;
statement S;
position p != locks.p;
position p1 != safe.p1;
@@

(
++       WARN_ON(!lock_is_held(&x->lock.dep_map));
S at p
&
        x ->@p1 f
)

This adds the WARN_ON before the code.  Using positions that should be
different than previously collected positions, this does not add WARN_ON on
statements that contain a lock somewhere, and does not add WARN_ON on
references where there was a lock call recently.

This should be run with at least the argument -D ty=typeofinterest.

Because this is looking for expressions of type spinlock_t *, it may be
useful to cause Coccinelle to take into account header files.  The
following options may be useful:

--recursive-includes --include-headers-for-types

Recursive includes causes not only files explicitly included, but also file
included by other includes to be taken into account.  If you think that
only explicitly included files would be good enough, then --all-includes
would be fine.  Include headers for types means that Coccinelle will look
at the header files for parsing and type information, but will not take
them into account during the transformation process.  This makes things
more efficient, if you don't think that transformations are needed in the
header files anyway.  The transformation being considered here would be
transformation of the header file in the context of the information
available in the .c file.  If you think that it would be useful to
transform the header file on its own, then you can use the argument
--include-headers.

An issue that remains is that you assume that the lock field has type
lock.  A quick grep in the Linux kernel shows that this is not always the
case.  You can make a virtual identifier for the lock field as well.  A
more complicated approach would be to have Coccinelle figure out the type
from the structure type definition.

A final issue is that my quick grep in the Linux kernel also shows that not
all locks are in structures.  But then it is less obvious how to connect
the locks to the protected references.

And thinking on, it is not always the case that because a structure has a
lock, that every field has to be accessed under a lock.  So in the end, it
may be necessary to do something even more clever.  For example,
if a field is somewhere referenced under a lock, and not referenced under a
lock in some other place, then perhaps that is more likely to be a real
problem.  But even that is not 100% certain, because there can be eg an
initialization phase that is not done concurrently, so no locks are
actually needed at that point, even though locks are needed elsewhere.

julia

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cocci] Filter out field names
  2016-01-21 11:02 ` Julia Lawall
@ 2016-01-22 14:32   ` Daniel Wagner
  2016-01-22 16:09   ` SF Markus Elfring
  2016-02-26 10:40   ` Daniel Wagner
  2 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2016-01-22 14:32 UTC (permalink / raw)
  To: cocci

Hi Julia,

On 21.01.16 12:02, Julia Lawall wrote:
>> There are a bunch of problems, e,g. the hard coded type/names or the
>> WARN_ON() should be put in front of the dereferences but that is something
>> for later to improve. The main problem I face at this point is to filter
>> out the spin_lock() and spin_unlock() access. All my attempts didn't let
>> to the expected result. I think the best thing would be to match on the
>> type. So if 'f' is of type spinlock_t ignore it. Any ideas how this
>> could be expressed?
>
> Here is my suggestion (see below for explanations):

Thanks a lot for your help! It's a really appreciated. Unfortunately, I 
haven't found time to play with it because of some fights with 
paperwork. Hopefully I have time on Monday to continue with this work.

Have a nice weekend,
Daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cocci] Filter out field names
  2016-01-21 11:02 ` Julia Lawall
  2016-01-22 14:32   ` Daniel Wagner
@ 2016-01-22 16:09   ` SF Markus Elfring
  2016-02-26 10:40   ` Daniel Wagner
  2 siblings, 0 replies; 6+ messages in thread
From: SF Markus Elfring @ 2016-01-22 16:09 UTC (permalink / raw)
  To: cocci

>@safe@
>position p1;
>identifier virtual.ty;
>struct ty *x;
>identifier f;
> @@
>
> \(spin_lock\|spin_lock_irqsave\)(&x->lock,...)

Can this SmPL specifcation be extended a bit easier
if the suggested disjunction would be written like the following instead?

(spin_lock
|spin_lock_irqsave
)(&x->lock,...)


How do you think about the reuse of a SmPL constraint
for passing a longer function name list eventually?


> Same thing with explanations:
>
> @r@
> identifier i, virtual.ty;
> @@
>
> struct ty *i;
>
> @s@
> identifier i,f,virtual.ty;
> @@
>
> f (...,struct ty *i,...) { ... }

> If you are sure that the only files you care about modifying
> contain this code, then including these rules will reduce running time
> very substantially,

I am also interested in similar fine-tuning for scripts of the semantic
patch language.


> because Coccinelle will only work on files that contain these declarations.

Should filtering usually be performed for each source file in the
desired analysis?


> Otherwise, it will work on all files, which will bevery slow,
> because your pattern is very generic.

Do you consider any other file preprocessing here which will influence
the mentioned execution speed?


> It would be nice if this could have been integrated into the
> later rule, but that triggers some parsing problem.

Would you like to explain this information a bit more?


> And thinking on, it is not always the case that because a structure
> has a lock, that every field has to be accessed under a lock.
> So in the end, it may be necessary to do something even more clever.
> For example, if a field is somewhere referenced under a lock, and not
> referenced under a lock in some other place, then perhaps that is more
> likely to be a real problem. But even that is not 100% certain,
> because there can be eg an initialization phase that is not done concurrently,
> so no locks are actually needed at that point, even though locks are
> needed elsewhere.

Can it be that all the special cases and further challenges you mention here
belong to the technology around data flow analysis?

Regards,
Markus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cocci] Filter out field names
  2016-01-21 11:02 ` Julia Lawall
  2016-01-22 14:32   ` Daniel Wagner
  2016-01-22 16:09   ` SF Markus Elfring
@ 2016-02-26 10:40   ` Daniel Wagner
  2016-02-26 12:40     ` SF Markus Elfring
  2 siblings, 1 reply; 6+ messages in thread
From: Daniel Wagner @ 2016-02-26 10:40 UTC (permalink / raw)
  To: cocci

Hi Julia,

Sorry for the long delay. Got distracted again.

On 01/21/2016 12:02 PM, Julia Lawall wrote:
> An issue that remains is that you assume that the lock field has type
> lock.  A quick grep in the Linux kernel shows that this is not always the
> case.  You can make a virtual identifier for the lock field as well.  A
> more complicated approach would be to have Coccinelle figure out the type
> from the structure type definition.
> 
> A final issue is that my quick grep in the Linux kernel also shows that not
> all locks are in structures.  But then it is less obvious how to connect
> the locks to the protected references.
> 
> And thinking on, it is not always the case that because a structure has a
> lock, that every field has to be accessed under a lock.  So in the end, it
> may be necessary to do something even more clever.  For example,
> if a field is somewhere referenced under a lock, and not referenced under a
> lock in some other place, then perhaps that is more likely to be a real
> problem.  But even that is not 100% certain, because there can be eg an
> initialization phase that is not done concurrently, so no locks are
> actually needed at that point, even though locks are needed elsewhere.

Your analysis is absolutely correct. Automatically annotating is not
really feasibly. At least without a lot of additional hand work.
Especially the more interesting data structure do no have a simple
locking pattern. Instead all access is heavily optimized which makes
writing  generic cocci rules really difficult (impossible?).

I think my idea isn't that simple as I hoped. Well, at least I have
learned something.

thanks a lot for your help,
daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cocci] Filter out field names
  2016-02-26 10:40   ` Daniel Wagner
@ 2016-02-26 12:40     ` SF Markus Elfring
  0 siblings, 0 replies; 6+ messages in thread
From: SF Markus Elfring @ 2016-02-26 12:40 UTC (permalink / raw)
  To: cocci

> I think my idea isn't that simple as I hoped. Well, at least I have
> learned something.

How do you think about to extend your software development experiences
for technology around data flow analysis?

Regards,
Markus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-02-26 12:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-21  9:47 [Cocci] Filter out field names Daniel Wagner
2016-01-21 11:02 ` Julia Lawall
2016-01-22 14:32   ` Daniel Wagner
2016-01-22 16:09   ` SF Markus Elfring
2016-02-26 10:40   ` Daniel Wagner
2016-02-26 12:40     ` SF Markus Elfring

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.