All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-04 22:12 ` Denis Efremov
  0 siblings, 0 replies; 14+ messages in thread
From: Denis Efremov @ 2019-09-04 22:12 UTC (permalink / raw)
  To: Coccinelle
  Cc: Denis Efremov, Julia Lawall, Gilles Muller, Nicolas Palix,
	Michal Marek, linux-kernel

This is an RFC. I will resend the patch after feedback. Currently
I'm preparing big patchset with bsearch warnings fixed. The rule will
be a part of this patchset if it will be considered good enough for
checking.

There is a known integer overflow error [1] in the binary search
algorithm. Google faced it in 2006 [2]. This rule checks midpoint
calculation in binary search for overflow, i.e., (l + h) / 2.
Not every match is an actual error since the array could be small
enough. However, a custom implementation of binary search is
error-prone and it's better to use the library function (lib/bsearch.c)
or to apply defensive programming for midpoint calculation.

[1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
[2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html

Signed-off-by: Denis Efremov <efremov@linux.com>
---
 scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 scripts/coccinelle/misc/bsearch.cocci

diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
new file mode 100644
index 000000000000..a99d9a8d3ee5
--- /dev/null
+++ b/scripts/coccinelle/misc/bsearch.cocci
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/// Check midpoint calculation in binary search algorithm for integer overflow
+/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
+/// since the array can be small enough. However, a custom implementation of
+/// binary search is error-prone and it's better to use the library function
+/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
+///
+/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
+/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
+//
+// Confidence: Medium
+// Copyright: (C) 2019 Denis Efremov, ISPRAS
+// Comments:
+// Options: --no-includes --include-headers
+
+virtual report
+virtual org
+
+@r depends on org || report@
+identifier l, h, m;
+statement S;
+position p;
+// to match 1 in <<
+// to match 2 in /
+// Can't use exact values, e.g. 2, because it fails to match 2L.
+// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
+constant c;
+// TODO: Is there an isomorphism for (a / 2) === (a >> 1)?
+@@
+(
+ while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) {
+  ...
+(
+  ((l + h)@p / c)
+|
+  ((l + h)@p >> c)
+)
+  ...
+ }
+//TODO: Is it possible to match "do {} while ();" loops?
+// |
+//  do {
+//   ...
+// (
+//   ((l + h)@p / c)
+// |
+//   ((l + h)@p >> c)
+// )
+//   ...
+//  } while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\));
+|
+ for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\);
+      m = \(((l + h)@p / c)\|((l + h)@p >> c)\)) S
+|
+ for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\); ...) {
+  ...
+(
+  ((l + h)@p / c)
+|
+  ((l + h)@p >> c)
+)
+  ...
+ }
+)
+
+@script:python depends on org@
+p << r.p;
+@@
+
+coccilib.org.print_todo(p[0], "WARNING opportunity for bsearch() or 'm = l + (h - l) / 2;'")
+
+@script:python depends on report@
+p << r.p;
+@@
+
+msg="WARNING: custom implementation of bsearch is error-prone. "
+msg+="Consider using lib/bsearch.c or fix the midpoint calculation "
+msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow."
+coccilib.report.print_report(p[0], msg)
+
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Cocci] [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-04 22:12 ` Denis Efremov
  0 siblings, 0 replies; 14+ messages in thread
From: Denis Efremov @ 2019-09-04 22:12 UTC (permalink / raw)
  To: Coccinelle; +Cc: Michal Marek, Nicolas Palix, linux-kernel

This is an RFC. I will resend the patch after feedback. Currently
I'm preparing big patchset with bsearch warnings fixed. The rule will
be a part of this patchset if it will be considered good enough for
checking.

There is a known integer overflow error [1] in the binary search
algorithm. Google faced it in 2006 [2]. This rule checks midpoint
calculation in binary search for overflow, i.e., (l + h) / 2.
Not every match is an actual error since the array could be small
enough. However, a custom implementation of binary search is
error-prone and it's better to use the library function (lib/bsearch.c)
or to apply defensive programming for midpoint calculation.

[1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
[2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html

Signed-off-by: Denis Efremov <efremov@linux.com>
---
 scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 scripts/coccinelle/misc/bsearch.cocci

diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
new file mode 100644
index 000000000000..a99d9a8d3ee5
--- /dev/null
+++ b/scripts/coccinelle/misc/bsearch.cocci
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/// Check midpoint calculation in binary search algorithm for integer overflow
+/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
+/// since the array can be small enough. However, a custom implementation of
+/// binary search is error-prone and it's better to use the library function
+/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
+///
+/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
+/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
+//
+// Confidence: Medium
+// Copyright: (C) 2019 Denis Efremov, ISPRAS
+// Comments:
+// Options: --no-includes --include-headers
+
+virtual report
+virtual org
+
+@r depends on org || report@
+identifier l, h, m;
+statement S;
+position p;
+// to match 1 in <<
+// to match 2 in /
+// Can't use exact values, e.g. 2, because it fails to match 2L.
+// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
+constant c;
+// TODO: Is there an isomorphism for (a / 2) === (a >> 1)?
+@@
+(
+ while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) {
+  ...
+(
+  ((l + h)@p / c)
+|
+  ((l + h)@p >> c)
+)
+  ...
+ }
+//TODO: Is it possible to match "do {} while ();" loops?
+// |
+//  do {
+//   ...
+// (
+//   ((l + h)@p / c)
+// |
+//   ((l + h)@p >> c)
+// )
+//   ...
+//  } while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\));
+|
+ for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\);
+      m = \(((l + h)@p / c)\|((l + h)@p >> c)\)) S
+|
+ for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\); ...) {
+  ...
+(
+  ((l + h)@p / c)
+|
+  ((l + h)@p >> c)
+)
+  ...
+ }
+)
+
+@script:python depends on org@
+p << r.p;
+@@
+
+coccilib.org.print_todo(p[0], "WARNING opportunity for bsearch() or 'm = l + (h - l) / 2;'")
+
+@script:python depends on report@
+p << r.p;
+@@
+
+msg="WARNING: custom implementation of bsearch is error-prone. "
+msg+="Consider using lib/bsearch.c or fix the midpoint calculation "
+msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow."
+coccilib.report.print_report(p[0], msg)
+
-- 
2.21.0

_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] coccinelle: check for integer overflow in binary search
  2019-09-04 22:12 ` [Cocci] " Denis Efremov
@ 2019-09-05  6:20   ` Julia Lawall
  -1 siblings, 0 replies; 14+ messages in thread
From: Julia Lawall @ 2019-09-05  6:20 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Coccinelle, Gilles Muller, Nicolas Palix, Michal Marek,
	linux-kernel, tacingiht



On Thu, 5 Sep 2019, Denis Efremov wrote:

> This is an RFC. I will resend the patch after feedback. Currently
> I'm preparing big patchset with bsearch warnings fixed. The rule will
> be a part of this patchset if it will be considered good enough for
> checking.
>
> There is a known integer overflow error [1] in the binary search
> algorithm. Google faced it in 2006 [2]. This rule checks midpoint
> calculation in binary search for overflow, i.e., (l + h) / 2.
> Not every match is an actual error since the array could be small
> enough. However, a custom implementation of binary search is
> error-prone and it's better to use the library function (lib/bsearch.c)
> or to apply defensive programming for midpoint calculation.
>
> [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
>
> Signed-off-by: Denis Efremov <efremov@linux.com>
> ---
>  scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
>  create mode 100644 scripts/coccinelle/misc/bsearch.cocci
>
> diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
> new file mode 100644
> index 000000000000..a99d9a8d3ee5
> --- /dev/null
> +++ b/scripts/coccinelle/misc/bsearch.cocci
> @@ -0,0 +1,80 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/// Check midpoint calculation in binary search algorithm for integer overflow
> +/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
> +/// since the array can be small enough. However, a custom implementation of
> +/// binary search is error-prone and it's better to use the library function
> +/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
> +///
> +/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> +/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
> +//
> +// Confidence: Medium
> +// Copyright: (C) 2019 Denis Efremov, ISPRAS
> +// Comments:
> +// Options: --no-includes --include-headers
> +
> +virtual report
> +virtual org
> +
> +@r depends on org || report@
> +identifier l, h, m;
> +statement S;
> +position p;
> +// to match 1 in <<
> +// to match 2 in /
> +// Can't use exact values, e.g. 2, because it fails to match 2L.
> +// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
> +constant c;

As far as I can see, you aren't checking for 2 at all at the moment?  You
should be able to say constant c = {2, 2L, etc};.  Actually, we do
consider several variants of 0, so it could be reasonable to allow eg 2 to
match other variants as well.

> +// TODO: Is there an isomorphism for (a / 2) === (a >> 1)?

No.  I'm not sure it's common enough to be worth it.  But it could be
added if it seems like a good idea.

> +@@
> +(
> + while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) {
> +  ...
> +(
> +  ((l + h)@p / c)
> +|
> +  ((l + h)@p >> c)
> +)
> +  ...
> + }
> +//TODO: Is it possible to match "do {} while ();" loops?

Not at the moment.  There is some work on it, but it is not complete
(adding Evan in CC).

julia

> +// |
> +//  do {
> +//   ...
> +// (
> +//   ((l + h)@p / c)
> +// |
> +//   ((l + h)@p >> c)
> +// )
> +//   ...
> +//  } while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\));
> +|
> + for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\);
> +      m = \(((l + h)@p / c)\|((l + h)@p >> c)\)) S
> +|
> + for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\); ...) {
> +  ...
> +(
> +  ((l + h)@p / c)
> +|
> +  ((l + h)@p >> c)
> +)
> +  ...
> + }
> +)
> +
> +@script:python depends on org@
> +p << r.p;
> +@@
> +
> +coccilib.org.print_todo(p[0], "WARNING opportunity for bsearch() or 'm = l + (h - l) / 2;'")
> +
> +@script:python depends on report@
> +p << r.p;
> +@@
> +
> +msg="WARNING: custom implementation of bsearch is error-prone. "
> +msg+="Consider using lib/bsearch.c or fix the midpoint calculation "
> +msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow."
> +coccilib.report.print_report(p[0], msg)
> +
> --
> 2.21.0
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Cocci] [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-05  6:20   ` Julia Lawall
  0 siblings, 0 replies; 14+ messages in thread
From: Julia Lawall @ 2019-09-05  6:20 UTC (permalink / raw)
  To: Denis Efremov; +Cc: Michal Marek, Nicolas Palix, linux-kernel, Coccinelle



On Thu, 5 Sep 2019, Denis Efremov wrote:

> This is an RFC. I will resend the patch after feedback. Currently
> I'm preparing big patchset with bsearch warnings fixed. The rule will
> be a part of this patchset if it will be considered good enough for
> checking.
>
> There is a known integer overflow error [1] in the binary search
> algorithm. Google faced it in 2006 [2]. This rule checks midpoint
> calculation in binary search for overflow, i.e., (l + h) / 2.
> Not every match is an actual error since the array could be small
> enough. However, a custom implementation of binary search is
> error-prone and it's better to use the library function (lib/bsearch.c)
> or to apply defensive programming for midpoint calculation.
>
> [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
>
> Signed-off-by: Denis Efremov <efremov@linux.com>
> ---
>  scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
>  create mode 100644 scripts/coccinelle/misc/bsearch.cocci
>
> diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
> new file mode 100644
> index 000000000000..a99d9a8d3ee5
> --- /dev/null
> +++ b/scripts/coccinelle/misc/bsearch.cocci
> @@ -0,0 +1,80 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/// Check midpoint calculation in binary search algorithm for integer overflow
> +/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
> +/// since the array can be small enough. However, a custom implementation of
> +/// binary search is error-prone and it's better to use the library function
> +/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
> +///
> +/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> +/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
> +//
> +// Confidence: Medium
> +// Copyright: (C) 2019 Denis Efremov, ISPRAS
> +// Comments:
> +// Options: --no-includes --include-headers
> +
> +virtual report
> +virtual org
> +
> +@r depends on org || report@
> +identifier l, h, m;
> +statement S;
> +position p;
> +// to match 1 in <<
> +// to match 2 in /
> +// Can't use exact values, e.g. 2, because it fails to match 2L.
> +// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
> +constant c;

As far as I can see, you aren't checking for 2 at all at the moment?  You
should be able to say constant c = {2, 2L, etc};.  Actually, we do
consider several variants of 0, so it could be reasonable to allow eg 2 to
match other variants as well.

> +// TODO: Is there an isomorphism for (a / 2) === (a >> 1)?

No.  I'm not sure it's common enough to be worth it.  But it could be
added if it seems like a good idea.

> +@@
> +(
> + while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) {
> +  ...
> +(
> +  ((l + h)@p / c)
> +|
> +  ((l + h)@p >> c)
> +)
> +  ...
> + }
> +//TODO: Is it possible to match "do {} while ();" loops?

Not at the moment.  There is some work on it, but it is not complete
(adding Evan in CC).

julia

> +// |
> +//  do {
> +//   ...
> +// (
> +//   ((l + h)@p / c)
> +// |
> +//   ((l + h)@p >> c)
> +// )
> +//   ...
> +//  } while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\));
> +|
> + for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\);
> +      m = \(((l + h)@p / c)\|((l + h)@p >> c)\)) S
> +|
> + for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\); ...) {
> +  ...
> +(
> +  ((l + h)@p / c)
> +|
> +  ((l + h)@p >> c)
> +)
> +  ...
> + }
> +)
> +
> +@script:python depends on org@
> +p << r.p;
> +@@
> +
> +coccilib.org.print_todo(p[0], "WARNING opportunity for bsearch() or 'm = l + (h - l) / 2;'")
> +
> +@script:python depends on report@
> +p << r.p;
> +@@
> +
> +msg="WARNING: custom implementation of bsearch is error-prone. "
> +msg+="Consider using lib/bsearch.c or fix the midpoint calculation "
> +msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow."
> +coccilib.report.print_report(p[0], msg)
> +
> --
> 2.21.0
>
>
_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] coccinelle: check for integer overflow in binary search
  2019-09-05  6:20   ` [Cocci] " Julia Lawall
@ 2019-09-05  7:36     ` Denis Efremov
  -1 siblings, 0 replies; 14+ messages in thread
From: Denis Efremov @ 2019-09-05  7:36 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Coccinelle, Gilles Muller, Nicolas Palix, Michal Marek,
	linux-kernel, tacingiht



On 05.09.2019 09:20, Julia Lawall wrote:
> 
> 
> On Thu, 5 Sep 2019, Denis Efremov wrote:
> 
>> This is an RFC. I will resend the patch after feedback. Currently
>> I'm preparing big patchset with bsearch warnings fixed. The rule will
>> be a part of this patchset if it will be considered good enough for
>> checking.
>>
>> There is a known integer overflow error [1] in the binary search
>> algorithm. Google faced it in 2006 [2]. This rule checks midpoint
>> calculation in binary search for overflow, i.e., (l + h) / 2.
>> Not every match is an actual error since the array could be small
>> enough. However, a custom implementation of binary search is
>> error-prone and it's better to use the library function (lib/bsearch.c)
>> or to apply defensive programming for midpoint calculation.
>>
>> [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
>> [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
>>
>> Signed-off-by: Denis Efremov <efremov@linux.com>
>> ---
>>  scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
>>  1 file changed, 80 insertions(+)
>>  create mode 100644 scripts/coccinelle/misc/bsearch.cocci
>>
>> diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
>> new file mode 100644
>> index 000000000000..a99d9a8d3ee5
>> --- /dev/null
>> +++ b/scripts/coccinelle/misc/bsearch.cocci
>> @@ -0,0 +1,80 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/// Check midpoint calculation in binary search algorithm for integer overflow
>> +/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
>> +/// since the array can be small enough. However, a custom implementation of
>> +/// binary search is error-prone and it's better to use the library function
>> +/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
>> +///
>> +/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
>> +/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
>> +//
>> +// Confidence: Medium
>> +// Copyright: (C) 2019 Denis Efremov, ISPRAS
>> +// Comments:
>> +// Options: --no-includes --include-headers
>> +
>> +virtual report
>> +virtual org
>> +
>> +@r depends on org || report@
>> +identifier l, h, m;
>> +statement S;
>> +position p;
>> +// to match 1 in <<
>> +// to match 2 in /
>> +// Can't use exact values, e.g. 2, because it fails to match 2L.
>> +// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
>> +constant c;
> 
> As far as I can see, you aren't checking for 2 at all at the moment?

Yes, there are no false positives even without pinning constants to 1, 2.
However, it's better to express this in the rule.

> You
> should be able to say constant c = {2, 2L, etc};.  Actually, we do
> consider several variants of 0, so it could be reasonable to allow eg 2 to
> match other variants as well.

It looks like integer literals aren't fully supported. When I'm trying to write
'constant c = {2L}; ' it fails with int_of_string error.

Denis

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Cocci] [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-05  7:36     ` Denis Efremov
  0 siblings, 0 replies; 14+ messages in thread
From: Denis Efremov @ 2019-09-05  7:36 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Michal Marek, Nicolas Palix, linux-kernel, Coccinelle



On 05.09.2019 09:20, Julia Lawall wrote:
> 
> 
> On Thu, 5 Sep 2019, Denis Efremov wrote:
> 
>> This is an RFC. I will resend the patch after feedback. Currently
>> I'm preparing big patchset with bsearch warnings fixed. The rule will
>> be a part of this patchset if it will be considered good enough for
>> checking.
>>
>> There is a known integer overflow error [1] in the binary search
>> algorithm. Google faced it in 2006 [2]. This rule checks midpoint
>> calculation in binary search for overflow, i.e., (l + h) / 2.
>> Not every match is an actual error since the array could be small
>> enough. However, a custom implementation of binary search is
>> error-prone and it's better to use the library function (lib/bsearch.c)
>> or to apply defensive programming for midpoint calculation.
>>
>> [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
>> [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
>>
>> Signed-off-by: Denis Efremov <efremov@linux.com>
>> ---
>>  scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
>>  1 file changed, 80 insertions(+)
>>  create mode 100644 scripts/coccinelle/misc/bsearch.cocci
>>
>> diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
>> new file mode 100644
>> index 000000000000..a99d9a8d3ee5
>> --- /dev/null
>> +++ b/scripts/coccinelle/misc/bsearch.cocci
>> @@ -0,0 +1,80 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/// Check midpoint calculation in binary search algorithm for integer overflow
>> +/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
>> +/// since the array can be small enough. However, a custom implementation of
>> +/// binary search is error-prone and it's better to use the library function
>> +/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
>> +///
>> +/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
>> +/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
>> +//
>> +// Confidence: Medium
>> +// Copyright: (C) 2019 Denis Efremov, ISPRAS
>> +// Comments:
>> +// Options: --no-includes --include-headers
>> +
>> +virtual report
>> +virtual org
>> +
>> +@r depends on org || report@
>> +identifier l, h, m;
>> +statement S;
>> +position p;
>> +// to match 1 in <<
>> +// to match 2 in /
>> +// Can't use exact values, e.g. 2, because it fails to match 2L.
>> +// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
>> +constant c;
> 
> As far as I can see, you aren't checking for 2 at all at the moment?

Yes, there are no false positives even without pinning constants to 1, 2.
However, it's better to express this in the rule.

> You
> should be able to say constant c = {2, 2L, etc};.  Actually, we do
> consider several variants of 0, so it could be reasonable to allow eg 2 to
> match other variants as well.

It looks like integer literals aren't fully supported. When I'm trying to write
'constant c = {2L}; ' it fails with int_of_string error.

Denis
_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] coccinelle: check for integer overflow in binary search
  2019-09-05  7:36     ` [Cocci] " Denis Efremov
@ 2019-09-05  7:51       ` Julia Lawall
  -1 siblings, 0 replies; 14+ messages in thread
From: Julia Lawall @ 2019-09-05  7:51 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Coccinelle, Gilles Muller, Nicolas Palix, Michal Marek,
	linux-kernel, tacingiht



On Thu, 5 Sep 2019, Denis Efremov wrote:

>
>
> On 05.09.2019 09:20, Julia Lawall wrote:
> >
> >
> > On Thu, 5 Sep 2019, Denis Efremov wrote:
> >
> >> This is an RFC. I will resend the patch after feedback. Currently
> >> I'm preparing big patchset with bsearch warnings fixed. The rule will
> >> be a part of this patchset if it will be considered good enough for
> >> checking.
> >>
> >> There is a known integer overflow error [1] in the binary search
> >> algorithm. Google faced it in 2006 [2]. This rule checks midpoint
> >> calculation in binary search for overflow, i.e., (l + h) / 2.
> >> Not every match is an actual error since the array could be small
> >> enough. However, a custom implementation of binary search is
> >> error-prone and it's better to use the library function (lib/bsearch.c)
> >> or to apply defensive programming for midpoint calculation.
> >>
> >> [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> >> [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
> >>
> >> Signed-off-by: Denis Efremov <efremov@linux.com>
> >> ---
> >>  scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
> >>  1 file changed, 80 insertions(+)
> >>  create mode 100644 scripts/coccinelle/misc/bsearch.cocci
> >>
> >> diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
> >> new file mode 100644
> >> index 000000000000..a99d9a8d3ee5
> >> --- /dev/null
> >> +++ b/scripts/coccinelle/misc/bsearch.cocci
> >> @@ -0,0 +1,80 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/// Check midpoint calculation in binary search algorithm for integer overflow
> >> +/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
> >> +/// since the array can be small enough. However, a custom implementation of
> >> +/// binary search is error-prone and it's better to use the library function
> >> +/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
> >> +///
> >> +/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> >> +/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
> >> +//
> >> +// Confidence: Medium
> >> +// Copyright: (C) 2019 Denis Efremov, ISPRAS
> >> +// Comments:
> >> +// Options: --no-includes --include-headers
> >> +
> >> +virtual report
> >> +virtual org
> >> +
> >> +@r depends on org || report@
> >> +identifier l, h, m;
> >> +statement S;
> >> +position p;
> >> +// to match 1 in <<
> >> +// to match 2 in /
> >> +// Can't use exact values, e.g. 2, because it fails to match 2L.
> >> +// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
> >> +constant c;
> >
> > As far as I can see, you aren't checking for 2 at all at the moment?
>
> Yes, there are no false positives even without pinning constants to 1, 2.
> However, it's better to express this in the rule.
>
> > You
> > should be able to say constant c = {2, 2L, etc};.  Actually, we do
> > consider several variants of 0, so it could be reasonable to allow eg 2 to
> > match other variants as well.
>
> It looks like integer literals aren't fully supported. When I'm trying to write
> 'constant c = {2L}; ' it fails with int_of_string error.

Oops.  I'll fix it, but since people may be using older versions of
Coccinelle, perhaps it is not worth taking this strategy in this case.
Could you make a disjunction, or check for the proper value in the python
code?

julia

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Cocci] [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-05  7:51       ` Julia Lawall
  0 siblings, 0 replies; 14+ messages in thread
From: Julia Lawall @ 2019-09-05  7:51 UTC (permalink / raw)
  To: Denis Efremov; +Cc: Michal Marek, Nicolas Palix, linux-kernel, Coccinelle



On Thu, 5 Sep 2019, Denis Efremov wrote:

>
>
> On 05.09.2019 09:20, Julia Lawall wrote:
> >
> >
> > On Thu, 5 Sep 2019, Denis Efremov wrote:
> >
> >> This is an RFC. I will resend the patch after feedback. Currently
> >> I'm preparing big patchset with bsearch warnings fixed. The rule will
> >> be a part of this patchset if it will be considered good enough for
> >> checking.
> >>
> >> There is a known integer overflow error [1] in the binary search
> >> algorithm. Google faced it in 2006 [2]. This rule checks midpoint
> >> calculation in binary search for overflow, i.e., (l + h) / 2.
> >> Not every match is an actual error since the array could be small
> >> enough. However, a custom implementation of binary search is
> >> error-prone and it's better to use the library function (lib/bsearch.c)
> >> or to apply defensive programming for midpoint calculation.
> >>
> >> [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> >> [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
> >>
> >> Signed-off-by: Denis Efremov <efremov@linux.com>
> >> ---
> >>  scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++
> >>  1 file changed, 80 insertions(+)
> >>  create mode 100644 scripts/coccinelle/misc/bsearch.cocci
> >>
> >> diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci
> >> new file mode 100644
> >> index 000000000000..a99d9a8d3ee5
> >> --- /dev/null
> >> +++ b/scripts/coccinelle/misc/bsearch.cocci
> >> @@ -0,0 +1,80 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/// Check midpoint calculation in binary search algorithm for integer overflow
> >> +/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error
> >> +/// since the array can be small enough. However, a custom implementation of
> >> +/// binary search is error-prone and it's better to use the library function
> >> +/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation.
> >> +///
> >> +/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues
> >> +/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
> >> +//
> >> +// Confidence: Medium
> >> +// Copyright: (C) 2019 Denis Efremov, ISPRAS
> >> +// Comments:
> >> +// Options: --no-includes --include-headers
> >> +
> >> +virtual report
> >> +virtual org
> >> +
> >> +@r depends on org || report@
> >> +identifier l, h, m;
> >> +statement S;
> >> +position p;
> >> +// to match 1 in <<
> >> +// to match 2 in /
> >> +// Can't use exact values, e.g. 2, because it fails to match 2L.
> >> +// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc?
> >> +constant c;
> >
> > As far as I can see, you aren't checking for 2 at all at the moment?
>
> Yes, there are no false positives even without pinning constants to 1, 2.
> However, it's better to express this in the rule.
>
> > You
> > should be able to say constant c = {2, 2L, etc};.  Actually, we do
> > consider several variants of 0, so it could be reasonable to allow eg 2 to
> > match other variants as well.
>
> It looks like integer literals aren't fully supported. When I'm trying to write
> 'constant c = {2L}; ' it fails with int_of_string error.

Oops.  I'll fix it, but since people may be using older versions of
Coccinelle, perhaps it is not worth taking this strategy in this case.
Could you make a disjunction, or check for the proper value in the python
code?

julia
_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] coccinelle: check for integer overflow in binary search
  2019-09-04 22:12 ` [Cocci] " Denis Efremov
  (?)
@ 2019-09-05 12:32   ` Markus Elfring
  -1 siblings, 0 replies; 14+ messages in thread
From: Markus Elfring @ 2019-09-05 12:32 UTC (permalink / raw)
  To: Denis Efremov, cocci, kernel-janitors
  Cc: linux-kernel, Gilles Muller, Julia Lawall, Masahiro Yamada,
	Michal Marek, Nicolas Palix

> +identifier l, h, m;

Can expressions make sense for these metavariables?


> +@@
> +(
> + while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) {
> +  ...
> +(
> +  ((l + h)@p / c)
> +|
> +  ((l + h)@p >> c)
> +)
> +  ...
> + }

* I suggest again to look at further possibilities to reduce undesirable
  code duplication also together with the usage of SmPL disjunctions.

* The condition specification might be easier to read with a few
  additional spaces (or the following variant).

* The SmPL ellipses will probably need further considerations.


+@@
+(
+ while (
+(       l \( < \| <= \) h
+|       (h - l) > 1
+|       (h - 1) > l
+|       (l + 1) < h
+)      )
+ {
+ <+...
+ ((l + h)@p \( / \| >> \) c)
+ ...+>
+ }


> +@script:python depends on report@
> +p << r.p;
> +@@
> +
> +msg="WARNING: custom implementation of bsearch is error-prone. "
> +msg+="Consider using lib/bsearch.c or fix the midpoint calculation "
> +msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow."
> +coccilib.report.print_report(p[0], msg)

The Linux coding style supports to put a long string literal also into a single line.
Thus I find such a message construction nicer without the extra variable “msg”.

Regards,
Markus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-05 12:32   ` Markus Elfring
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Elfring @ 2019-09-05 12:32 UTC (permalink / raw)
  To: Denis Efremov, cocci, kernel-janitors
  Cc: Michal Marek, Nicolas Palix, linux-kernel

> +identifier l, h, m;

Can expressions make sense for these metavariables?


> +@@
> +(
> + while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) {
> +  ...
> +(
> +  ((l + h)@p / c)
> +|
> +  ((l + h)@p >> c)
> +)
> +  ...
> + }

* I suggest again to look at further possibilities to reduce undesirable
  code duplication also together with the usage of SmPL disjunctions.

* The condition specification might be easier to read with a few
  additional spaces (or the following variant).

* The SmPL ellipses will probably need further considerations.


+@@
+(
+ while (
+(       l \( < \| <= \) h
+|       (h - l) > 1
+|       (h - 1) > l
+|       (l + 1) < h
+)      )
+ {
+ <+...
+ ((l + h)@p \( / \| >> \) c)
+ ...+>
+ }


> +@script:python depends on report@
> +p << r.p;
> +@@
> +
> +msg="WARNING: custom implementation of bsearch is error-prone. "
> +msg+="Consider using lib/bsearch.c or fix the midpoint calculation "
> +msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow."
> +coccilib.report.print_report(p[0], msg)

The Linux coding style supports to put a long string literal also into a single line.
Thus I find such a message construction nicer without the extra variable “msg”.

Regards,
Markus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Cocci] [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-05 12:32   ` Markus Elfring
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Elfring @ 2019-09-05 12:32 UTC (permalink / raw)
  To: Denis Efremov, cocci, kernel-janitors
  Cc: Michal Marek, Nicolas Palix, linux-kernel

> +identifier l, h, m;

Can expressions make sense for these metavariables?


> +@@
> +(
> + while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) {
> +  ...
> +(
> +  ((l + h)@p / c)
> +|
> +  ((l + h)@p >> c)
> +)
> +  ...
> + }

* I suggest again to look at further possibilities to reduce undesirable
  code duplication also together with the usage of SmPL disjunctions.

* The condition specification might be easier to read with a few
  additional spaces (or the following variant).

* The SmPL ellipses will probably need further considerations.


+@@
+(
+ while (
+(       l \( < \| <= \) h
+|       (h - l) > 1
+|       (h - 1) > l
+|       (l + 1) < h
+)      )
+ {
+ <+...
+ ((l + h)@p \( / \| >> \) c)
+ ...+>
+ }


> +@script:python depends on report@
> +p << r.p;
> +@@
> +
> +msg="WARNING: custom implementation of bsearch is error-prone. "
> +msg+="Consider using lib/bsearch.c or fix the midpoint calculation "
> +msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow."
> +coccilib.report.print_report(p[0], msg)

The Linux coding style supports to put a long string literal also into a single line.
Thus I find such a message construction nicer without the extra variable “msg”.

Regards,
Markus
_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] coccinelle: check for integer overflow in binary search
  2019-09-04 22:12 ` [Cocci] " Denis Efremov
  (?)
@ 2019-09-05 14:23   ` Markus Elfring
  -1 siblings, 0 replies; 14+ messages in thread
From: Markus Elfring @ 2019-09-05 14:23 UTC (permalink / raw)
  To: Denis Efremov, kernel-janitors, cocci
  Cc: linux-kernel, Gilles Muller, Julia Lawall, Masahiro Yamada,
	Michal Marek, Nicolas Palix

> +@@
> +(
> + while (\(…\)) {
> + }

It seems that compound statements are mainly checked for
control flow statements by this source code search approach
so far.
Would you like to handle also single statements (without the
curly brackets)?
(Will additional SmPL disjunctions be needed then?)


> +statement S;
> +|
> + for (...; \(…\);
> +      m = \(…\)) S

* Can the metavariable “S” look nicer on a separate line?

* Should assignments be taken into account for more variables?


> +|
> + for (...; \(…\); ...) {
> + }
> +)

I find the shown case distinction incomplete.
Will loop initialisations trigger further SmPL development challenges?

Regards,
Markus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-05 14:23   ` Markus Elfring
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Elfring @ 2019-09-05 14:23 UTC (permalink / raw)
  To: Denis Efremov, kernel-janitors, cocci
  Cc: Michal Marek, Nicolas Palix, linux-kernel

> +@@
> +(
> + while (\(…\)) {
> + }

It seems that compound statements are mainly checked for
control flow statements by this source code search approach
so far.
Would you like to handle also single statements (without the
curly brackets)?
(Will additional SmPL disjunctions be needed then?)


> +statement S;
> +|
> + for (...; \(…\);
> +      m = \(…\)) S

* Can the metavariable “S” look nicer on a separate line?

* Should assignments be taken into account for more variables?


> +|
> + for (...; \(…\); ...) {
> + }
> +)

I find the shown case distinction incomplete.
Will loop initialisations trigger further SmPL development challenges?

Regards,
Markus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Cocci] [RFC PATCH] coccinelle: check for integer overflow in binary search
@ 2019-09-05 14:23   ` Markus Elfring
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Elfring @ 2019-09-05 14:23 UTC (permalink / raw)
  To: Denis Efremov, kernel-janitors, cocci
  Cc: Michal Marek, Nicolas Palix, linux-kernel

> +@@
> +(
> + while (\(…\)) {
> + }

It seems that compound statements are mainly checked for
control flow statements by this source code search approach
so far.
Would you like to handle also single statements (without the
curly brackets)?
(Will additional SmPL disjunctions be needed then?)


> +statement S;
> +|
> + for (...; \(…\);
> +      m = \(…\)) S

* Can the metavariable “S” look nicer on a separate line?

* Should assignments be taken into account for more variables?


> +|
> + for (...; \(…\); ...) {
> + }
> +)

I find the shown case distinction incomplete.
Will loop initialisations trigger further SmPL development challenges?

Regards,
Markus
_______________________________________________
Cocci mailing list
Cocci@systeme.lip6.fr
https://systeme.lip6.fr/mailman/listinfo/cocci

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-09-05 14:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-04 22:12 [RFC PATCH] coccinelle: check for integer overflow in binary search Denis Efremov
2019-09-04 22:12 ` [Cocci] " Denis Efremov
2019-09-05  6:20 ` Julia Lawall
2019-09-05  6:20   ` [Cocci] " Julia Lawall
2019-09-05  7:36   ` Denis Efremov
2019-09-05  7:36     ` [Cocci] " Denis Efremov
2019-09-05  7:51     ` Julia Lawall
2019-09-05  7:51       ` [Cocci] " Julia Lawall
2019-09-05 12:32 ` Markus Elfring
2019-09-05 12:32   ` [Cocci] " Markus Elfring
2019-09-05 12:32   ` Markus Elfring
2019-09-05 14:23 ` Markus Elfring
2019-09-05 14:23   ` [Cocci] " Markus Elfring
2019-09-05 14:23   ` Markus Elfring

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.