linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Unsigned widening casts of binary "not" operations..
       [not found] <CA+55aFwpLu0qNms=hkQr43yqD0K9DgESNm91OEWKT1ZuT8MU6Q@mail.gmail.com>
@ 2013-04-23  0:23 ` Linus Torvalds
  2013-04-23  8:59   ` David Laight
  2013-04-23  0:32 ` H. Peter Anvin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Linus Torvalds @ 2013-04-23  0:23 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Miller,
	Theodore Ts'o
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 4910 bytes --]

[ Ugh, resending because I had mistakenly set the html bit and all the
mailing lists just refused the original... ]

So I was playing with sparse again, because the MIPS people had hit a
bug with the fact that they had made PAGE_MASK an *unsigned* type, and
then doing the ~ (binary not) operation on it does the wrong thing
when you operate on bigger types, like hardware 36-bit physical
addresses.

See commit 3b5e50edaf50 (and commit c17a6554 that it reverts).

So the issue is that let's say that you have a constant (or variable,
for that matter) that is of type "unsigned int", and then you use that
to mask a variable of a larger size (say it's an "unsigned long" on a
64-bit arch, or it's a phys_addr_t on a 32-bit arch with PAE). So the
code looks basically like a variation of something like this:

     u64 value = ...;

     value &= ~bitmask;

What happens?

What happens is that the "~bitmask" is done in the *narrower* type,
and then - because the narrower type is unsigned - the cast to the
wider type is done as an *unsigned* cast, so what you *think* happens
is that it clears the bits that are set in "bitmask", but what
*actually* happens is that yes, you clear the bits that are set in
:bitmask", but you *also* clear the upper bits of value.

Now, why am I posting about this MIPS-specific small detail on to x86 people?

Because I hacked up a sparse patch that looks for the pattern of
unsigned casts of a (bit) negation to a wider type, and there's a fair
number of them.

Now, it may well be that my sparse hack (and it really is a hack) is a
broken piece of crap, but from a quick look the warnings it gives look
reasonable.

And there's quite a lot of them. Even in my (fairly small) config I
use on my desktop. And the first warnings I see are in x86 code:

     arch/x86/kernel/traps.c:405:16: warning: implicit unsigned
widening cast of a '~' expression
     arch/x86/kernel/cpu/perf_event_p4.c:912:15: warning: implicit
unsigned widening cast of a '~' expression

that particular one is something that has an *explicit* cast to "u64",
and then it does binary 'and' operations with these things that are of
a 32-bit unsigned type with a binary not in front of them. IOW, the
types in that expression are *very* confused.

Here's a ext4 code snippet that looks like an actual bug (but seems to
only hit read-ahead):

     ext4_fsblk_t b, block;

     b = block & ~(EXT4_SB(sb)->s_inode_readahead_blks-1);

where "b" actually ends up having the upper bits cleared, because the
s_inode_readahead_blks thing is an unsigned int, so you're masking off
not just the low bits, but the high bits too. Ted? Of course, it's
just read-ahead, so it probably doesn't matter, but.

We have a number of generic code examples where this kind of thing
seems harmless:

                 *vm_flags &= ~VM_MERGEABLE;

(we only have flags in the low 32 bits, and the only reason we get
warnings for VM_MERGEABLE is because 0x80000000 is an implicitly
unsigned constant, while 0x40000000 is *not*), or

     kernel/trace/trace.c:2910:32: warning: implicit unsigned widening
cast of a '~' expression

where "trace_flags" is "unsigned long" and "mask" is "unsigned int",
and the expression

     trace_flags &= ~mask;

actually clears the upper 32 bits too, but presumably they are always
clear anyway so we don't *really care. But it's an example of code
that is potentially very subtly dangerous.

Now, a lot of the other warnings I get seem to be more benign - the
networking code results in a lot of these because of code like this:

   #define NLMSG_ALIGNTO      4U
   #define NLMSG_ALIGN(len) ( ((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1) )

which technically has the same problem if "len" is 64-bit (which
sizeof() is on x86-64), but we don't really *care* because it turns
out that the sizeof values will always have the high bits clear
*anyway*. So I'm not sure the hacky sparse warning is useful, because
my code isn't smart enough to figure out when this kind of widening
cast is a problem, and when it isn't.

That said, I'm cc'ing David and netdev too, just in case. There is
likely some *reason* why it uses an "unsigned int" for a constant that
is then commonly expanded with a binary "not" and the upper bits end
up being surprising. So this thing doesn't look like a bug, but it
does cause these warnings:

     net/netlink/af_netlink.c:1889:38: warning: implicit unsigned
widening cast of a '~' expression

and maybe the networking people care about this and maybe they don't.

(It turns out that any use of "UINT_MAX" for a "long" value also
results in this warning, because we define it as "~0ul", so there are
other cases of spurious things where we *intentionally* drop the high
bits).

ANYWAY.

Only a few of the warnings looked like they might be bugs, but I'm
attaching my sparse patch here in case somebody wants to play with the
hacky thing..

                           Linus

[-- Attachment #2: sparse.diff --]
[-- Type: application/octet-stream, Size: 3804 bytes --]

 compile-i386.c | 10 ----------
 evaluate.c     | 30 +++++++++++++++++++++++++-----
 show-parse.c   |  9 ---------
 symbol.h       |  1 +
 4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/compile-i386.c b/compile-i386.c
index b47095252053..31d3dbca1060 100644
--- a/compile-i386.c
+++ b/compile-i386.c
@@ -176,7 +176,6 @@ static const char *current_section;
 static void emit_comment(const char * fmt, ...) FORMAT_ATTR(1);
 static void emit_move(struct storage *src, struct storage *dest,
 		      struct symbol *ctype, const char *comment);
-static int type_is_signed(struct symbol *sym);
 static struct storage *x86_address_gen(struct expression *expr);
 static struct storage *x86_symbol_expr(struct symbol *sym);
 static void x86_symbol(struct symbol *sym);
@@ -2250,15 +2249,6 @@ static void x86_symbol_init(struct symbol *sym)
 	priv->addr = new;
 }
 
-static int type_is_signed(struct symbol *sym)
-{
-	if (sym->type == SYM_NODE)
-		sym = sym->ctype.base_type;
-	if (sym->type == SYM_PTR)
-		return 0;
-	return !(sym->ctype.modifiers & MOD_UNSIGNED);
-}
-
 static struct storage *x86_label_expr(struct expression *expr)
 {
 	struct storage *new = stack_alloc(4);
diff --git a/evaluate.c b/evaluate.c
index 99dedb5c5bde..5c49c9c92a69 100644
--- a/evaluate.c
+++ b/evaluate.c
@@ -31,6 +31,15 @@ struct symbol *current_fn;
 static struct symbol *degenerate(struct expression *expr);
 static struct symbol *evaluate_symbol(struct symbol *sym);
 
+int type_is_signed(struct symbol *sym)
+{
+	if (sym->type == SYM_NODE)
+		sym = sym->ctype.base_type;
+	if (sym->type == SYM_PTR)
+		return 0;
+	return !(sym->ctype.modifiers & MOD_UNSIGNED);
+}
+
 static struct symbol *evaluate_symbol_expression(struct expression *expr)
 {
 	struct expression *addr;
@@ -282,12 +291,18 @@ static struct expression * cast_to(struct expression *old, struct symbol *type)
 	 */
 	switch (old->type) {
 	case EXPR_PREOP:
-		if (old->ctype->bit_size < type->bit_size)
+		if (old->op == '~' || old->op == '-') {
+			if (old->ctype->bit_size > type->bit_size) {
+				old->ctype = type;
+				old->unop = cast_to(old->unop, type);
+				return old;
+			}
+			if (old->ctype->bit_size == type->bit_size)
+				break;
+			if (type_is_signed(old->ctype))
+				break;
+			warning(old->pos, "implicit unsigned widening cast of a '%c' expression", old->op);
 			break;
-		if (old->op == '~') {
-			old->ctype = type;
-			old->unop = cast_to(old->unop, type);
-			return old;
 		}
 		break;
 
@@ -310,6 +325,7 @@ static struct expression * cast_to(struct expression *old, struct symbol *type)
 		/* nothing */;
 	}
 
+
 	expr = alloc_expression(old->pos, EXPR_IMPLIED_CAST);
 	expr->flags = old->flags;
 	expr->ctype = type;
@@ -1757,6 +1773,10 @@ Normal:
 	}
 	if (expr->op == '+')
 		*expr = *expr->unop;
+	else if (expr->unop->type == EXPR_PREOP && expr->unop->op == expr->op) {
+		if (expr->unop->ctype == ctype)
+			*expr = *expr->unop->unop;
+	}
 	expr->ctype = ctype;
 	return ctype;
 Restr:
diff --git a/show-parse.c b/show-parse.c
index 1333e301c151..c8e864476fac 100644
--- a/show-parse.c
+++ b/show-parse.c
@@ -933,15 +933,6 @@ static int show_symbol_init(struct symbol *sym)
 	return 0;
 }
 
-static int type_is_signed(struct symbol *sym)
-{
-	if (sym->type == SYM_NODE)
-		sym = sym->ctype.base_type;
-	if (sym->type == SYM_PTR)
-		return 0;
-	return !(sym->ctype.modifiers & MOD_UNSIGNED);
-}
-
 static int show_cast_expr(struct expression *expr)
 {
 	struct symbol *old_type, *new_type;
diff --git a/symbol.h b/symbol.h
index 1e745799a8d0..6928faf4fec6 100644
--- a/symbol.h
+++ b/symbol.h
@@ -389,5 +389,6 @@ extern int is_ptr_type(struct symbol *);
 
 void create_fouled(struct symbol *type);
 struct symbol *befoul(struct symbol *type);
+int type_is_signed(struct symbol *sym);
 
 #endif /* SYMBOL_H */

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
       [not found] <CA+55aFwpLu0qNms=hkQr43yqD0K9DgESNm91OEWKT1ZuT8MU6Q@mail.gmail.com>
  2013-04-23  0:23 ` Unsigned widening casts of binary "not" operations Linus Torvalds
@ 2013-04-23  0:32 ` H. Peter Anvin
  2013-04-23 13:00 ` Theodore Ts'o
  2013-04-24  7:26 ` Ingo Molnar
  3 siblings, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-23  0:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

On 04/22/2013 05:15 PM, Linus Torvalds wrote:
> 
> (It turns out that any use of "UINT_MAX" for a "long" value also results in
> this warning, because we define it as "~0ul", so there are other cases of
> spurious things where we *intentionally* drop the high bits).
> 

I'm responsible for this particular piece of drain bramage... we should
probably just open-code the correct types conditional on the sizes like
userspace does.

	-hpa



^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: Unsigned widening casts of binary "not" operations..
  2013-04-23  0:23 ` Unsigned widening casts of binary "not" operations Linus Torvalds
@ 2013-04-23  8:59   ` David Laight
  2013-04-23 14:29     ` Linus Torvalds
  0 siblings, 1 reply; 41+ messages in thread
From: David Laight @ 2013-04-23  8:59 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	David Miller, Theodore Ts'o
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 708 bytes --]

> What happens is that the "~bitmask" is done in the *narrower* type,
> and then - because the narrower type is unsigned - the cast to the
> wider type is done as an *unsigned* cast, so what you *think* happens
> is that it clears the bits that are set in "bitmask", but what
> *actually* happens is that yes, you clear the bits that are set in
> :bitmask", but you *also* clear the upper bits of value.

If the narrower type is signed it is probably even more confusing!
The high bits will be preserved unless you are masking off bit 31.

	David

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
       [not found] <CA+55aFwpLu0qNms=hkQr43yqD0K9DgESNm91OEWKT1ZuT8MU6Q@mail.gmail.com>
  2013-04-23  0:23 ` Unsigned widening casts of binary "not" operations Linus Torvalds
  2013-04-23  0:32 ` H. Peter Anvin
@ 2013-04-23 13:00 ` Theodore Ts'o
  2013-04-24  7:26 ` Ingo Molnar
  3 siblings, 0 replies; 41+ messages in thread
From: Theodore Ts'o @ 2013-04-23 13:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Miller,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

On Mon, Apr 22, 2013 at 05:15:19PM -0700, Linus Torvalds wrote:
> Here's a ext4 code snippet that looks like an actual bug (but seems to only
> hit read-ahead):
> 
>     ext4_fsblk_t b, block;
> 
>     b = block & ~(EXT4_SB(sb)->s_inode_readahead_blks-1);
> 
> where "b" actually ends up having the upper bits cleared, because the
> s_inode_readahead_blks thing is an unsigned int, so you're masking off not
> just the low bits, but the high bits too. Ted? Of course, it's just
> read-ahead, so it probably doesn't matter, but.

Yep, it's a bug alright.  Thanks for catching it!

     	    		  	     	      - Ted


>From 0d606e2c9fccdd4e67febf1e2da500e1bfe9e045 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Tue, 23 Apr 2013 08:59:35 -0400
Subject: [PATCH] ext4: fix type-widening bug in inode table readahead code

Due to a missing cast, the high 32-bits of a 64-bit block number used
when calculating the readahead block for inode tables can get lost.
This means we can end up fetching the wrong blocks for readahead for
file systems > 16TB.

Linus found this when experimenting with an enhacement to the sparse
static code checker which checks for missing widening casts before
binary "not" operators.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d7518e2..793d44b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4011,13 +4011,14 @@ make_io:
 		if (EXT4_SB(sb)->s_inode_readahead_blks) {
 			ext4_fsblk_t b, end, table;
 			unsigned num;
+			__u32 ra_blks = EXT4_SB(sb)->s_inode_readahead_blks;
 
 			table = ext4_inode_table(sb, gdp);
 			/* s_inode_readahead_blks is always a power of 2 */
-			b = block & ~(EXT4_SB(sb)->s_inode_readahead_blks-1);
+			b = block & ~((ext4_fsblk_t) ra_blks - 1);
 			if (table > b)
 				b = table;
-			end = b + EXT4_SB(sb)->s_inode_readahead_blks;
+			end = b + ra_blks;
 			num = EXT4_INODES_PER_GROUP(sb);
 			if (ext4_has_group_desc_csum(sb))
 				num -= ext4_itable_unused_count(sb, gdp);
-- 
1.7.12.rc0.22.gcdd159b


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23  8:59   ` David Laight
@ 2013-04-23 14:29     ` Linus Torvalds
  2013-04-23 15:24       ` David Laight
  0 siblings, 1 reply; 41+ messages in thread
From: Linus Torvalds @ 2013-04-23 14:29 UTC (permalink / raw)
  To: David Laight
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Miller,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On Tue, Apr 23, 2013 at 1:59 AM, David Laight <David.Laight@aculab.com> wrote:
>
> If the narrower type is signed it is probably even more confusing!
> The high bits will be preserved unless you are masking off bit 31.

Yes. However, that case doesn't trigger with the normal case of small
values. So "~4" works fine with widening, in a way that "~4u" does
not.

Which doesn't mean that you aren't right, it only means that it's
harder to check for in sparse. The hacky little patch I sent out with
already resulted in a lot of noise for things like "~0u" (UINT_MAX)
and the "~4u" use in NLMSG_ALIGNTO, I'd dread to do the same for
signed values.

That said, with the most minimal value analysis (ie only looking at
constants), maybe it wouldn't be too bad. I started out just worrying
about the PAGE_MASK case, though, where we're talking about (somewhat
complicated) generated constants, and then the signed case is largely
irrelevant (although a signed "1 << 31" would - as you say - trigger
this same thing too).

                         Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: Unsigned widening casts of binary "not" operations..
  2013-04-23 14:29     ` Linus Torvalds
@ 2013-04-23 15:24       ` David Laight
  2013-04-23 15:42         ` Linus Torvalds
  0 siblings, 1 reply; 41+ messages in thread
From: David Laight @ 2013-04-23 15:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Miller,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 631 bytes --]

> > If the narrower type is signed it is probably even more confusing!
> > The high bits will be preserved unless you are masking off bit 31.
> 
> Yes. However, that case doesn't trigger with the normal case of small
> values. So "~4" works fine with widening, in a way that "~4u" does
> not.

Thinks ... converting:
	foo &= ~bar;
to:
	foo = ~(~foo | bar);
would generally DTRT.
Whether the compiler has the relevant patterns to optimise
it is another question.

	David

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 15:24       ` David Laight
@ 2013-04-23 15:42         ` Linus Torvalds
  2013-04-23 15:52           ` Theodore Ts'o
  2013-04-23 17:37           ` David Miller
  0 siblings, 2 replies; 41+ messages in thread
From: Linus Torvalds @ 2013-04-23 15:42 UTC (permalink / raw)
  To: David Laight
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Miller,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On Tue, Apr 23, 2013 at 8:24 AM, David Laight <David.Laight@aculab.com> wrote:
>
> Thinks ... converting:
>         foo &= ~bar;
> to:
>         foo = ~(~foo | bar);
> would generally DTRT.

Quite frankly, I'd rather just try to make people (re)move the widening cast.

So while

    foo &= ~bar;

is unsafe if bar is narrower than foo, it's unsafe simply because the
implicit cast from the C type conversions comes *after* the binary
not.

An explicit cast fixes it, and shows that you were aware of the issue:

   foo &= ~(foo_t)bar;

and gcc will generate the right logic. Of course, casts then have
their own problems, which your thing avoids (as would just having a
"andn" operation in C)

The best case is for code that does bitmasking ops like this to avoid
any casts (implicit or explicit) by just avoiding mixed types. That
was, to a large degree, my hope for the sparse patch, but it's
nontrivial. Many types are rather "natural" (ie constants have a
natural "int" type, sizeof() is size_t, etc) and in other cases you
want to do the same ops for two different types (with the case that
caused me to start to look at it being the "align to page boundary"
for a virtual address vs a PAE phys_addr_t) so you can't really avoid
mixing things in some circumstances.

                                 Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 15:42         ` Linus Torvalds
@ 2013-04-23 15:52           ` Theodore Ts'o
  2013-04-23 16:05             ` Linus Torvalds
  2013-04-23 17:37           ` David Miller
  1 sibling, 1 reply; 41+ messages in thread
From: Theodore Ts'o @ 2013-04-23 15:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Laight, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	David Miller, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On Tue, Apr 23, 2013 at 08:42:49AM -0700, Linus Torvalds wrote:
> The best case is for code that does bitmasking ops like this to avoid
> any casts (implicit or explicit) by just avoiding mixed types. That
> was, to a large degree, my hope for the sparse patch, but it's
> nontrivial. Many types are rather "natural" (ie constants have a
> natural "int" type, sizeof() is size_t, etc) and in other cases you
> want to do the same ops for two different types (with the case that
> caused me to start to look at it being the "align to page boundary"
> for a virtual address vs a PAE phys_addr_t) so you can't really avoid
> mixing things in some circumstances.

Maybe it's worth creating a magic helper function, called something
like mask_out() that handles the casting automatically, and it makes
it clear to a reader what you're trying to do?

   	      	     	  	 	   - Ted

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 15:52           ` Theodore Ts'o
@ 2013-04-23 16:05             ` Linus Torvalds
  0 siblings, 0 replies; 41+ messages in thread
From: Linus Torvalds @ 2013-04-23 16:05 UTC (permalink / raw)
  To: Theodore Ts'o, Linus Torvalds, David Laight, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner, David Miller,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

On Tue, Apr 23, 2013 at 8:52 AM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> Maybe it's worth creating a magic helper function, called something
> like mask_out() that handles the casting automatically, and it makes
> it clear to a reader what you're trying to do?

We have that for some things. Like the aligning code (see
include/{uapi/}linux/kernel.h). We have that whole ALIGN()
infrastructure that casts the alignment to the result type, exactly
because people got things wrong so often (and exactly the "& ~mask"
thing in particular).

But doing so becomes *less* readable when the types already match, and
it's another extra complication in kernel programming to know all the
"oh, don't use the standard C &~ constructs because it has some subtle
type handling under *some* circumstances". So for aligning things up,
we've been able to do it, because people hate doing that duplicated "
(val + mask) & ~mask" thing anyway (especially since "mask" tends to
be something like "size-1". IOW, having a helper function/macro ends
up solving more than just the type issue. But if it's just the type,
it ends up being very inconvenient.

We don't have the equivalent "align down" macro, for example, exactly
because aligning things down is *just* the logical mask, and so in
most cases it's actually just more pain to have a macro helper. So
ALIGN() only aligns upwards, even though it doesn't even say so in the
name. Inconsistent? Unclear? Yes, but there's a reason for it.

                  Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 15:42         ` Linus Torvalds
  2013-04-23 15:52           ` Theodore Ts'o
@ 2013-04-23 17:37           ` David Miller
  2013-04-23 17:52             ` Linus Torvalds
  2013-04-24 12:36             ` Geert Uytterhoeven
  1 sibling, 2 replies; 41+ messages in thread
From: David Miller @ 2013-04-23 17:37 UTC (permalink / raw)
  To: torvalds
  Cc: David.Laight, mingo, hpa, tglx, tytso, linux-kernel, x86, netdev,
	linux-ext4

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 23 Apr 2013 08:42:49 -0700

> An explicit cast fixes it, and shows that you were aware of the issue:
> 
>    foo &= ~(foo_t)bar;
> 
> and gcc will generate the right logic. Of course, casts then have
> their own problems, which your thing avoids (as would just having a
> "andn" operation in C)

I just want to mention that this is dangerous in different ways, we
just recently got a patch in the networking that removed such a cast.
The problem is when the cast narrows, f.e.:

	~(u8)0

doesn't do what you think it does.  That doesn't evaluate to 0xff.

You all are very bright and probably know this already.

So,if it widens, which is the situation we're talking about, you're
good.  But until I saw the above u8 thing I never suspected that
narrowing in this kind of expression was dangerous.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 17:37           ` David Miller
@ 2013-04-23 17:52             ` Linus Torvalds
  2013-04-23 17:56               ` David Miller
  2013-04-24 12:36             ` Geert Uytterhoeven
  1 sibling, 1 reply; 41+ messages in thread
From: Linus Torvalds @ 2013-04-23 17:52 UTC (permalink / raw)
  To: David Miller
  Cc: David Laight, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On Tue, Apr 23, 2013 at 10:37 AM, David Miller <davem@davemloft.net> wrote:
>
> I just want to mention that this is dangerous in different ways, we
> just recently got a patch in the networking that removed such a cast.
> The problem is when the cast narrows, f.e.:
>
>         ~(u8)0
>
> doesn't do what you think it does.  That doesn't evaluate to 0xff.

Yeah, sparse will get that right, but won't warn about it even with my
patch. The normal "all arithmetic is done in *at*least* 'int'" will
always kick any C expression like that up to 'int' before the binary
not op is done. So in your example, the implicit cast is widening the
value *before* the binary not, not after.

                     Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 17:52             ` Linus Torvalds
@ 2013-04-23 17:56               ` David Miller
  2013-04-23 18:21                 ` Linus Torvalds
  0 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2013-04-23 17:56 UTC (permalink / raw)
  To: torvalds
  Cc: David.Laight, mingo, hpa, tglx, tytso, linux-kernel, x86, netdev,
	linux-ext4

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 23 Apr 2013 10:52:33 -0700

> On Tue, Apr 23, 2013 at 10:37 AM, David Miller <davem@davemloft.net> wrote:
>>
>> I just want to mention that this is dangerous in different ways, we
>> just recently got a patch in the networking that removed such a cast.
>> The problem is when the cast narrows, f.e.:
>>
>>         ~(u8)0
>>
>> doesn't do what you think it does.  That doesn't evaluate to 0xff.
> 
> Yeah, sparse will get that right, but won't warn about it even with my
> patch. The normal "all arithmetic is done in *at*least* 'int'" will
> always kick any C expression like that up to 'int' before the binary
> not op is done. So in your example, the implicit cast is widening the
> value *before* the binary not, not after.

If you're not bored, and could add a check for that kind of narrowing
situation, I'd really appreciate it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 17:56               ` David Miller
@ 2013-04-23 18:21                 ` Linus Torvalds
  0 siblings, 0 replies; 41+ messages in thread
From: Linus Torvalds @ 2013-04-23 18:21 UTC (permalink / raw)
  To: David Miller
  Cc: David Laight, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On Tue, Apr 23, 2013 at 10:56 AM, David Miller <davem@davemloft.net> wrote:
>
> If you're not bored, and could add a check for that kind of narrowing
> situation, I'd really appreciate it.

It's pretty non-trivial.

For example, this is perfectly normal C:

   char a,b;
   ...
   a &= ~b;

and the arithmetic is technically done in "int" (because C really
never does any arithmetic in a narrower type). So the value of 'b'
will be widened to int before doing the binary not, and then the
binary 'and' will be done in 'int', and then in the end it will be
cast down to 'char' again and written back to 'a'.

And none of that matters. All the normal integer operations (both
arithmetic and logical) are "stable" in the smaller size. The upper
bits can be basically random uninteresting crap (ie do zero or sign
extension depending on the smaller type), but they don't matter
because they go away in the end. In fact, the compiler will usually
then end up doing the actual arithmetic/logical op in the narrower
type, even though *conceptually* there were lots of casts going on
back and forth.

The point being that at the actual time of the widening, it's hard to
say whether "~(u8)0" is problematic or not - it will depend on how it
ends up being used. At a much later point, sparse will have optimized
away the casts and will indeed linearize it to a simple 8-bit
operation, but by then sparse will *also* have simplified away the
obviously constant arithmetic, so by the time the casts back and forth
are gone, so is the logical "not", and all you have left of the
"~(u8)0" is either a 8-bit value (255) or an "int" (-1) depending on
how it was used.

Put another way: it's not possible to say that "~(u8)0" is wrong early
on (because it may be part of something that ends up only doing byte
arithmetic) and it is *also* not possible to say that it's invalid
much later on, because by then we will have munged it into something
totally different.

I'm not saying it's impossible to do. You could certainly do it with
sparse, but it would involve a fair amount of effort. You'd have to
add a whole new separate phase of "type narrowing": sparse *does* do
that as part of the simplification, but right now it's done while it
does all the *other* simplification too, so ..

Now, if you only wanted the warning when there is an *explicit* cast,
that would be different. It would be easy enough to find the pattern
where we do a "~" op on an explicit cast to something smaller than
"int" (which means that there would be an implicit cast after the
explicit one). But quite frankly, you could likely do that almost
equally well with just a judicious use of "grep".

                      Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
       [not found] <CA+55aFwpLu0qNms=hkQr43yqD0K9DgESNm91OEWKT1ZuT8MU6Q@mail.gmail.com>
                   ` (2 preceding siblings ...)
  2013-04-23 13:00 ` Theodore Ts'o
@ 2013-04-24  7:26 ` Ingo Molnar
  2013-04-24  7:47   ` Cyrill Gorcunov
                     ` (2 more replies)
  3 siblings, 3 replies; 41+ messages in thread
From: Ingo Molnar @ 2013-04-24  7:26 UTC (permalink / raw)
  To: Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra
  Cc: H. Peter Anvin, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Oleg Nesterov,
	Fr??d??ric Weisbecker


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> And there's quite a lot of them. Even in my (fairly small) config I use on
> my desktop. And the first warnings I see are in x86 code:
> 
>     arch/x86/kernel/traps.c:405:16: warning: implicit unsigned widening
> cast of a '~' expression
>     arch/x86/kernel/cpu/perf_event_p4.c:912:15: warning: implicit unsigned
> widening cast of a '~' expression

Hm, the perf_event_p4.c code is indeed confused.

I think the bug is real but probably benign in effect: we allow narrower 
values into the MSR register than probably intended. Only a couple of low 
bits are reserved AFAICS.

Here's an (untested!) patch that tries to untangle it all: it just moves 
to clean 64-bit types everywhere - these MSRs are 64-bit wide regardless 
of whether we run on 32-bit or not.

Would be nice if someone with a working P4 could test it - Cyrill? [It 
should also be double checked whether the high words are really not 
reserved and can be written to ...]

Thanks,

	Ingo
---->


Linus, while extending integer type extension checks in the sparse static 
code checker, found fragile patterns of mixed signed/unsigned 
64-bit/32-bit integer use in perf_events_p4.c.

The relevant hardware register ABI is 64 bit wide on 32-bit kernels as 
well, so clean it all up a bit, remove unnecessary casts, and make sure we 
use 64-bit unsigned integers in these places.

Signed-off-by: Ingo Molnar <mingo@kernel.org>


diff --git a/arch/x86/include/asm/perf_event_p4.h b/arch/x86/include/asm/perf_event_p4.h
index 4f7e67e..85e13cc 100644
--- a/arch/x86/include/asm/perf_event_p4.h
+++ b/arch/x86/include/asm/perf_event_p4.h
@@ -24,45 +24,45 @@
 #define ARCH_P4_CNTRVAL_MASK	((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
 #define ARCH_P4_UNFLAGGED_BIT	((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))
 
-#define P4_ESCR_EVENT_MASK	0x7e000000U
+#define P4_ESCR_EVENT_MASK	0x7e000000ULL
 #define P4_ESCR_EVENT_SHIFT	25
-#define P4_ESCR_EVENTMASK_MASK	0x01fffe00U
+#define P4_ESCR_EVENTMASK_MASK	0x01fffe00ULL
 #define P4_ESCR_EVENTMASK_SHIFT	9
-#define P4_ESCR_TAG_MASK	0x000001e0U
+#define P4_ESCR_TAG_MASK	0x000001e0ULL
 #define P4_ESCR_TAG_SHIFT	5
-#define P4_ESCR_TAG_ENABLE	0x00000010U
-#define P4_ESCR_T0_OS		0x00000008U
-#define P4_ESCR_T0_USR		0x00000004U
-#define P4_ESCR_T1_OS		0x00000002U
-#define P4_ESCR_T1_USR		0x00000001U
+#define P4_ESCR_TAG_ENABLE	0x00000010ULL
+#define P4_ESCR_T0_OS		0x00000008ULL
+#define P4_ESCR_T0_USR		0x00000004ULL
+#define P4_ESCR_T1_OS		0x00000002ULL
+#define P4_ESCR_T1_USR		0x00000001ULL
 
 #define P4_ESCR_EVENT(v)	((v) << P4_ESCR_EVENT_SHIFT)
 #define P4_ESCR_EMASK(v)	((v) << P4_ESCR_EVENTMASK_SHIFT)
 #define P4_ESCR_TAG(v)		((v) << P4_ESCR_TAG_SHIFT)
 
-#define P4_CCCR_OVF			0x80000000U
-#define P4_CCCR_CASCADE			0x40000000U
-#define P4_CCCR_OVF_PMI_T0		0x04000000U
-#define P4_CCCR_OVF_PMI_T1		0x08000000U
-#define P4_CCCR_FORCE_OVF		0x02000000U
-#define P4_CCCR_EDGE			0x01000000U
-#define P4_CCCR_THRESHOLD_MASK		0x00f00000U
+#define P4_CCCR_OVF			0x80000000ULL
+#define P4_CCCR_CASCADE			0x40000000ULL
+#define P4_CCCR_OVF_PMI_T0		0x04000000ULL
+#define P4_CCCR_OVF_PMI_T1		0x08000000ULL
+#define P4_CCCR_FORCE_OVF		0x02000000ULL
+#define P4_CCCR_EDGE			0x01000000ULL
+#define P4_CCCR_THRESHOLD_MASK		0x00f00000ULL
 #define P4_CCCR_THRESHOLD_SHIFT		20
-#define P4_CCCR_COMPLEMENT		0x00080000U
-#define P4_CCCR_COMPARE			0x00040000U
-#define P4_CCCR_ESCR_SELECT_MASK	0x0000e000U
+#define P4_CCCR_COMPLEMENT		0x00080000ULL
+#define P4_CCCR_COMPARE			0x00040000ULL
+#define P4_CCCR_ESCR_SELECT_MASK	0x0000e000ULL
 #define P4_CCCR_ESCR_SELECT_SHIFT	13
-#define P4_CCCR_ENABLE			0x00001000U
-#define P4_CCCR_THREAD_SINGLE		0x00010000U
-#define P4_CCCR_THREAD_BOTH		0x00020000U
-#define P4_CCCR_THREAD_ANY		0x00030000U
-#define P4_CCCR_RESERVED		0x00000fffU
+#define P4_CCCR_ENABLE			0x00001000ULL
+#define P4_CCCR_THREAD_SINGLE		0x00010000ULL
+#define P4_CCCR_THREAD_BOTH		0x00020000ULL
+#define P4_CCCR_THREAD_ANY		0x00030000ULL
+#define P4_CCCR_RESERVED		0x00000fffULL
 
 #define P4_CCCR_THRESHOLD(v)		((v) << P4_CCCR_THRESHOLD_SHIFT)
 #define P4_CCCR_ESEL(v)			((v) << P4_CCCR_ESCR_SELECT_SHIFT)
 
 #define P4_GEN_ESCR_EMASK(class, name, bit)	\
-	class##__##name = ((1 << bit) << P4_ESCR_EVENTMASK_SHIFT)
+	class##__##name = ((1ULL << bit) << P4_ESCR_EVENTMASK_SHIFT)
 #define P4_ESCR_EMASK_BIT(class, name)		class##__##name
 
 /*
@@ -107,7 +107,7 @@
  * P4_PEBS_CONFIG_MASK and related bits on
  * modification.)
  */
-#define P4_CONFIG_ALIASABLE		(1 << 9)
+#define P4_CONFIG_ALIASABLE		(1ULL << 9)
 
 /*
  * The bits we allow to pass for RAW events
@@ -784,17 +784,17 @@ enum P4_ESCR_EMASKS {
  * Note we have UOP and PEBS bits reserved for now
  * just in case if we will need them once
  */
-#define P4_PEBS_CONFIG_ENABLE		(1 << 7)
-#define P4_PEBS_CONFIG_UOP_TAG		(1 << 8)
-#define P4_PEBS_CONFIG_METRIC_MASK	0x3f
-#define P4_PEBS_CONFIG_MASK		0xff
+#define P4_PEBS_CONFIG_ENABLE		(1ULL << 7)
+#define P4_PEBS_CONFIG_UOP_TAG		(1ULL << 8)
+#define P4_PEBS_CONFIG_METRIC_MASK	0x3FLL
+#define P4_PEBS_CONFIG_MASK		0xFFLL
 
 /*
  * mem: Only counters MSR_IQ_COUNTER4 (16) and
  * MSR_IQ_COUNTER5 (17) are allowed for PEBS sampling
  */
-#define P4_PEBS_ENABLE			0x02000000U
-#define P4_PEBS_ENABLE_UOP_TAG		0x01000000U
+#define P4_PEBS_ENABLE			0x02000000ULL
+#define P4_PEBS_ENABLE_UOP_TAG		0x01000000ULL
 
 #define p4_config_unpack_metric(v)	(((u64)(v)) & P4_PEBS_CONFIG_METRIC_MASK)
 #define p4_config_unpack_pebs(v)	(((u64)(v)) & P4_PEBS_CONFIG_MASK)
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 92c7e39..3486e66 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -895,8 +895,8 @@ static void p4_pmu_disable_pebs(void)
 	 * So at moment let leave metrics turned on forever -- it's
 	 * ok for now but need to be revisited!
 	 *
-	 * (void)wrmsrl_safe(MSR_IA32_PEBS_ENABLE, (u64)0);
-	 * (void)wrmsrl_safe(MSR_P4_PEBS_MATRIX_VERT, (u64)0);
+	 * (void)wrmsrl_safe(MSR_IA32_PEBS_ENABLE, 0);
+	 * (void)wrmsrl_safe(MSR_P4_PEBS_MATRIX_VERT, 0);
 	 */
 }
 
@@ -910,8 +910,7 @@ static inline void p4_pmu_disable_event(struct perf_event *event)
 	 * asserted again and again
 	 */
 	(void)wrmsrl_safe(hwc->config_base,
-		(u64)(p4_config_unpack_cccr(hwc->config)) &
-			~P4_CCCR_ENABLE & ~P4_CCCR_OVF & ~P4_CCCR_RESERVED);
+		p4_config_unpack_cccr(hwc->config) & ~P4_CCCR_ENABLE & ~P4_CCCR_OVF & ~P4_CCCR_RESERVED);
 }
 
 static void p4_pmu_disable_all(void)
@@ -957,7 +956,7 @@ static void p4_pmu_enable_event(struct perf_event *event)
 	u64 escr_addr, cccr;
 
 	bind = &p4_event_bind_map[idx];
-	escr_addr = (u64)bind->escr_msr[thread];
+	escr_addr = bind->escr_msr[thread];
 
 	/*
 	 * - we dont support cascaded counters yet


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-24  7:26 ` Ingo Molnar
@ 2013-04-24  7:47   ` Cyrill Gorcunov
  2013-04-25  1:13     ` Lin Ming
  2013-04-24 17:07   ` [PATCH] x86: make DR*_RESERVED unsigned long Oleg Nesterov
  2013-04-26 14:20   ` [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types tip-bot for Ingo Molnar
  2 siblings, 1 reply; 41+ messages in thread
From: Cyrill Gorcunov @ 2013-04-24  7:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Peter Zijlstra, H. Peter Anvin, Thomas Gleixner,
	David Miller, Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4,
	Oleg Nesterov, Frederic Weisbecker, Lin Ming

On Wed, Apr 24, 2013 at 09:26:30AM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > And there's quite a lot of them. Even in my (fairly small) config I use on
> > my desktop. And the first warnings I see are in x86 code:
> > 
> >     arch/x86/kernel/traps.c:405:16: warning: implicit unsigned widening
> > cast of a '~' expression
> >     arch/x86/kernel/cpu/perf_event_p4.c:912:15: warning: implicit unsigned
> > widening cast of a '~' expression
> 
> Hm, the perf_event_p4.c code is indeed confused.
> 
> I think the bug is real but probably benign in effect: we allow narrower 
> values into the MSR register than probably intended. Only a couple of low 
> bits are reserved AFAICS.
> 
> Here's an (untested!) patch that tries to untangle it all: it just moves 
> to clean 64-bit types everywhere - these MSRs are 64-bit wide regardless 
> of whether we run on 32-bit or not.
> 
> Would be nice if someone with a working P4 could test it - Cyrill? [It 
> should also be double checked whether the high words are really not 
> reserved and can be written to ...]

Hi Ingo! Ufortunately I don't have access to real p4 hardware,
thus I'm CC'ing Ming who has been helping a lot in testing
this code pieces.

Still the patch itself is perfectly fine to me

Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-23 17:37           ` David Miller
  2013-04-23 17:52             ` Linus Torvalds
@ 2013-04-24 12:36             ` Geert Uytterhoeven
  1 sibling, 0 replies; 41+ messages in thread
From: Geert Uytterhoeven @ 2013-04-24 12:36 UTC (permalink / raw)
  To: David Miller
  Cc: Linus Torvalds, David.Laight, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner, Theodore Ts'o, linux-kernel,
	the arch/x86 maintainers, netdev, linux-ext4

On Tue, Apr 23, 2013 at 7:37 PM, David Miller <davem@davemloft.net> wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 23 Apr 2013 08:42:49 -0700
>
>> An explicit cast fixes it, and shows that you were aware of the issue:
>>
>>    foo &= ~(foo_t)bar;
>>
>> and gcc will generate the right logic. Of course, casts then have
>> their own problems, which your thing avoids (as would just having a
>> "andn" operation in C)
>
> I just want to mention that this is dangerous in different ways, we
> just recently got a patch in the networking that removed such a cast.
> The problem is when the cast narrows, f.e.:
>
>         ~(u8)0
>
> doesn't do what you think it does.  That doesn't evaluate to 0xff.

This is the definition of MAC802154_CHAN_NONE?

We _should_ have noticed this earlier, as old gcc (e.g. 4.1.2) emits a
warning when comparing it to a u8:

net/mac802154/monitor.c: In function ‘mac802154_monitor_xmit’:
net/mac802154/monitor.c:49: warning: comparison is always false due to
limited range of data type
net/mac802154/wpan.c: In function ‘mac802154_wpan_xmit’:
net/mac802154/wpan.c:323: warning: comparison is always false due to
limited range of data type

Interestingly, none of this is seen in the build logs of the linux-next build
service, which uses gcc 4.2.3, 4.2.4, 4.5.1, and 4.6...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] x86: make DR*_RESERVED unsigned long
  2013-04-24  7:26 ` Ingo Molnar
  2013-04-24  7:47   ` Cyrill Gorcunov
@ 2013-04-24 17:07   ` Oleg Nesterov
  2013-04-24 18:45     ` H. Peter Anvin
  2013-04-24 22:48     ` [PATCH] " Frederic Weisbecker
  2013-04-26 14:20   ` [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types tip-bot for Ingo Molnar
  2 siblings, 2 replies; 41+ messages in thread
From: Oleg Nesterov @ 2013-04-24 17:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra, H. Peter Anvin,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the set
bits in the "unsigned long" data, make them long to ensure that
"&~" doesn't clear the upper bits.

do_debug() and ptrace_write_dr7() which use DR*_RESERVED look
safe, but probably it makes sense to cleanup anyway.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/uapi/asm/debugreg.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
index 3c0874d..75d07dd 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -15,7 +15,7 @@
    are either reserved or not of interest to us. */
 
 /* Define reserved bits in DR6 which are always set to 1 */
-#define DR6_RESERVED	(0xFFFF0FF0)
+#define DR6_RESERVED	(0xFFFF0FF0ul)
 
 #define DR_TRAP0	(0x1)		/* db0 */
 #define DR_TRAP1	(0x2)		/* db1 */
@@ -65,7 +65,7 @@
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
 #ifdef __i386__
-#define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
+#define DR_CONTROL_RESERVED (0xFC00ul) /* Reserved by Intel */
 #else
 #define DR_CONTROL_RESERVED (0xFFFFFFFF0000FC00UL) /* Reserved */
 #endif
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] x86: make DR*_RESERVED unsigned long
  2013-04-24 17:07   ` [PATCH] x86: make DR*_RESERVED unsigned long Oleg Nesterov
@ 2013-04-24 18:45     ` H. Peter Anvin
  2013-04-25 14:48       ` Oleg Nesterov
  2013-04-24 22:48     ` [PATCH] " Frederic Weisbecker
  1 sibling, 1 reply; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-24 18:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

On 04/24/2013 10:07 AM, Oleg Nesterov wrote:
> DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the set
> bits in the "unsigned long" data, make them long to ensure that
> "&~" doesn't clear the upper bits.
> 
> do_debug() and ptrace_write_dr7() which use DR*_RESERVED look
> safe, but probably it makes sense to cleanup anyway.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Can you please put in the description if this is a manifest or
non-manifest bug and, if manifest, what the issues are?  It greatly
affects what we otherwise have to do to address the bug.

Also, the -UL suffix is usually capitalized in our codebase.

	-hpa



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] x86: make DR*_RESERVED unsigned long
  2013-04-24 17:07   ` [PATCH] x86: make DR*_RESERVED unsigned long Oleg Nesterov
  2013-04-24 18:45     ` H. Peter Anvin
@ 2013-04-24 22:48     ` Frederic Weisbecker
  2013-04-24 23:06       ` H. Peter Anvin
  1 sibling, 1 reply; 41+ messages in thread
From: Frederic Weisbecker @ 2013-04-24 22:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	H. Peter Anvin, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

2013/4/24 Oleg Nesterov <oleg@redhat.com>:
> DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the set
> bits in the "unsigned long" data, make them long to ensure that
> "&~" doesn't clear the upper bits.
>
> do_debug() and ptrace_write_dr7() which use DR*_RESERVED look
> safe, but probably it makes sense to cleanup anyway.

Agreed. The code looks safe, but the pattern is error prone. I'm all
for that cleanup.
Just something below:

>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/include/uapi/asm/debugreg.h |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
> index 3c0874d..75d07dd 100644
> --- a/arch/x86/include/uapi/asm/debugreg.h
> +++ b/arch/x86/include/uapi/asm/debugreg.h
> @@ -15,7 +15,7 @@
>     are either reserved or not of interest to us. */
>
>  /* Define reserved bits in DR6 which are always set to 1 */
> -#define DR6_RESERVED   (0xFFFF0FF0)
> +#define DR6_RESERVED   (0xFFFF0FF0ul)

You told in an earlier email that intel manual says upper 32 bits of
dr6 are reserved.
In this case don't we need to expand the mask in 64 bits like is done
for DR_CONTROL_RESERVED?

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] x86: make DR*_RESERVED unsigned long
  2013-04-24 22:48     ` [PATCH] " Frederic Weisbecker
@ 2013-04-24 23:06       ` H. Peter Anvin
  2013-04-24 23:31         ` Frederic Weisbecker
  0 siblings, 1 reply; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-24 23:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Cyrill Gorcunov,
	Peter Zijlstra, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

On 04/24/2013 03:48 PM, Frederic Weisbecker wrote:
> 2013/4/24 Oleg Nesterov <oleg@redhat.com>:
>> DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the set
>> bits in the "unsigned long" data, make them long to ensure that
>> "&~" doesn't clear the upper bits.
>>
>> do_debug() and ptrace_write_dr7() which use DR*_RESERVED look
>> safe, but probably it makes sense to cleanup anyway.
> 
> Agreed. The code looks safe, but the pattern is error prone. I'm all
> for that cleanup.
> Just something below:
> 
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> ---
>>  arch/x86/include/uapi/asm/debugreg.h |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
>> index 3c0874d..75d07dd 100644
>> --- a/arch/x86/include/uapi/asm/debugreg.h
>> +++ b/arch/x86/include/uapi/asm/debugreg.h
>> @@ -15,7 +15,7 @@
>>     are either reserved or not of interest to us. */
>>
>>  /* Define reserved bits in DR6 which are always set to 1 */
>> -#define DR6_RESERVED   (0xFFFF0FF0)
>> +#define DR6_RESERVED   (0xFFFF0FF0ul)
> 
> You told in an earlier email that intel manual says upper 32 bits of
> dr6 are reserved.
> In this case don't we need to expand the mask in 64 bits like is done
> for DR_CONTROL_RESERVED?
> 

Arguably this would be a *good* use for ~ ...

Instead of defining separate bitmasks for 32 and 64 bits have the
reciprocal (non-reserved bits):

#define DR6_RESERVED (~0x0000F00FUL)

That does have the right value on both 32 and 64 bits.  The leading
zeroes aren't even really needed.

Now, DR6 is a bit special in that a bunch of the reserved bits are
hardwired to 1, not 0; I don't know offhand if that is true for bits
[63:32].

	-hpa



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] x86: make DR*_RESERVED unsigned long
  2013-04-24 23:06       ` H. Peter Anvin
@ 2013-04-24 23:31         ` Frederic Weisbecker
  2013-04-25  1:20           ` H. Peter Anvin
  0 siblings, 1 reply; 41+ messages in thread
From: Frederic Weisbecker @ 2013-04-24 23:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Cyrill Gorcunov,
	Peter Zijlstra, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

2013/4/25 H. Peter Anvin <hpa@zytor.com>:
> On 04/24/2013 03:48 PM, Frederic Weisbecker wrote:
>> You told in an earlier email that intel manual says upper 32 bits of
>> dr6 are reserved.
>> In this case don't we need to expand the mask in 64 bits like is done
>> for DR_CONTROL_RESERVED?
>>
>
> Arguably this would be a *good* use for ~ ...
>
> Instead of defining separate bitmasks for 32 and 64 bits have the
> reciprocal (non-reserved bits):
>
> #define DR6_RESERVED (~0x0000F00FUL)
>
> That does have the right value on both 32 and 64 bits.  The leading
> zeroes aren't even really needed.

Ah, looks better indeed.

>
> Now, DR6 is a bit special in that a bunch of the reserved bits are
> hardwired to 1, not 0; I don't know offhand if that is true for bits
> [63:32].

Hmm, good point, could it be a problem given that we clear the
reserved dr6 bits on do_trap() and write that 'cleaned up" value back
to "tsk->thread.debugreg6"? Probably not if those hardwired reserved
bits are set to "1" on dr6 physical write whether those bits are
cleared or not in their storage in thread struct before resuming the
task?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Unsigned widening casts of binary "not" operations..
  2013-04-24  7:47   ` Cyrill Gorcunov
@ 2013-04-25  1:13     ` Lin Ming
  0 siblings, 0 replies; 41+ messages in thread
From: Lin Ming @ 2013-04-25  1:13 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, H. Peter Anvin,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Oleg Nesterov,
	Frederic Weisbecker

On Wed, Apr 24, 2013 at 3:47 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Wed, Apr 24, 2013 at 09:26:30AM +0200, Ingo Molnar wrote:
>>
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> > And there's quite a lot of them. Even in my (fairly small) config I use on
>> > my desktop. And the first warnings I see are in x86 code:
>> >
>> >     arch/x86/kernel/traps.c:405:16: warning: implicit unsigned widening
>> > cast of a '~' expression
>> >     arch/x86/kernel/cpu/perf_event_p4.c:912:15: warning: implicit unsigned
>> > widening cast of a '~' expression
>>
>> Hm, the perf_event_p4.c code is indeed confused.
>>
>> I think the bug is real but probably benign in effect: we allow narrower
>> values into the MSR register than probably intended. Only a couple of low
>> bits are reserved AFAICS.
>>
>> Here's an (untested!) patch that tries to untangle it all: it just moves
>> to clean 64-bit types everywhere - these MSRs are 64-bit wide regardless
>> of whether we run on 32-bit or not.
>>
>> Would be nice if someone with a working P4 could test it - Cyrill? [It
>> should also be double checked whether the high words are really not
>> reserved and can be written to ...]
>
> Hi Ingo! Ufortunately I don't have access to real p4 hardware,
> thus I'm CC'ing Ming who has been helping a lot in testing
> this code pieces.

Hi,

Sorry, but I don't have access to p4 hardware either.

Lin Ming

>
> Still the patch itself is perfectly fine to me
>
> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] x86: make DR*_RESERVED unsigned long
  2013-04-24 23:31         ` Frederic Weisbecker
@ 2013-04-25  1:20           ` H. Peter Anvin
  0 siblings, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-25  1:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Cyrill Gorcunov,
	Peter Zijlstra, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

On 04/24/2013 04:31 PM, Frederic Weisbecker wrote:
>>
>> Now, DR6 is a bit special in that a bunch of the reserved bits are
>> hardwired to 1, not 0; I don't know offhand if that is true for bits
>> [63:32].
> 
> Hmm, good point, could it be a problem given that we clear the
> reserved dr6 bits on do_trap() and write that 'cleaned up" value back
> to "tsk->thread.debugreg6"? Probably not if those hardwired reserved
> bits are set to "1" on dr6 physical write whether those bits are
> cleared or not in their storage in thread struct before resuming the
> task?
> 

OK, the SDM states that DR6[63:32] are reserved and must be written as
zero (not one).

So the quiescent 64-bit value of DR6 is 0x0000_0000_FFFF_0FF0.

	-hpa


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] x86: make DR*_RESERVED unsigned long
  2013-04-24 18:45     ` H. Peter Anvin
@ 2013-04-25 14:48       ` Oleg Nesterov
  2013-04-26 16:38         ` [PATCH v2] " Oleg Nesterov
  0 siblings, 1 reply; 41+ messages in thread
From: Oleg Nesterov @ 2013-04-25 14:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

On 04/24, H. Peter Anvin wrote:
>
> On 04/24/2013 10:07 AM, Oleg Nesterov wrote:
> > DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the set
> > bits in the "unsigned long" data, make them long to ensure that
> > "&~" doesn't clear the upper bits.
> >
> > do_debug() and ptrace_write_dr7() which use DR*_RESERVED look
> > safe, but probably it makes sense to cleanup anyway.
> >
> > Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Can you please put in the description if this is a manifest or
> non-manifest bug and, if manifest, what the issues are?  It greatly
> affects what we otherwise have to do to address the bug.

Sorry if it was not clear, I tried to explain in the changelog that
this is only cleanup. Yes, dr6 should have all zeroes in 32-64 so
the usage of DR6_RESERVED is safe.

> Also, the -UL suffix is usually capitalized in our codebase.

OK, iiuc otherwise you agree with this change. I'll send v2.

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types
  2013-04-24  7:26 ` Ingo Molnar
  2013-04-24  7:47   ` Cyrill Gorcunov
  2013-04-24 17:07   ` [PATCH] x86: make DR*_RESERVED unsigned long Oleg Nesterov
@ 2013-04-26 14:20   ` tip-bot for Ingo Molnar
  2013-04-26 16:13     ` Borislav Petkov
  2 siblings, 1 reply; 41+ messages in thread
From: tip-bot for Ingo Molnar @ 2013-04-26 14:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, torvalds, gorcunov,
	tytso, davem, fweisbec, tglx, oleg

Commit-ID:  5ac2b5c2721501a8f5c5e1cd4116cbc31ace6886
Gitweb:     http://git.kernel.org/tip/5ac2b5c2721501a8f5c5e1cd4116cbc31ace6886
Author:     Ingo Molnar <mingo@kernel.org>
AuthorDate: Wed, 24 Apr 2013 09:26:30 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 26 Apr 2013 09:31:41 +0200

perf/x86/intel/P4: Robistify P4 PMU types

Linus found, while extending integer type extension checks in the
sparse static code checker, various fragile patterns of mixed
signed/unsigned  64-bit/32-bit integer use in perf_events_p4.c.

The relevant hardware register ABI is 64 bit wide on 32-bit
kernels as  well, so clean it all up a bit, remove unnecessary
casts, and make sure we  use 64-bit unsigned integers in these
places.

[ Unfortunately this patch was not tested on real P4 hardware,
  those are pretty rare already. If this patch causes any
  problems on P4 hardware then please holler ... ]

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130424072630.GB1780@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/perf_event_p4.h | 62 ++++++++++++++++++------------------
 arch/x86/kernel/cpu/perf_event_p4.c  |  9 +++---
 2 files changed, 35 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/perf_event_p4.h b/arch/x86/include/asm/perf_event_p4.h
index 4f7e67e..85e13cc 100644
--- a/arch/x86/include/asm/perf_event_p4.h
+++ b/arch/x86/include/asm/perf_event_p4.h
@@ -24,45 +24,45 @@
 #define ARCH_P4_CNTRVAL_MASK	((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
 #define ARCH_P4_UNFLAGGED_BIT	((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))
 
-#define P4_ESCR_EVENT_MASK	0x7e000000U
+#define P4_ESCR_EVENT_MASK	0x7e000000ULL
 #define P4_ESCR_EVENT_SHIFT	25
-#define P4_ESCR_EVENTMASK_MASK	0x01fffe00U
+#define P4_ESCR_EVENTMASK_MASK	0x01fffe00ULL
 #define P4_ESCR_EVENTMASK_SHIFT	9
-#define P4_ESCR_TAG_MASK	0x000001e0U
+#define P4_ESCR_TAG_MASK	0x000001e0ULL
 #define P4_ESCR_TAG_SHIFT	5
-#define P4_ESCR_TAG_ENABLE	0x00000010U
-#define P4_ESCR_T0_OS		0x00000008U
-#define P4_ESCR_T0_USR		0x00000004U
-#define P4_ESCR_T1_OS		0x00000002U
-#define P4_ESCR_T1_USR		0x00000001U
+#define P4_ESCR_TAG_ENABLE	0x00000010ULL
+#define P4_ESCR_T0_OS		0x00000008ULL
+#define P4_ESCR_T0_USR		0x00000004ULL
+#define P4_ESCR_T1_OS		0x00000002ULL
+#define P4_ESCR_T1_USR		0x00000001ULL
 
 #define P4_ESCR_EVENT(v)	((v) << P4_ESCR_EVENT_SHIFT)
 #define P4_ESCR_EMASK(v)	((v) << P4_ESCR_EVENTMASK_SHIFT)
 #define P4_ESCR_TAG(v)		((v) << P4_ESCR_TAG_SHIFT)
 
-#define P4_CCCR_OVF			0x80000000U
-#define P4_CCCR_CASCADE			0x40000000U
-#define P4_CCCR_OVF_PMI_T0		0x04000000U
-#define P4_CCCR_OVF_PMI_T1		0x08000000U
-#define P4_CCCR_FORCE_OVF		0x02000000U
-#define P4_CCCR_EDGE			0x01000000U
-#define P4_CCCR_THRESHOLD_MASK		0x00f00000U
+#define P4_CCCR_OVF			0x80000000ULL
+#define P4_CCCR_CASCADE			0x40000000ULL
+#define P4_CCCR_OVF_PMI_T0		0x04000000ULL
+#define P4_CCCR_OVF_PMI_T1		0x08000000ULL
+#define P4_CCCR_FORCE_OVF		0x02000000ULL
+#define P4_CCCR_EDGE			0x01000000ULL
+#define P4_CCCR_THRESHOLD_MASK		0x00f00000ULL
 #define P4_CCCR_THRESHOLD_SHIFT		20
-#define P4_CCCR_COMPLEMENT		0x00080000U
-#define P4_CCCR_COMPARE			0x00040000U
-#define P4_CCCR_ESCR_SELECT_MASK	0x0000e000U
+#define P4_CCCR_COMPLEMENT		0x00080000ULL
+#define P4_CCCR_COMPARE			0x00040000ULL
+#define P4_CCCR_ESCR_SELECT_MASK	0x0000e000ULL
 #define P4_CCCR_ESCR_SELECT_SHIFT	13
-#define P4_CCCR_ENABLE			0x00001000U
-#define P4_CCCR_THREAD_SINGLE		0x00010000U
-#define P4_CCCR_THREAD_BOTH		0x00020000U
-#define P4_CCCR_THREAD_ANY		0x00030000U
-#define P4_CCCR_RESERVED		0x00000fffU
+#define P4_CCCR_ENABLE			0x00001000ULL
+#define P4_CCCR_THREAD_SINGLE		0x00010000ULL
+#define P4_CCCR_THREAD_BOTH		0x00020000ULL
+#define P4_CCCR_THREAD_ANY		0x00030000ULL
+#define P4_CCCR_RESERVED		0x00000fffULL
 
 #define P4_CCCR_THRESHOLD(v)		((v) << P4_CCCR_THRESHOLD_SHIFT)
 #define P4_CCCR_ESEL(v)			((v) << P4_CCCR_ESCR_SELECT_SHIFT)
 
 #define P4_GEN_ESCR_EMASK(class, name, bit)	\
-	class##__##name = ((1 << bit) << P4_ESCR_EVENTMASK_SHIFT)
+	class##__##name = ((1ULL << bit) << P4_ESCR_EVENTMASK_SHIFT)
 #define P4_ESCR_EMASK_BIT(class, name)		class##__##name
 
 /*
@@ -107,7 +107,7 @@
  * P4_PEBS_CONFIG_MASK and related bits on
  * modification.)
  */
-#define P4_CONFIG_ALIASABLE		(1 << 9)
+#define P4_CONFIG_ALIASABLE		(1ULL << 9)
 
 /*
  * The bits we allow to pass for RAW events
@@ -784,17 +784,17 @@ enum P4_ESCR_EMASKS {
  * Note we have UOP and PEBS bits reserved for now
  * just in case if we will need them once
  */
-#define P4_PEBS_CONFIG_ENABLE		(1 << 7)
-#define P4_PEBS_CONFIG_UOP_TAG		(1 << 8)
-#define P4_PEBS_CONFIG_METRIC_MASK	0x3f
-#define P4_PEBS_CONFIG_MASK		0xff
+#define P4_PEBS_CONFIG_ENABLE		(1ULL << 7)
+#define P4_PEBS_CONFIG_UOP_TAG		(1ULL << 8)
+#define P4_PEBS_CONFIG_METRIC_MASK	0x3FLL
+#define P4_PEBS_CONFIG_MASK		0xFFLL
 
 /*
  * mem: Only counters MSR_IQ_COUNTER4 (16) and
  * MSR_IQ_COUNTER5 (17) are allowed for PEBS sampling
  */
-#define P4_PEBS_ENABLE			0x02000000U
-#define P4_PEBS_ENABLE_UOP_TAG		0x01000000U
+#define P4_PEBS_ENABLE			0x02000000ULL
+#define P4_PEBS_ENABLE_UOP_TAG		0x01000000ULL
 
 #define p4_config_unpack_metric(v)	(((u64)(v)) & P4_PEBS_CONFIG_METRIC_MASK)
 #define p4_config_unpack_pebs(v)	(((u64)(v)) & P4_PEBS_CONFIG_MASK)
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 92c7e39..3486e66 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -895,8 +895,8 @@ static void p4_pmu_disable_pebs(void)
 	 * So at moment let leave metrics turned on forever -- it's
 	 * ok for now but need to be revisited!
 	 *
-	 * (void)wrmsrl_safe(MSR_IA32_PEBS_ENABLE, (u64)0);
-	 * (void)wrmsrl_safe(MSR_P4_PEBS_MATRIX_VERT, (u64)0);
+	 * (void)wrmsrl_safe(MSR_IA32_PEBS_ENABLE, 0);
+	 * (void)wrmsrl_safe(MSR_P4_PEBS_MATRIX_VERT, 0);
 	 */
 }
 
@@ -910,8 +910,7 @@ static inline void p4_pmu_disable_event(struct perf_event *event)
 	 * asserted again and again
 	 */
 	(void)wrmsrl_safe(hwc->config_base,
-		(u64)(p4_config_unpack_cccr(hwc->config)) &
-			~P4_CCCR_ENABLE & ~P4_CCCR_OVF & ~P4_CCCR_RESERVED);
+		p4_config_unpack_cccr(hwc->config) & ~P4_CCCR_ENABLE & ~P4_CCCR_OVF & ~P4_CCCR_RESERVED);
 }
 
 static void p4_pmu_disable_all(void)
@@ -957,7 +956,7 @@ static void p4_pmu_enable_event(struct perf_event *event)
 	u64 escr_addr, cccr;
 
 	bind = &p4_event_bind_map[idx];
-	escr_addr = (u64)bind->escr_msr[thread];
+	escr_addr = bind->escr_msr[thread];
 
 	/*
 	 * - we dont support cascaded counters yet

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types
  2013-04-26 14:20   ` [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types tip-bot for Ingo Molnar
@ 2013-04-26 16:13     ` Borislav Petkov
  2013-04-26 16:24       ` Cyrill Gorcunov
  0 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2013-04-26 16:13 UTC (permalink / raw)
  To: mingo
  Cc: hpa, linux-kernel, torvalds, a.p.zijlstra, tytso, gorcunov,
	davem, fweisbec, oleg, tglx, linux-tip-commits

On Fri, Apr 26, 2013 at 07:20:46AM -0700, tip-bot for Ingo Molnar wrote:
> Commit-ID:  5ac2b5c2721501a8f5c5e1cd4116cbc31ace6886
> Gitweb:     http://git.kernel.org/tip/5ac2b5c2721501a8f5c5e1cd4116cbc31ace6886
> Author:     Ingo Molnar <mingo@kernel.org>
> AuthorDate: Wed, 24 Apr 2013 09:26:30 +0200
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Fri, 26 Apr 2013 09:31:41 +0200
> 
> perf/x86/intel/P4: Robistify P4 PMU types
> 
> Linus found, while extending integer type extension checks in the
> sparse static code checker, various fragile patterns of mixed
> signed/unsigned  64-bit/32-bit integer use in perf_events_p4.c.
> 
> The relevant hardware register ABI is 64 bit wide on 32-bit
> kernels as  well, so clean it all up a bit, remove unnecessary
> casts, and make sure we  use 64-bit unsigned integers in these
> places.
> 
> [ Unfortunately this patch was not tested on real P4 hardware,
>   those are pretty rare already. If this patch causes any
>   problems on P4 hardware then please holler ... ]

I have a P4 and I turn it on from time to time when it's cold in the
room :-).

Any particular tests you want me to run on it?

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.60GHz

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types
  2013-04-26 16:13     ` Borislav Petkov
@ 2013-04-26 16:24       ` Cyrill Gorcunov
  2013-04-26 16:39         ` Borislav Petkov
  0 siblings, 1 reply; 41+ messages in thread
From: Cyrill Gorcunov @ 2013-04-26 16:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mingo, hpa, linux-kernel, torvalds, a.p.zijlstra, tytso, davem,
	fweisbec, oleg, tglx, linux-tip-commits

On Fri, Apr 26, 2013 at 06:13:28PM +0200, Borislav Petkov wrote:
> 
> I have a P4 and I turn it on from time to time when it's cold in the
> room :-).
> 
> Any particular tests you want me to run on it?
> 
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 15
> model           : 2
> model name      : Intel(R) Pentium(R) 4 CPU 2.60GHz

Cool! Easiest way I think is to run "perf top" (iirc that's the
command for top-like output). And if possible some branch instructions
testing.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-25 14:48       ` Oleg Nesterov
@ 2013-04-26 16:38         ` Oleg Nesterov
  2013-04-26 16:44           ` H. Peter Anvin
  0 siblings, 1 reply; 41+ messages in thread
From: Oleg Nesterov @ 2013-04-26 16:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the set
bits in the "unsigned long" data, make them long to ensure that
"&~" doesn't clear the upper bits.

This is only cleanup, the usage of ~DR*_RESERVED is safe but
doesn't look clean and the pattern is error prone.

	- do_debug:

		dr6 &= ~DR6_RESERVED;

	  this also wrongly clears 32-63 bits. Fortunately these
	  bits are reserved and must be zero.

	- ptrace_write_dr7:

		data &= ~DR_CONTROL_RESERVED;

	  on __i386__ this mixes long/int but sizeof should be the
	  same.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/uapi/asm/debugreg.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
index 3c0874d..c0c1b89 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -15,7 +15,7 @@
    are either reserved or not of interest to us. */
 
 /* Define reserved bits in DR6 which are always set to 1 */
-#define DR6_RESERVED	(0xFFFF0FF0)
+#define DR6_RESERVED	(0xFFFF0FF0UL)
 
 #define DR_TRAP0	(0x1)		/* db0 */
 #define DR_TRAP1	(0x2)		/* db1 */
@@ -65,7 +65,7 @@
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
 #ifdef __i386__
-#define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
+#define DR_CONTROL_RESERVED (0xFC00UL) /* Reserved by Intel */
 #else
 #define DR_CONTROL_RESERVED (0xFFFFFFFF0000FC00UL) /* Reserved */
 #endif
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types
  2013-04-26 16:24       ` Cyrill Gorcunov
@ 2013-04-26 16:39         ` Borislav Petkov
  2013-04-26 16:46           ` Cyrill Gorcunov
  0 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2013-04-26 16:39 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: mingo, hpa, linux-kernel, torvalds, a.p.zijlstra, tytso, davem,
	fweisbec, oleg, tglx, linux-tip-commits

On Fri, Apr 26, 2013 at 08:24:44PM +0400, Cyrill Gorcunov wrote:
> And if possible some branch instructions testing.

By that you mean, see whether it counts branches?

./perf stat sleep 0

 Performance counter stats for 'sleep 0':

          2.483869 task-clock                #    0.463 CPUs utilized
                 1 context-switches          #    0.403 K/sec
                 0 cpu-migrations            #    0.000 K/sec
               168 page-faults               #    0.068 M/sec
         3,435,630 cycles                    #    1.383 GHz
           736,680 stalled-cycles-frontend   #   21.44% frontend cycles idle
           656,603 stalled-cycles-backend    #   19.11% backend  cycles idle
         2,932,775 instructions              #    0.85  insns per cycle
                                             #    0.25  stalled cycles per insn
           590,855 branches                  #  237.877 M/sec
            12,882 branch-misses             #    2.18% of all branches
	    ^^^^^^^^^^^^^^^^^^^^

Those above?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-26 16:38         ` [PATCH v2] " Oleg Nesterov
@ 2013-04-26 16:44           ` H. Peter Anvin
  2013-04-26 17:15             ` Oleg Nesterov
  0 siblings, 1 reply; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-26 16:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

On 04/26/2013 09:38 AM, Oleg Nesterov wrote:
> DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the set
> bits in the "unsigned long" data, make them long to ensure that
> "&~" doesn't clear the upper bits.
> 
> This is only cleanup, the usage of ~DR*_RESERVED is safe but
> doesn't look clean and the pattern is error prone.
> 
> 	- do_debug:
> 
> 		dr6 &= ~DR6_RESERVED;
> 
> 	  this also wrongly clears 32-63 bits. Fortunately these
> 	  bits are reserved and must be zero.
> 

I don't think this is wrongly at all.  The whole point is to mask out
the bits that the handler doesn't want to deal with, so masking out the
reserved bits [63:32] seems reasonable to me.

The comment should probably be corrected, though.

	-hpa



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types
  2013-04-26 16:39         ` Borislav Petkov
@ 2013-04-26 16:46           ` Cyrill Gorcunov
  2013-04-27 16:14             ` Borislav Petkov
  0 siblings, 1 reply; 41+ messages in thread
From: Cyrill Gorcunov @ 2013-04-26 16:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mingo, hpa, linux-kernel, torvalds, a.p.zijlstra, tytso, davem,
	fweisbec, oleg, tglx, linux-tip-commits

On Fri, Apr 26, 2013 at 06:39:52PM +0200, Borislav Petkov wrote:
>                                              #    0.25  stalled cycles per insn
>            590,855 branches                  #  237.877 M/sec
>             12,882 branch-misses             #    2.18% of all branches
> 	    ^^^^^^^^^^^^^^^^^^^^
> 
> Those above?

Yeah, thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-26 16:44           ` H. Peter Anvin
@ 2013-04-26 17:15             ` Oleg Nesterov
  2013-04-27 14:45               ` Oleg Nesterov
  0 siblings, 1 reply; 41+ messages in thread
From: Oleg Nesterov @ 2013-04-26 17:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

On 04/26, H. Peter Anvin wrote:
>
> On 04/26/2013 09:38 AM, Oleg Nesterov wrote:
> >
> > 	- do_debug:
> >
> > 		dr6 &= ~DR6_RESERVED;
> >
> > 	  this also wrongly clears 32-63 bits. Fortunately these
> > 	  bits are reserved and must be zero.
>
> I don't think this is wrongly at all.

OK, I meant that it also clears the bits that are not specified in
DR6_RESERVED mask.

> The whole point is to mask out
> the bits that the handler doesn't want to deal with, so masking out the
> reserved bits [63:32] seems reasonable to me.

Then we should do

	- #define DR6_RESERVED    0xFFFF0FF0
	+ #define DR6_RESERVED    0xFFFFFFFFFFFF0FF0

?

or what? (just in case, I will happily agree with "do nothing" ;)

> The comment should probably be corrected, though.

Which one?

	/* Define reserved bits in DR6 which are always set to 1 */
	#define DR6_RESERVED    (0xFFFF0FF0UL)

	/* Filter out all the reserved bits which are preset to 1 */
	dr6 &= ~DR6_RESERVED;

I guess both should be updated then. But if I read the doc correctly
the lower reserved bits are set to 1.

However do_debug() does set_debugreg(0, 6) and this looks correct, the
doc says "debug handlers should clear the register before returning to
the interrupted task".

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-26 17:15             ` Oleg Nesterov
@ 2013-04-27 14:45               ` Oleg Nesterov
  2013-04-27 16:20                 ` H. Peter Anvin
  2013-04-28  0:58                 ` Frederic Weisbecker
  0 siblings, 2 replies; 41+ messages in thread
From: Oleg Nesterov @ 2013-04-27 14:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

On 04/26, Oleg Nesterov wrote:

> On 04/26, H. Peter Anvin wrote:
> >
> > On 04/26/2013 09:38 AM, Oleg Nesterov wrote:
> > >
> > > 	- do_debug:
> > >
> > > 		dr6 &= ~DR6_RESERVED;
> > >
> > > 	  this also wrongly clears 32-63 bits. Fortunately these
> > > 	  bits are reserved and must be zero.
> >
> > I don't think this is wrongly at all.
>
> OK, I meant that it also clears the bits that are not specified in
> DR6_RESERVED mask.
>
> > The whole point is to mask out
> > the bits that the handler doesn't want to deal with, so masking out the
> > reserved bits [63:32] seems reasonable to me.
>
> Then we should do
>
> 	- #define DR6_RESERVED    0xFFFF0FF0
> 	+ #define DR6_RESERVED    0xFFFFFFFFFFFF0FF0
>
> ?
>
> or what? (just in case, I will happily agree with "do nothing" ;)

Or we can do the s/reserved/mask/ change and avoid any "unexpected"
effect of "long &= ~int". This allso allows to kill ifdef(__i386__).

But this is include/uapi, I do not know if I can simply remove the
old define's.

In short: whatever you prefer, including "leave it alone".

Oleg.

diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
index 3c0874d..2678b23 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -14,8 +14,7 @@
    which debugging register was responsible for the trap.  The other bits
    are either reserved or not of interest to us. */
 
-/* Define reserved bits in DR6 which are always set to 1 */
-#define DR6_RESERVED	(0xFFFF0FF0)
+#define DR6_MASK	(0xF00FU)	/* Everything else is reserved */
 
 #define DR_TRAP0	(0x1)		/* db0 */
 #define DR_TRAP1	(0x2)		/* db1 */
@@ -32,6 +31,8 @@
    and indicates what types of access we trap on, and how large the data
    field is that we are looking at */
 
+#define DR_CONTROL_MASK (0xFFFF03FFU) /* Everything else is reserved */
+
 #define DR_CONTROL_SHIFT 16 /* Skip this many bits in ctl register */
 #define DR_CONTROL_SIZE 4   /* 4 control bits per register */
 
@@ -64,12 +65,6 @@
    We can slow the instruction pipeline for instructions coming via the
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
-#ifdef __i386__
-#define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
-#else
-#define DR_CONTROL_RESERVED (0xFFFFFFFF0000FC00UL) /* Reserved */
-#endif
-
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
 
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 7461f50..bc5fb98 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -657,7 +657,7 @@ static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
 	bool second_pass = false;
 	int i, rc, ret = 0;
 
-	data &= ~DR_CONTROL_RESERVED;
+	data &= DR_CONTROL_MASK;
 	old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
 
 restore:
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 68bda7a..42a635f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -402,7 +402,7 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
 	get_debugreg(dr6, 6);
 
 	/* Filter out all the reserved bits which are preset to 1 */
-	dr6 &= ~DR6_RESERVED;
+	dr6 &= DR6_MASK;
 
 	/*
 	 * If dr6 has no reason to give us about the origin of this trap,


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types
  2013-04-26 16:46           ` Cyrill Gorcunov
@ 2013-04-27 16:14             ` Borislav Petkov
  2013-04-27 16:33               ` Cyrill Gorcunov
  0 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2013-04-27 16:14 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: mingo, hpa, linux-kernel, torvalds, a.p.zijlstra, tytso, davem,
	fweisbec, oleg, tglx, linux-tip-commits

On Fri, Apr 26, 2013 at 08:46:52PM +0400, Cyrill Gorcunov wrote:
> On Fri, Apr 26, 2013 at 06:39:52PM +0200, Borislav Petkov wrote:
> >                                              #    0.25  stalled cycles per insn
> >            590,855 branches                  #  237.877 M/sec
> >             12,882 branch-misses             #    2.18% of all branches
> > 	    ^^^^^^^^^^^^^^^^^^^^
> > 
> > Those above?

Well, perf top looks ok to me, here's a snapshot:

   PerfTop:      63 irqs/sec  kernel:79.4%  exact:  0.0% [4000Hz cycles],  (all, 2 CPUs)
--------------------------------------------------------------------------------------------------------------------------------------------

    11.21%  [kernel]               [k] __lock_acquire                
     7.87%  libc-2.13.so           [.] 0x00078b0c                    
     5.78%  libz.so.1.2.7          [.] 0x00003731                    
     4.29%  libpthread-2.13.so     [.] pthread_rwlock_unlock         
     3.74%  libpthread-2.13.so     [.] pthread_rwlock_rdlock         
     3.67%  [kernel]               [k] lock_release                  
     2.55%  [kernel]               [k] lock_acquire                  
     2.27%  perf                   [.] symbols__insert               
     2.15%  sshd                   [.] 0x0004707e                    
     1.62%  libc-2.13.so           [.] vfprintf                      
     1.58%  [kernel]               [k] mark_held_locks               
     1.40%  [kernel]               [k] do_raw_spin_lock              
     1.37%  [kernel]               [k] trace_hardirqs_on_caller      
     1.29%  [kernel]               [k] sub_preempt_count             
     1.17%  perf                   [.] symbol_filter                 
     1.13%  [kernel]               [k] mark_lock                     
     1.05%  [kernel]               [k] trace_hardirqs_off_caller     
     0.96%  perf                   [.] rb_next                       
     0.94%  libc-2.13.so           [.] memchr                        
     0.80%  libbfd-2.22-system.so  [.] 0x000bb009                    
     0.72%  [kernel]               [k] __schedule                    
     0.71%  [kernel]               [k] ioread16                      
     0.67%  [kernel]               [k] _raw_spin_unlock_irqrestore   
     0.66%  [kernel]               [k] __switch_to                   
     0.59%  [kernel]               [k] do_raw_spin_unlock            
     0.56%  perf                   [.] dso__load_sym
...

I can annotate symbols and disassemble works fine too, along with
refresh and per-insn overhead.

The other trivial test passes too, although branch-misses doesn't get
counted:

./perf stat sleep 1

 Performance counter stats for 'sleep 1':

          1.433368 task-clock                #    0.001 CPUs utilized          
                 1 context-switches          #    0.698 K/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
               147 page-faults               #    0.103 M/sec                  
            78,446 cycles                    #    0.055 GHz                    
                 0 stalled-cycles-frontend   #    0.00% frontend cycles idle   
                 0 stalled-cycles-backend    #    0.00% backend  cycles idle    [27.37%]
         1,268,044 instructions              #   16.16  insns per cycle         [27.37%]
           223,742 branches                  #  156.095 M/sec                   [27.37%]
     <not counted> branch-misses

       1.002191045 seconds time elapsed

However, if I do this, it works:

./perf stat -e branch-misses sleep 1

 Performance counter stats for 'sleep 1':

             8,583 branch-misses

       1.001992384 seconds time elapsed


Oh, btw, tip/master has

commit 697dfd884438058b15032b0169887c742704434a
Merge: 0fbd06761f5c f697036b93aa
Author: H. Peter Anvin <hpa@linux.intel.com>
Date:   Thu Apr 25 14:00:22 2013 -0700

    Merge tag 'efi-urgent' into x86/urgent

as its top commit.

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-27 14:45               ` Oleg Nesterov
@ 2013-04-27 16:20                 ` H. Peter Anvin
  2013-04-28  0:58                 ` Frederic Weisbecker
  1 sibling, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-27 16:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Linus Torvalds, Cyrill Gorcunov, Peter Zijlstra,
	Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4, Fr??d??ric Weisbecker

I don't know why this is uapi... finest make a lot of sense to me.

Oleg Nesterov <oleg@redhat.com> wrote:

>On 04/26, Oleg Nesterov wrote:
>
>> On 04/26, H. Peter Anvin wrote:
>> >
>> > On 04/26/2013 09:38 AM, Oleg Nesterov wrote:
>> > >
>> > > 	- do_debug:
>> > >
>> > > 		dr6 &= ~DR6_RESERVED;
>> > >
>> > > 	  this also wrongly clears 32-63 bits. Fortunately these
>> > > 	  bits are reserved and must be zero.
>> >
>> > I don't think this is wrongly at all.
>>
>> OK, I meant that it also clears the bits that are not specified in
>> DR6_RESERVED mask.
>>
>> > The whole point is to mask out
>> > the bits that the handler doesn't want to deal with, so masking out
>the
>> > reserved bits [63:32] seems reasonable to me.
>>
>> Then we should do
>>
>> 	- #define DR6_RESERVED    0xFFFF0FF0
>> 	+ #define DR6_RESERVED    0xFFFFFFFFFFFF0FF0
>>
>> ?
>>
>> or what? (just in case, I will happily agree with "do nothing" ;)
>
>Or we can do the s/reserved/mask/ change and avoid any "unexpected"
>effect of "long &= ~int". This allso allows to kill ifdef(__i386__).
>
>But this is include/uapi, I do not know if I can simply remove the
>old define's.
>
>In short: whatever you prefer, including "leave it alone".
>
>Oleg.
>
>diff --git a/arch/x86/include/uapi/asm/debugreg.h
>b/arch/x86/include/uapi/asm/debugreg.h
>index 3c0874d..2678b23 100644
>--- a/arch/x86/include/uapi/asm/debugreg.h
>+++ b/arch/x86/include/uapi/asm/debugreg.h
>@@ -14,8 +14,7 @@
> which debugging register was responsible for the trap.  The other bits
>    are either reserved or not of interest to us. */
> 
>-/* Define reserved bits in DR6 which are always set to 1 */
>-#define DR6_RESERVED	(0xFFFF0FF0)
>+#define DR6_MASK	(0xF00FU)	/* Everything else is reserved */
> 
> #define DR_TRAP0	(0x1)		/* db0 */
> #define DR_TRAP1	(0x2)		/* db1 */
>@@ -32,6 +31,8 @@
>  and indicates what types of access we trap on, and how large the data
>    field is that we are looking at */
> 
>+#define DR_CONTROL_MASK (0xFFFF03FFU) /* Everything else is reserved
>*/
>+
> #define DR_CONTROL_SHIFT 16 /* Skip this many bits in ctl register */
> #define DR_CONTROL_SIZE 4   /* 4 control bits per register */
> 
>@@ -64,12 +65,6 @@
>   We can slow the instruction pipeline for instructions coming via the
>gdt or the ldt if we want to.  I am not sure why this is an advantage
>*/
> 
>-#ifdef __i386__
>-#define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
>-#else
>-#define DR_CONTROL_RESERVED (0xFFFFFFFF0000FC00UL) /* Reserved */
>-#endif
>-
> #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
> #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
> 
>diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
>index 7461f50..bc5fb98 100644
>--- a/arch/x86/kernel/ptrace.c
>+++ b/arch/x86/kernel/ptrace.c
>@@ -657,7 +657,7 @@ static int ptrace_write_dr7(struct task_struct
>*tsk, unsigned long data)
> 	bool second_pass = false;
> 	int i, rc, ret = 0;
> 
>-	data &= ~DR_CONTROL_RESERVED;
>+	data &= DR_CONTROL_MASK;
> 	old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
> 
> restore:
>diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>index 68bda7a..42a635f 100644
>--- a/arch/x86/kernel/traps.c
>+++ b/arch/x86/kernel/traps.c
>@@ -402,7 +402,7 @@ dotraplinkage void __kprobes do_debug(struct
>pt_regs *regs, long error_code)
> 	get_debugreg(dr6, 6);
> 
> 	/* Filter out all the reserved bits which are preset to 1 */
>-	dr6 &= ~DR6_RESERVED;
>+	dr6 &= DR6_MASK;
> 
> 	/*
> 	 * If dr6 has no reason to give us about the origin of this trap,

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types
  2013-04-27 16:14             ` Borislav Petkov
@ 2013-04-27 16:33               ` Cyrill Gorcunov
  0 siblings, 0 replies; 41+ messages in thread
From: Cyrill Gorcunov @ 2013-04-27 16:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mingo, hpa, linux-kernel, torvalds, a.p.zijlstra, tytso, davem,
	fweisbec, oleg, tglx, linux-tip-commits

On Sat, Apr 27, 2013 at 06:14:07PM +0200, Borislav Petkov wrote:
...

Thanks a LOT, Borislav!

> The other trivial test passes too, although branch-misses doesn't get
> counted:
> 
> ./perf stat sleep 1
> 
>  Performance counter stats for 'sleep 1':
> 
>           1.433368 task-clock                #    0.001 CPUs utilized          
>                  1 context-switches          #    0.698 K/sec                  
>                  0 cpu-migrations            #    0.000 K/sec                  
>                147 page-faults               #    0.103 M/sec                  
>             78,446 cycles                    #    0.055 GHz                    
>                  0 stalled-cycles-frontend   #    0.00% frontend cycles idle   
>                  0 stalled-cycles-backend    #    0.00% backend  cycles idle    [27.37%]
>          1,268,044 instructions              #   16.16  insns per cycle         [27.37%]
>            223,742 branches                  #  156.095 M/sec                   [27.37%]
>      <not counted> branch-misses
> 
>        1.002191045 seconds time elapsed

It's known effect. instructions retired and branch-misses can't
run simultaneously since they both use same escr msr register
(if I recall correctly, out of repo at moment).

> However, if I do this, it works:
> 
> ./perf stat -e branch-misses sleep 1
> 
>  Performance counter stats for 'sleep 1':
> 
>              8,583 branch-misses
> 
>        1.001992384 seconds time elapsed

Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-27 14:45               ` Oleg Nesterov
  2013-04-27 16:20                 ` H. Peter Anvin
@ 2013-04-28  0:58                 ` Frederic Weisbecker
  2013-04-28 17:27                   ` Oleg Nesterov
  1 sibling, 1 reply; 41+ messages in thread
From: Frederic Weisbecker @ 2013-04-28  0:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: H. Peter Anvin, Ingo Molnar, Linus Torvalds, Cyrill Gorcunov,
	Peter Zijlstra, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

On Sat, Apr 27, 2013 at 04:45:37PM +0200, Oleg Nesterov wrote:
> On 04/26, Oleg Nesterov wrote:
> 
> > On 04/26, H. Peter Anvin wrote:
> > >
> > > On 04/26/2013 09:38 AM, Oleg Nesterov wrote:
> > > >
> > > > 	- do_debug:
> > > >
> > > > 		dr6 &= ~DR6_RESERVED;
> > > >
> > > > 	  this also wrongly clears 32-63 bits. Fortunately these
> > > > 	  bits are reserved and must be zero.
> > >
> > > I don't think this is wrongly at all.
> >
> > OK, I meant that it also clears the bits that are not specified in
> > DR6_RESERVED mask.
> >
> > > The whole point is to mask out
> > > the bits that the handler doesn't want to deal with, so masking out the
> > > reserved bits [63:32] seems reasonable to me.
> >
> > Then we should do
> >
> > 	- #define DR6_RESERVED    0xFFFF0FF0
> > 	+ #define DR6_RESERVED    0xFFFFFFFFFFFF0FF0
> >
> > ?
> >
> > or what? (just in case, I will happily agree with "do nothing" ;)
> 
> Or we can do the s/reserved/mask/ change and avoid any "unexpected"
> effect of "long &= ~int". This allso allows to kill ifdef(__i386__).
> 
> But this is include/uapi, I do not know if I can simply remove the
> old define's.
> 
> In short: whatever you prefer, including "leave it alone".
> 
> Oleg.
> 
> diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
> index 3c0874d..2678b23 100644
> --- a/arch/x86/include/uapi/asm/debugreg.h
> +++ b/arch/x86/include/uapi/asm/debugreg.h
> @@ -14,8 +14,7 @@
>     which debugging register was responsible for the trap.  The other bits
>     are either reserved or not of interest to us. */
>  
> -/* Define reserved bits in DR6 which are always set to 1 */
> -#define DR6_RESERVED	(0xFFFF0FF0)
> +#define DR6_MASK	(0xF00FU)	/* Everything else is reserved */

I'm personally fine either with that or with Peter's suggestion to do:

-#define DR6_RESERVED (0xFFFF0FF0)
+#define DR6_RESERVED (~0xF00FUL)

If this should stay stable UAPI, we probably want Peter's solution. Otherwise
I really don't mind.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-28  0:58                 ` Frederic Weisbecker
@ 2013-04-28 17:27                   ` Oleg Nesterov
  2013-04-28 17:32                     ` H. Peter Anvin
  0 siblings, 1 reply; 41+ messages in thread
From: Oleg Nesterov @ 2013-04-28 17:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: H. Peter Anvin, Ingo Molnar, Linus Torvalds, Cyrill Gorcunov,
	Peter Zijlstra, Thomas Gleixner, David Miller, Theodore Ts'o,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Network Development, linux-ext4

On 04/28, Frederic Weisbecker wrote:
>
> On Sat, Apr 27, 2013 at 04:45:37PM +0200, Oleg Nesterov wrote:
> >
> > -/* Define reserved bits in DR6 which are always set to 1 */
> > -#define DR6_RESERVED	(0xFFFF0FF0)
> > +#define DR6_MASK	(0xF00FU)	/* Everything else is reserved */
>
> I'm personally fine either with that or with Peter's suggestion to do:
>
> -#define DR6_RESERVED (0xFFFF0FF0)
> +#define DR6_RESERVED (~0xF00FUL)

I missed this suggestion...

Yes, and this allows to kill ifdef too.

> If this should stay stable UAPI,

I do not know, but I guess it would be safer to keep the old define's.

> I really don't mind.

Oh, I do not mind too ;)

OK, please see v3.

------------------------------------------------------------------------------
Subject: [PATCH v3] x86: make DR*_RESERVED unsigned long

DR6_RESERVED and DR_CONTROL_RESERVED are used to clear the unwanted
bits in the "unsigned long" data, but "ulong &= ~int" also clears the
upper bits that are not specified in mask.

This is actually fine, dr6[32:63] are reserved, but this is not clear
so it would be better to make them "unsigned long" to cleanup the code.

However, depending on sizeof(long), DR6_RESERVED should be either
0xFFFF0FF0 or 0xFFFFFFFF_FFFF0FF0, so this patch redefines them as
(~ 32_bit_mask UL) to avoid ifdef's.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/uapi/asm/debugreg.h |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
index 3c0874d..4ff5d05 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -15,7 +15,7 @@
    are either reserved or not of interest to us. */
 
 /* Define reserved bits in DR6 which are always set to 1 */
-#define DR6_RESERVED	(0xFFFF0FF0)
+#define DR6_RESERVED	(~0xF00FUL)
 
 #define DR_TRAP0	(0x1)		/* db0 */
 #define DR_TRAP1	(0x2)		/* db1 */
@@ -64,11 +64,7 @@
    We can slow the instruction pipeline for instructions coming via the
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
-#ifdef __i386__
-#define DR_CONTROL_RESERVED (0xFC00) /* Reserved by Intel */
-#else
-#define DR_CONTROL_RESERVED (0xFFFFFFFF0000FC00UL) /* Reserved */
-#endif
+#define DR_CONTROL_RESERVED (~0xFFFF03FFUL) /* Reserved by Intel */
 
 #define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
 #define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-28 17:27                   ` Oleg Nesterov
@ 2013-04-28 17:32                     ` H. Peter Anvin
  2013-04-28 17:39                       ` Oleg Nesterov
  0 siblings, 1 reply; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-28 17:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Frederic Weisbecker, Ingo Molnar, Linus Torvalds,
	Cyrill Gorcunov, Peter Zijlstra, Thomas Gleixner, David Miller,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On 04/28/2013 10:27 AM, Oleg Nesterov wrote:
> On 04/28, Frederic Weisbecker wrote:
>>
>> On Sat, Apr 27, 2013 at 04:45:37PM +0200, Oleg Nesterov wrote:
>>>
>>> -/* Define reserved bits in DR6 which are always set to 1 */
>>> -#define DR6_RESERVED	(0xFFFF0FF0)
>>> +#define DR6_MASK	(0xF00FU)	/* Everything else is reserved */
>>
>> I'm personally fine either with that or with Peter's suggestion to do:
>>
>> -#define DR6_RESERVED (0xFFFF0FF0)
>> +#define DR6_RESERVED (~0xF00FUL)
> 
> I missed this suggestion...
> 
> Yes, and this allows to kill ifdef too.
> 
>> If this should stay stable UAPI,
> 
> I do not know, but I guess it would be safer to keep the old define's.
> 
>> I really don't mind.
> 
> Oh, I do not mind too ;)
> 
> OK, please see v3.
> 

Looks good.  However, given the timing, I would think this is 3.11
material unless we have a manifest bug at this point.

I have several bits like this that I'm going to queue up in a separate
tip branch.

	-hpa



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-28 17:32                     ` H. Peter Anvin
@ 2013-04-28 17:39                       ` Oleg Nesterov
  2013-04-28 17:43                         ` H. Peter Anvin
  0 siblings, 1 reply; 41+ messages in thread
From: Oleg Nesterov @ 2013-04-28 17:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Frederic Weisbecker, Ingo Molnar, Linus Torvalds,
	Cyrill Gorcunov, Peter Zijlstra, Thomas Gleixner, David Miller,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On 04/28, H. Peter Anvin wrote:
>
> On 04/28/2013 10:27 AM, Oleg Nesterov wrote:
>
> Looks good.  However, given the timing, I would think this is 3.11
> material unless we have a manifest bug at this point.

Yes, yes, this is only cleanup.

> I have several bits like this that I'm going to queue up in a separate
> tip branch.

Great, thanks.

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] x86: make DR*_RESERVED unsigned long
  2013-04-28 17:39                       ` Oleg Nesterov
@ 2013-04-28 17:43                         ` H. Peter Anvin
  0 siblings, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2013-04-28 17:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Frederic Weisbecker, Ingo Molnar, Linus Torvalds,
	Cyrill Gorcunov, Peter Zijlstra, Thomas Gleixner, David Miller,
	Theodore Ts'o, Linux Kernel Mailing List,
	the arch/x86 maintainers, Network Development, linux-ext4

On 04/28/2013 10:39 AM, Oleg Nesterov wrote:
> On 04/28, H. Peter Anvin wrote:
>>
>> On 04/28/2013 10:27 AM, Oleg Nesterov wrote:
>>
>> Looks good.  However, given the timing, I would think this is 3.11
>> material unless we have a manifest bug at this point.
> 
> Yes, yes, this is only cleanup.
> 

Thanks for tackling it.  I just wanted to make sure you didn't wonder
why I didn't want to push it for 3.10.

	-hpa


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2013-04-28 17:44 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CA+55aFwpLu0qNms=hkQr43yqD0K9DgESNm91OEWKT1ZuT8MU6Q@mail.gmail.com>
2013-04-23  0:23 ` Unsigned widening casts of binary "not" operations Linus Torvalds
2013-04-23  8:59   ` David Laight
2013-04-23 14:29     ` Linus Torvalds
2013-04-23 15:24       ` David Laight
2013-04-23 15:42         ` Linus Torvalds
2013-04-23 15:52           ` Theodore Ts'o
2013-04-23 16:05             ` Linus Torvalds
2013-04-23 17:37           ` David Miller
2013-04-23 17:52             ` Linus Torvalds
2013-04-23 17:56               ` David Miller
2013-04-23 18:21                 ` Linus Torvalds
2013-04-24 12:36             ` Geert Uytterhoeven
2013-04-23  0:32 ` H. Peter Anvin
2013-04-23 13:00 ` Theodore Ts'o
2013-04-24  7:26 ` Ingo Molnar
2013-04-24  7:47   ` Cyrill Gorcunov
2013-04-25  1:13     ` Lin Ming
2013-04-24 17:07   ` [PATCH] x86: make DR*_RESERVED unsigned long Oleg Nesterov
2013-04-24 18:45     ` H. Peter Anvin
2013-04-25 14:48       ` Oleg Nesterov
2013-04-26 16:38         ` [PATCH v2] " Oleg Nesterov
2013-04-26 16:44           ` H. Peter Anvin
2013-04-26 17:15             ` Oleg Nesterov
2013-04-27 14:45               ` Oleg Nesterov
2013-04-27 16:20                 ` H. Peter Anvin
2013-04-28  0:58                 ` Frederic Weisbecker
2013-04-28 17:27                   ` Oleg Nesterov
2013-04-28 17:32                     ` H. Peter Anvin
2013-04-28 17:39                       ` Oleg Nesterov
2013-04-28 17:43                         ` H. Peter Anvin
2013-04-24 22:48     ` [PATCH] " Frederic Weisbecker
2013-04-24 23:06       ` H. Peter Anvin
2013-04-24 23:31         ` Frederic Weisbecker
2013-04-25  1:20           ` H. Peter Anvin
2013-04-26 14:20   ` [tip:perf/core] perf/x86/intel/P4: Robistify P4 PMU types tip-bot for Ingo Molnar
2013-04-26 16:13     ` Borislav Petkov
2013-04-26 16:24       ` Cyrill Gorcunov
2013-04-26 16:39         ` Borislav Petkov
2013-04-26 16:46           ` Cyrill Gorcunov
2013-04-27 16:14             ` Borislav Petkov
2013-04-27 16:33               ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).