All of lore.kernel.org
 help / color / mirror / Atom feed
* + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch
@ 2022-10-20  0:03 Andrew Morton
  2022-10-20  9:43 ` Alexey Dobriyan
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Andrew Morton @ 2022-10-20  0:03 UTC (permalink / raw)
  To: mm-commits, torvalds, masahiroy, keescook, gregkh,
	andriy.shevchenko, Jason, akpm


The patch titled
     Subject: kbuild: treat char as always unsigned
has been added to the -mm mm-nonmm-unstable branch.  Its filename is
     kbuild-treat-char-as-always-unsigned.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/kbuild-treat-char-as-always-unsigned.patch

This patch will later appear in the mm-nonmm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Subject: kbuild: treat char as always unsigned
Date: Wed, 19 Oct 2022 14:30:34 -0600

Recently, some compile-time checking I added to the clamp_t family of
functions triggered a build error when a poorly written driver was
compiled on ARM, because the driver assumed that the naked `char` type is
signed, but ARM treats it as unsigned, and the C standard says it's
architecture-dependent.

I doubt this particular driver is the only instance in which unsuspecting
authors make assumptions about `char` with no `signed` or `unsigned`
specifier.  We were lucky enough this time that that driver used
`clamp_t(char, negative_value, positive_value)`, so the new checking code
found it, and I've sent a patch to fix it, but there are likely other
places lurking that won't be so easily unearthed.

So let's just eliminate this particular variety of heisensign bugs
entirely.  Set `-funsigned-char` globally, so that gcc makes the type
unsigned on all architectures.

This will break things in some places and fix things in others, so this
will likely cause a bit of churn while reconciling the type misuse.

Link: https://lkml.kernel.org/r/20221019203034.3795710-1-Jason@zx2c4.com
Link: https://lore.kernel.org/lkml/202210190108.ESC3pc3D-lkp@intel.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Makefile |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Makefile~kbuild-treat-char-as-always-unsigned
+++ a/Makefile
@@ -562,7 +562,7 @@ KBUILD_AFLAGS   := -D__ASSEMBLY__ -fno-P
 KBUILD_CFLAGS   := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \
 		   -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \
 		   -Werror=implicit-function-declaration -Werror=implicit-int \
-		   -Werror=return-type -Wno-format-security \
+		   -Werror=return-type -Wno-format-security -funsigned-char \
 		   -std=gnu11
 KBUILD_CPPFLAGS := -D__KERNEL__
 KBUILD_RUSTFLAGS := $(rust_common_flags) \
_

Patches currently in -mm which might be from Jason@zx2c4.com are

wifi-rt2x00-use-explicitly-signed-type-for-clamping.patch
minmax-sanity-check-constant-bounds-when-clamping.patch
minmax-clamp-more-efficiently-by-avoiding-extra-comparison.patch
kbuild-treat-char-as-always-unsigned.patch


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch
  2022-10-20  0:03 + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Andrew Morton
@ 2022-10-20  9:43 ` Alexey Dobriyan
  2022-10-20  9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan
  2022-10-20 16:24 ` + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Jason A. Donenfeld
  2 siblings, 0 replies; 35+ messages in thread
From: Alexey Dobriyan @ 2022-10-20  9:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mm-commits, torvalds, masahiroy, keescook, gregkh,
	andriy.shevchenko, Jason, akpm

On Wed, Oct 19, 2022 at 05:03:55PM -0700, Andrew Morton wrote:
> This will break things in some places and fix things in others, so this
> will likely cause a bit of churn while reconciling the type misuse.

> --- a/Makefile~kbuild-treat-char-as-always-unsigned
> +++ a/Makefile
> @@ -562,7 +562,7 @@ KBUILD_AFLAGS   := -D__ASSEMBLY__ -fno-P
>  KBUILD_CFLAGS   := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \
>  		   -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \
>  		   -Werror=implicit-function-declaration -Werror=implicit-int \
> -		   -Werror=return-type -Wno-format-security \
> +		   -Werror=return-type -Wno-format-security -funsigned-char \
>  		   -std=gnu11
>  KBUILD_CPPFLAGS := -D__KERNEL__
>  KBUILD_RUSTFLAGS := $(rust_common_flags) \

ACK

Another reason is that characters were always some small non-negative
integers which mapped to pictures so making them signed was silly from
the beginning.

People should use "const char *" for C strings and "u8[]" for raw buffers.
Unfortunately, C can't give developer more.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20  0:03 + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Andrew Morton
  2022-10-20  9:43 ` Alexey Dobriyan
@ 2022-10-20  9:49 ` Alexey Dobriyan
  2022-10-20  9:56   ` [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common() Alexey Dobriyan
  2022-10-20 16:28   ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld
  2022-10-20 16:24 ` + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Jason A. Donenfeld
  2 siblings, 2 replies; 35+ messages in thread
From: Alexey Dobriyan @ 2022-10-20  9:49 UTC (permalink / raw)
  To: akpm, linux-kernel
  Cc: mm-commits, torvalds, masahiroy, keescook, gregkh,
	andriy.shevchenko, Jason, akpm

struct p4_event_bind::cntr[][] should be signed because of
the following code:

	int i, j;
        for (i = 0; i < P4_CNTR_LIMIT; i++) {
          ===>  j = bind->cntr[thread][i];
                if (j != -1 && !test_bit(j, used_mask))
                        return j;
        }

Making this member unsigned will make "j" 255 and fail "j != -1"
comparison.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 arch/x86/events/intel/p4.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -24,7 +24,7 @@ struct p4_event_bind {
 	unsigned int escr_msr[2];		/* ESCR MSR for this event */
 	unsigned int escr_emask;		/* valid ESCR EventMask bits */
 	unsigned int shared;			/* event is shared across threads */
-	char cntr[2][P4_CNTR_LIMIT];		/* counter index (offset), -1 on absence */
+	signed char cntr[2][P4_CNTR_LIMIT];	/* counter index (offset), -1 on absence */
 };
 
 struct p4_pebs_bind {

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common()
  2022-10-20  9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan
@ 2022-10-20  9:56   ` Alexey Dobriyan
  2022-10-20 16:28   ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld
  1 sibling, 0 replies; 35+ messages in thread
From: Alexey Dobriyan @ 2022-10-20  9:56 UTC (permalink / raw)
  To: akpm; +Cc: viro, linux-fsdevel, linux-kernel

Cast to unsigned int doesn't do anything because two comparisons are
a) for equality, and
b) both '/' and '\0' have non-negative values.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 fs/namei.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2657,7 +2657,7 @@ static int lookup_one_common(struct user_namespace *mnt_userns,
 	}
 
 	while (len--) {
-		unsigned int c = *(const unsigned char *)name++;
+		char c = *name++;
 		if (c == '/' || c == '\0')
 			return -EACCES;
 	}

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch
  2022-10-20  0:03 + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Andrew Morton
  2022-10-20  9:43 ` Alexey Dobriyan
  2022-10-20  9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan
@ 2022-10-20 16:24 ` Jason A. Donenfeld
  2022-10-20 21:12   ` Andrew Morton
  2 siblings, 1 reply; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20 16:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mm-commits, torvalds, masahiroy, keescook, gregkh, andriy.shevchenko

Hi Andrew,

On Wed, Oct 19, 2022 at 6:03 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> The patch titled
>      Subject: kbuild: treat char as always unsigned
> has been added to the -mm mm-nonmm-unstable branch.  Its filename is
>      kbuild-treat-char-as-always-unsigned.patch
>
> This patch will shortly appear at
>      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/kbuild-treat-char-as-always-unsigned.patch
>
> This patch will later appear in the mm-nonmm-unstable branch at
>     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Looks like you missed this email -
https://lore.kernel.org/lkml/Y1CP%2FuJb1SQjyS0n@zx2c4.com/ - and
didn't communicate about your intention to take this. Kind of
annoying, but oh well. Either way, I'll drop it from my tree, so that
-next has a clear path with it. Just please don't drop it from your
tree without appropriate discussion first. As mentioned in the thread,
there inevitably will be a few dragons with this patch, and the whole
idea of putting this in -next a whopping 4.5 months before 6.2 is
released is so that we can actually have time to chip away at those
issues as they come up. So please don't drop this at the very first
sign of trouble; just plan for some trouble.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20  9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan
  2022-10-20  9:56   ` [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common() Alexey Dobriyan
@ 2022-10-20 16:28   ` Jason A. Donenfeld
  2022-10-20 17:14     ` Linus Torvalds
  1 sibling, 1 reply; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20 16:28 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: akpm, linux-kernel, mm-commits, torvalds, masahiroy, keescook,
	gregkh, andriy.shevchenko

On Thu, Oct 20, 2022 at 3:49 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> struct p4_event_bind::cntr[][] should be signed because of
> the following code:
>
>         int i, j;
>         for (i = 0; i < P4_CNTR_LIMIT; i++) {
>           ===>  j = bind->cntr[thread][i];
>                 if (j != -1 && !test_bit(j, used_mask))
>                         return j;
>         }
>
> Making this member unsigned will make "j" 255 and fail "j != -1"
> comparison.
>
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

Nice catch.

Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>

>  arch/x86/events/intel/p4.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/arch/x86/events/intel/p4.c
> +++ b/arch/x86/events/intel/p4.c
> @@ -24,7 +24,7 @@ struct p4_event_bind {
>         unsigned int escr_msr[2];               /* ESCR MSR for this event */
>         unsigned int escr_emask;                /* valid ESCR EventMask bits */
>         unsigned int shared;                    /* event is shared across threads */
> -       char cntr[2][P4_CNTR_LIMIT];            /* counter index (offset), -1 on absence */
> +       signed char cntr[2][P4_CNTR_LIMIT];     /* counter index (offset), -1 on absence */
>  };

This is fine, but I wonder if this is a good occasion to use `s8` instead?

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 16:28   ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld
@ 2022-10-20 17:14     ` Linus Torvalds
  2022-10-20 17:33       ` Jason A. Donenfeld
  2022-10-21  5:59       ` Alexey Dobriyan
  0 siblings, 2 replies; 35+ messages in thread
From: Linus Torvalds @ 2022-10-20 17:14 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy,
	keescook, gregkh, andriy.shevchenko

On Thu, Oct 20, 2022 at 9:28 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Nice catch.
>
> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>

Can we please try to collect these all in one place?

I see that Andrew picked up the original one for -mm, but I think it
would be better if we had one specific place for all of this (one
branch) to collect it all.

I'm actually trying to do a "make allyesconfig" build on x86-64 with
both signed and unsigned char, and trying to see if I can script
something sane to show differences.

Doing the build is easy, but the differences end up being huge just
due to silly constants (ie the whole "one small difference ends up
changing layout, which then causes hundreds of megs of diff just due
to hex constants in the disassembly changing".

I think the problem is that I tried to do the vmlinux file after
linking to make it easier. Doing the individual object files
separately would probably have been a better idea, just to avoid this
kind of relocation offset issues.

There's not a huge amount of pull requests (the week after -rc1 tends
to be quiet for me), so I'll continue to waste my time on this.

                 Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 17:14     ` Linus Torvalds
@ 2022-10-20 17:33       ` Jason A. Donenfeld
  2022-10-20 17:42         ` Linus Torvalds
  2022-10-24 15:44         ` Jason A. Donenfeld
  2022-10-21  5:59       ` Alexey Dobriyan
  1 sibling, 2 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20 17:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy,
	keescook, gregkh, andriy.shevchenko, Stephen Rothwell

Hi Linus,

On Thu, Oct 20, 2022 at 11:15 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Can we please try to collect these all in one place?
>
> I see that Andrew picked up the original one for -mm, but I think it
> would be better if we had one specific place for all of this (one
> branch) to collect it all.

Sure. Andrew can drop it from -mm, and I'll collect everything in:

https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=unsigned-char&r

And I'll ask Stephen to add that branch to -next.

> I'm actually trying to do a "make allyesconfig" build on x86-64 with
> both signed and unsigned char, and trying to see if I can script
> something sane to show differences.
>
> Doing the build is easy, but the differences end up being huge just
> due to silly constants (ie the whole "one small difference ends up
> changing layout, which then causes hundreds of megs of diff just due
> to hex constants in the disassembly changing".

If you can get a copy of IDA Pro, diaphora is quite nice:
https://github.com/joxeankoret/diaphora

Or sometimes with objdump, I've had more success by keeping debug
symbols, and then trimming offsets from jmps.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 17:33       ` Jason A. Donenfeld
@ 2022-10-20 17:42         ` Linus Torvalds
  2022-10-20 18:57           ` Kees Cook
  2022-10-24 15:44         ` Jason A. Donenfeld
  1 sibling, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2022-10-20 17:42 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy,
	keescook, gregkh, andriy.shevchenko, Stephen Rothwell

On Thu, Oct 20, 2022 at 10:33 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Or sometimes with objdump, I've had more success by keeping debug
> symbols, and then trimming offsets from jmps.

objdump is what I'm using, and it actually seems ok on individual object files.

Now I just need to script the "do all the object files" and see how
massive the end result is.

                Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 17:42         ` Linus Torvalds
@ 2022-10-20 18:57           ` Kees Cook
  2022-10-20 19:39             ` Linus Torvalds
  2022-10-26  1:50             ` Jason A. Donenfeld
  0 siblings, 2 replies; 35+ messages in thread
From: Kees Cook @ 2022-10-20 18:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel,
	mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell

On Thu, Oct 20, 2022 at 10:42:25AM -0700, Linus Torvalds wrote:
> On Thu, Oct 20, 2022 at 10:33 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Or sometimes with objdump, I've had more success by keeping debug
> > symbols, and then trimming offsets from jmps.
> 
> objdump is what I'm using, and it actually seems ok on individual object files.
> 
> Now I just need to script the "do all the object files" and see how
> massive the end result is.

For the a/b build, I start with all*config, then:

# Stop painful noise
CONFIG_KCOV=n
CONFIG_GCOV_KERNEL=n
CONFIG_GCC_PLUGINS=n
CONFIG_IKHEADERS=n
CONFIG_KASAN=n
CONFIG_UBSAN=n
CONFIG_KCSAN=n
CONFIG_KMSAN=n
# Get us source/line details
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_INFO_REDUCED=n CONFIG_DEBUG_INFO_COMPRESSED=n
CONFIG_DEBUG_INFO_SPLIT=n

And to keep other build-time junk stabilized[1], I build with these make
options:

KBUILD_BUILD_TIMESTAMP=1970-01-01
KBUILD_BUILD_USER=user
KBUILD_BUILD_HOST=host
KBUILD_BUILD_VERSION=1

For the code diff, I use:

objdump --disassemble --demangle --no-show-raw-insn --no-addresses

and when doing the manual examination:

objdump --disassemble --demangle --reloc --source -l --no-show-raw-insn


My not-great way to filter out the movsbl/movzbl, I added this to diff:
	-I '\bmov[sz]bl\b'


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 18:57           ` Kees Cook
@ 2022-10-20 19:39             ` Linus Torvalds
  2022-10-20 20:17               ` Linus Torvalds
  2022-10-26  1:50             ` Jason A. Donenfeld
  1 sibling, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2022-10-20 19:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel,
	mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell

On Thu, Oct 20, 2022 at 11:57 AM Kees Cook <keescook@chromium.org> wrote:
>
> For the a/b build, I start with all*config, then:

Yes, I have that part all figured out.

> For the code diff, I use:
>
> objdump --disassemble --demangle --no-show-raw-insn --no-addresses

This part I still hate.

Have you figured out any way to get objdump to actually show the
relocations in-place in the assembly?

Ie, instead of

        call   <will_become_orphaned_pgrp+0xbf>
                        R_X86_64_PLT32  debug_lockdep_rcu_enabled-0x4

just show it as

        call   debug_lockdep_rcu_enabled

to make the diff - when it exists - hugely more legible?

Because now any code changes will not just show the code changes, but
end up showing a lot of silly changes because the "+0xbf" changes.

I guess I'll just have to remove all of those hex constants anyway,
because they also show up for any jumps inside the functions.

I also explored trying to compare just the generates *.s files, but
that has its own set of problems, notably with gcc label numbering.
Plus they are harder to generate for the full tree with our standard
build rules (maybe there's some trick I haven't thought of to make gcc
keep the '*.s' files as it generates the '*.o' ones).

I do have something that "works", but it turns out to be very noisy,
because while gcc *often* generates almost identical code, then when
it doesn't it can be quite nasty.

When there is a *real* difference, having a nasty diff is fine. For
example, the arch/x86/events/intel/p4.c issue that Alexey found
generates huge differences, because gcc can just see that "ok, that's
never negative", and generates completely different code.

That's good.

But when there's some small change that just changes the offset, it's
just annoying, even with --no-addresses. The hex numbers can be edited
out, but then you have the nop padding changes etc etc.

So getting rid of that kind of pointless noise is just about all the
effort here.

                  Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 19:39             ` Linus Torvalds
@ 2022-10-20 20:17               ` Linus Torvalds
  2022-10-20 21:34                 ` Andy Shevchenko
  2022-10-21  6:48                 ` Greg KH
  0 siblings, 2 replies; 35+ messages in thread
From: Linus Torvalds @ 2022-10-20 20:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel,
	mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell

On Thu, Oct 20, 2022 at 12:39 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So getting rid of that kind of pointless noise is just about all the
> effort here.

Current status: of 22.5k object files, 971 have differences.

In many cases, the differences are small and trivial. Example:

 - fscrypt_show_test_dummy_encryption() does that same "print a char with %c"

        seq_printf(seq, "%ctest_dummy_encryption=v%d", sep, vers);

which is entirely harmless and exactly the same as that  (but I most
certainly haven't figured out how to automatically script away that
"oh, %c is fine).

And in other cases, there's no actual difference at all, just
different register usage, so the diff looks fairly big, but doesn't
seem to be real.  In one case I looked at, it started with a 'movzbl',
but it was that in both cases, because the type was actually 'unsigned
char' to begin with. But for some reason it just used different
registers. Example:

 - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c

   The reason here *seems* to be that

                        char *buf;
                        buf = (char *)urb->transfer_buffer;

   where it really probably should be 'u8 *buf', since it actually
does a cast to 'u8' in one place, but there isn't even any read of
that 'buf' pointer. So the difference seems to be entirely just some
"different type in assignment" cast internal to gcc that then
incidentally generated a random other choice in register allocation.

And in some cases the differences are enormous:

 - drivers/net/wireless/ralink/rt2x00/rt2800lib.c generates a 220kB diff

which seems to be due to entirely different inlining decisions or
something, and the differences are so enormous that I didn't even
start looking at the cause.

There's a fair number of things in between, like fs/ext4/super.c that
generates a lot of differences, some of them obvious, some of them
very much not obvious that may br similar to the
handle_control_request() ones.

*Presumably* the ext4 super.c code is fine (since it has been used on
architectures that already had unsigned char), but it actually
generates a bigger diff than the p4 events driver does...

And that arch/x86/events/intel/p4.c thing that Alexey found sadly does
not stand out at all in that "somewhere in the middle" bunch.

So I think my "hey, we can automate the comparison" is pretty much a
dud, but I'm not giving up quite yet. It's annoying how *most* of the
kernel files show no differences at all, but then I can't even figure
our why other files do show differences with no obvious reason for
them at all.

                Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch
  2022-10-20 16:24 ` + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Jason A. Donenfeld
@ 2022-10-20 21:12   ` Andrew Morton
  2022-10-20 21:13     ` Jason A. Donenfeld
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2022-10-20 21:12 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: mm-commits, torvalds, masahiroy, keescook, gregkh, andriy.shevchenko

On Thu, 20 Oct 2022 10:24:51 -0600 "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> Hi Andrew,
> 
> On Wed, Oct 19, 2022 at 6:03 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > The patch titled
> >      Subject: kbuild: treat char as always unsigned
> > has been added to the -mm mm-nonmm-unstable branch.  Its filename is
> >      kbuild-treat-char-as-always-unsigned.patch
> >
> > This patch will shortly appear at
> >      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/kbuild-treat-char-as-always-unsigned.patch
> >
> > This patch will later appear in the mm-nonmm-unstable branch at
> >     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> 
> Looks like you missed this email -
> https://lore.kernel.org/lkml/Y1CP%2FuJb1SQjyS0n@zx2c4.com/

I saw that email after I'd merged it.

> didn't communicate about your intention to take this. Kind of
> annoying, but oh well. Either way, I'll drop it from my tree, so that
> -next has a clear path with it. Just please don't drop it from your
> tree without appropriate discussion first. As mentioned in the thread,
> there inevitably will be a few dragons with this patch, and the whole
> idea of putting this in -next a whopping 4.5 months before 6.2 is
> released is so that we can actually have time to chip away at those
> issues as they come up. So please don't drop this at the very first
> sign of trouble; just plan for some trouble.

No, please go ahead and merge it in your tree (I didn't actually know
you ran a tree).  Once it appears in -next, Stephen will tell us and
I'll drop my copy.  This happens fairly commonly.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch
  2022-10-20 21:12   ` Andrew Morton
@ 2022-10-20 21:13     ` Jason A. Donenfeld
  0 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20 21:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mm-commits, torvalds, masahiroy, keescook, gregkh, andriy.shevchenko

On Thu, Oct 20, 2022 at 3:13 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 20 Oct 2022 10:24:51 -0600 "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>
> > Hi Andrew,
> >
> > On Wed, Oct 19, 2022 at 6:03 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > The patch titled
> > >      Subject: kbuild: treat char as always unsigned
> > > has been added to the -mm mm-nonmm-unstable branch.  Its filename is
> > >      kbuild-treat-char-as-always-unsigned.patch
> > >
> > > This patch will shortly appear at
> > >      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/kbuild-treat-char-as-always-unsigned.patch
> > >
> > > This patch will later appear in the mm-nonmm-unstable branch at
> > >     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> >
> > Looks like you missed this email -
> > https://lore.kernel.org/lkml/Y1CP%2FuJb1SQjyS0n@zx2c4.com/
>
> I saw that email after I'd merged it.
>
> > didn't communicate about your intention to take this. Kind of
> > annoying, but oh well. Either way, I'll drop it from my tree, so that
> > -next has a clear path with it. Just please don't drop it from your
> > tree without appropriate discussion first. As mentioned in the thread,
> > there inevitably will be a few dragons with this patch, and the whole
> > idea of putting this in -next a whopping 4.5 months before 6.2 is
> > released is so that we can actually have time to chip away at those
> > issues as they come up. So please don't drop this at the very first
> > sign of trouble; just plan for some trouble.
>
> No, please go ahead and merge it in your tree (I didn't actually know
> you ran a tree).  Once it appears in -next, Stephen will tell us and
> I'll drop my copy.  This happens fairly commonly.

Okay, will do.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 20:17               ` Linus Torvalds
@ 2022-10-20 21:34                 ` Andy Shevchenko
  2022-10-20 22:46                   ` Jason A. Donenfeld
  2022-10-21  6:48                 ` Greg KH
  1 sibling, 1 reply; 35+ messages in thread
From: Andy Shevchenko @ 2022-10-20 21:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Jason A. Donenfeld, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, Stephen Rothwell

On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote:
> On Thu, Oct 20, 2022 at 12:39 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:

...

> And in some cases the differences are enormous:
> 
>  - drivers/net/wireless/ralink/rt2x00/rt2800lib.c generates a 220kB diff
> 
> which seems to be due to entirely different inlining decisions or
> something, and the differences are so enormous that I didn't even
> start looking at the cause.

This one is what we start the epopee from. I think Jason handled it in his last
patch against this certain driver.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 21:34                 ` Andy Shevchenko
@ 2022-10-20 22:46                   ` Jason A. Donenfeld
  0 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20 22:46 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Linus Torvalds, Kees Cook, Alexey Dobriyan, akpm, linux-kernel,
	mm-commits, masahiroy, gregkh, Stephen Rothwell

On Fri, Oct 21, 2022 at 12:34:37AM +0300, Andy Shevchenko wrote:
> On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote:
> > On Thu, Oct 20, 2022 at 12:39 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> 
> ...
> 
> > And in some cases the differences are enormous:
> > 
> >  - drivers/net/wireless/ralink/rt2x00/rt2800lib.c generates a 220kB diff
> > 
> > which seems to be due to entirely different inlining decisions or
> > something, and the differences are so enormous that I didn't even
> > start looking at the cause.
> 
> This one is what we start the epopee from. I think Jason handled it in his last
> patch against this certain driver.

Right, and Kale is taking it for 6.1, because it fixes existing breakage
on ARM. But it's not broken on x86 with -funsigned-char.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 17:14     ` Linus Torvalds
  2022-10-20 17:33       ` Jason A. Donenfeld
@ 2022-10-21  5:59       ` Alexey Dobriyan
  2022-10-21 17:11         ` Linus Torvalds
  1 sibling, 1 reply; 35+ messages in thread
From: Alexey Dobriyan @ 2022-10-21  5:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, akpm, linux-kernel, mm-commits, masahiroy,
	keescook, gregkh, andriy.shevchenko

On Thu, Oct 20, 2022 at 10:14:54AM -0700, Linus Torvalds wrote:
> On Thu, Oct 20, 2022 at 9:28 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Nice catch.
> >
> > Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
> 
> Can we please try to collect these all in one place?
> 
> I see that Andrew picked up the original one for -mm, but I think it
> would be better if we had one specific place for all of this (one
> branch) to collect it all.
> 
> I'm actually trying to do a "make allyesconfig" build on x86-64 with
> both signed and unsigned char, and trying to see if I can script
> something sane to show differences.

It is very entertaining, i've given up and started patching sparse
but it needs more because char constants are ints:


diff --git a/evaluate.c b/evaluate.c
index 61f59ee3..ab607581 100644
--- a/evaluate.c
+++ b/evaluate.c
@@ -321,6 +321,10 @@ static struct expression * cast_to(struct expression *old, struct symbol *type)
 	if (old->ctype != &null_ctype && is_same_type(old, type))
 		return old;
 
+	if (is_char_type(old->ctype)) {
+		sparse_error(old->pos, "XXX char");
+	}
+
 	expr = alloc_expression(old->pos, EXPR_IMPLIED_CAST);
 	expr->ctype = type;
 	expr->cast_type = type;
diff --git a/symbol.h b/symbol.h
index 5270fcd7..8e62aca2 100644
--- a/symbol.h
+++ b/symbol.h
@@ -455,6 +455,14 @@ static inline int is_byte_type(struct symbol *type)
 	return type->bit_size == bits_in_char && type->type != SYM_BITFIELD;
 }
 
+static inline int is_char_type(const struct symbol *type)
+{
+	if (type->type == SYM_NODE) {
+		type = type->ctype.base_type;
+	}
+	return type == &char_ctype;
+}
+
 static inline int is_wchar_type(struct symbol *type)
 {
 	if (type->type == SYM_NODE)

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 20:17               ` Linus Torvalds
  2022-10-20 21:34                 ` Andy Shevchenko
@ 2022-10-21  6:48                 ` Greg KH
  2022-10-21  7:24                   ` Jason A. Donenfeld
  1 sibling, 1 reply; 35+ messages in thread
From: Greg KH @ 2022-10-21  6:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Jason A. Donenfeld, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, andriy.shevchenko,
	Stephen Rothwell

On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote:
> And in other cases, there's no actual difference at all, just
> different register usage, so the diff looks fairly big, but doesn't
> seem to be real.  In one case I looked at, it started with a 'movzbl',
> but it was that in both cases, because the type was actually 'unsigned
> char' to begin with. But for some reason it just used different
> registers. Example:
> 
>  - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c
> 
>    The reason here *seems* to be that
> 
>                         char *buf;
>                         buf = (char *)urb->transfer_buffer;
> 
>    where it really probably should be 'u8 *buf', since it actually
> does a cast to 'u8' in one place, but there isn't even any read of
> that 'buf' pointer. So the difference seems to be entirely just some
> "different type in assignment" cast internal to gcc that then
> incidentally generated a random other choice in register allocation.

I've send a patch for this now:
	https://lore.kernel.org/r/20221021064453.3341050-1-gregkh@linuxfoundation.org
and will take it through the USB tree, unless Jason wants to grab it
through his tree.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-21  6:48                 ` Greg KH
@ 2022-10-21  7:24                   ` Jason A. Donenfeld
  2022-10-21  7:36                     ` Greg KH
  0 siblings, 1 reply; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-21  7:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Kees Cook, Alexey Dobriyan, akpm, linux-kernel,
	mm-commits, masahiroy, andriy.shevchenko, Stephen Rothwell

Hi Greg,

On Fri, Oct 21, 2022 at 2:48 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote:
> > And in other cases, there's no actual difference at all, just
> > different register usage, so the diff looks fairly big, but doesn't
> > seem to be real.  In one case I looked at, it started with a 'movzbl',
> > but it was that in both cases, because the type was actually 'unsigned
> > char' to begin with. But for some reason it just used different
> > registers. Example:
> >
> >  - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c
> >
> >    The reason here *seems* to be that
> >
> >                         char *buf;
> >                         buf = (char *)urb->transfer_buffer;
> >
> >    where it really probably should be 'u8 *buf', since it actually
> > does a cast to 'u8' in one place, but there isn't even any read of
> > that 'buf' pointer. So the difference seems to be entirely just some
> > "different type in assignment" cast internal to gcc that then
> > incidentally generated a random other choice in register allocation.
>
> I've send a patch for this now:
>         https://lore.kernel.org/r/20221021064453.3341050-1-gregkh@linuxfoundation.org
> and will take it through the USB tree, unless Jason wants to grab it
> through his tree.

This doesn't appear to have any actual effect, but just changes gcc's
register allocation unexpectedly. So feel free to take it, as it
doesn't seem like it's "one of those bad cases" that I'm keeping track
of.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-21  7:24                   ` Jason A. Donenfeld
@ 2022-10-21  7:36                     ` Greg KH
  0 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2022-10-21  7:36 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linus Torvalds, Kees Cook, Alexey Dobriyan, akpm, linux-kernel,
	mm-commits, masahiroy, andriy.shevchenko, Stephen Rothwell

On Fri, Oct 21, 2022 at 03:24:27AM -0400, Jason A. Donenfeld wrote:
> Hi Greg,
> 
> On Fri, Oct 21, 2022 at 2:48 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote:
> > > And in other cases, there's no actual difference at all, just
> > > different register usage, so the diff looks fairly big, but doesn't
> > > seem to be real.  In one case I looked at, it started with a 'movzbl',
> > > but it was that in both cases, because the type was actually 'unsigned
> > > char' to begin with. But for some reason it just used different
> > > registers. Example:
> > >
> > >  - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c
> > >
> > >    The reason here *seems* to be that
> > >
> > >                         char *buf;
> > >                         buf = (char *)urb->transfer_buffer;
> > >
> > >    where it really probably should be 'u8 *buf', since it actually
> > > does a cast to 'u8' in one place, but there isn't even any read of
> > > that 'buf' pointer. So the difference seems to be entirely just some
> > > "different type in assignment" cast internal to gcc that then
> > > incidentally generated a random other choice in register allocation.
> >
> > I've send a patch for this now:
> >         https://lore.kernel.org/r/20221021064453.3341050-1-gregkh@linuxfoundation.org
> > and will take it through the USB tree, unless Jason wants to grab it
> > through his tree.
> 
> This doesn't appear to have any actual effect, but just changes gcc's
> register allocation unexpectedly. So feel free to take it, as it
> doesn't seem like it's "one of those bad cases" that I'm keeping track
> of.

Great, will take it through my tree, thanks!

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-21  5:59       ` Alexey Dobriyan
@ 2022-10-21 17:11         ` Linus Torvalds
  2022-10-21 17:23           ` Linus Torvalds
  0 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2022-10-21 17:11 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Jason A. Donenfeld, akpm, linux-kernel, mm-commits, masahiroy,
	keescook, gregkh, andriy.shevchenko

[-- Attachment #1: Type: text/plain, Size: 339 bytes --]

On Thu, Oct 20, 2022 at 10:59 PM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> It is very entertaining, i've given up and started patching sparse
> but it needs more because char constants are ints:

I think you can fix that by simply warning about character constants
with the high bit set.

Something like this..

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 472 bytes --]

 char.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/char.c b/char.c
index 730ae3f5..3706e033 100644
--- a/char.c
+++ b/char.c
@@ -93,6 +93,10 @@ void get_char_constant(struct token *token, unsigned long long *val)
 	if (p != end)
 		warning(token->pos,
 			"multi-character character constant");
+	if (v & 0x80) {
+		if (type >= TOKEN_CHAR && type <= TOKEN_CHAR_EMBEDDED_3)
+			warning(token->pos, "character constant with sign bit set");
+	}
 	*val = v;
 }
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-21 17:11         ` Linus Torvalds
@ 2022-10-21 17:23           ` Linus Torvalds
  0 siblings, 0 replies; 35+ messages in thread
From: Linus Torvalds @ 2022-10-21 17:23 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Jason A. Donenfeld, akpm, linux-kernel, mm-commits, masahiroy,
	keescook, gregkh, andriy.shevchenko

On Fri, Oct 21, 2022 at 10:11 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think you can fix that by simply warning about character constants
> with the high bit set.
>
> Something like this..

This actually triggers for the kernel, and I think those (few)
warnings are likely worth just fixing.

Because things like

                put_tty_queue('\377', ldata);

just isn't worth it.

Or look at this horror:

                        if (c == (unsigned char) '\377' && I_PARMRK(tty))

where somebody did realize that '\377' might be -1, so they added the
"helpful" cast.

Wouldn't that be much nicer and simpler as just

                        if (c == 255 && I_PARMRK(tty))

instead? Or just 0377 if you really love octal in the context of
characters (and really, nobody should). Or 0xff.

At no point does "(unsigned char) '\377' " strike me as a really
readable way to write things. There's two of those things.

We also have

        memset(stack, '\xff', 64);

which really isn't helpful either. Why not just use 0xff, or even just -1.

We also have a lot of those in lib/hexdump.c, for no good reason,
particularly as those arrays are 'unsigned char[]' to begin with, not
'char'.

I really don't understand why people would use '\xAA' instead of just
using 0xAA.

We also have

        static char sample_rate_buffer[4] = { '\x80', '\xbb', '\x00', '\x00' };

in sound/usb/mixer_scarlett.c, and that should be a byte array, so the
'char' should probably be 'unsigned char' or 'u8' in the first place,
and again it would be just simpler and clearer to use plain hex
constants.

So that sparse warning looks simple enough, and I think every single
case was just the kernel being odd.

Now, in *string* constants, you sometimes do want to use that format, ie

    u8 array[] = "abcd\377";

is a reasonable way to avoid having to use a more complex initializer.
But that sparse patch of mine only complains about (non-wide)
character constants unless I screwed something up.

                  Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 17:33       ` Jason A. Donenfeld
  2022-10-20 17:42         ` Linus Torvalds
@ 2022-10-24 15:44         ` Jason A. Donenfeld
  1 sibling, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-24 15:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy,
	keescook, gregkh, andriy.shevchenko, Stephen Rothwell

On Thu, Oct 20, 2022 at 11:33:39AM -0600, Jason A. Donenfeld wrote:
> Hi Linus,
> 
> On Thu, Oct 20, 2022 at 11:15 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > Can we please try to collect these all in one place?
> >
> > I see that Andrew picked up the original one for -mm, but I think it
> > would be better if we had one specific place for all of this (one
> > branch) to collect it all.
> 
> Sure. Andrew can drop it from -mm, and I'll collect everything in:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=unsigned-char&r
> 
> And I'll ask Stephen to add that branch to -next.

So, as discussed, I'm doing this, and it's in -next now. And as a
result, we're getting warnings and such that I am fixing one by one.
Progress, good.

But it occurs to me that most of these bugs are not in
architecture-specific code like the x86 p4_event_bind one from last
week. That means I'll be submitting these as *fixes* during 6.1, since
they're broken on some architecture already, rather than waiting to
submit them to you via my unsigned-char tree in 6.2.

Just FYI.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-20 18:57           ` Kees Cook
  2022-10-20 19:39             ` Linus Torvalds
@ 2022-10-26  1:50             ` Jason A. Donenfeld
  2022-10-26 12:58                 ` [cocci] " Jason A. Donenfeld
  1 sibling, 1 reply; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-26  1:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits,
	masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell

On Thu, Oct 20, 2022 at 11:57:12AM -0700, Kees Cook wrote:
> On Thu, Oct 20, 2022 at 10:42:25AM -0700, Linus Torvalds wrote:
> > On Thu, Oct 20, 2022 at 10:33 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > >
> > > Or sometimes with objdump, I've had more success by keeping debug
> > > symbols, and then trimming offsets from jmps.
> > 
> > objdump is what I'm using, and it actually seems ok on individual object files.
> > 
> > Now I just need to script the "do all the object files" and see how
> > massive the end result is.
> 
> For the a/b build, I start with all*config, then:
> 
> # Stop painful noise
> CONFIG_KCOV=n
> CONFIG_GCOV_KERNEL=n
> CONFIG_GCC_PLUGINS=n
> CONFIG_IKHEADERS=n
> CONFIG_KASAN=n
> CONFIG_UBSAN=n
> CONFIG_KCSAN=n
> CONFIG_KMSAN=n
> # Get us source/line details
> CONFIG_DEBUG_KERNEL=y
> CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
> CONFIG_DEBUG_INFO_REDUCED=n CONFIG_DEBUG_INFO_COMPRESSED=n
> CONFIG_DEBUG_INFO_SPLIT=n
> 
> And to keep other build-time junk stabilized[1], I build with these make
> options:
> 
> KBUILD_BUILD_TIMESTAMP=1970-01-01
> KBUILD_BUILD_USER=user
> KBUILD_BUILD_HOST=host
> KBUILD_BUILD_VERSION=1

The LLVM `.ll` file thing I tried turned out to be a disaster. Too much
noise, as this is too early of a stage.

The traditional objdump comparison does work, though. It produces a good
amount of noise, but still yields a manageable amount of diffs -- 882 --
which can then be paired down more with heuristics.

I've been using this script below to compare `linux-schar/` with
`linux-uchar/`, which creates a directory `linux-schar-uchar/` filled
with color diffs that I can then flip through using
`less -R linux-schar-uchar/*.diff`. Seems to work okay, so I'll post it
here in case others are curious about looking through these.

Jason

------8<--------------------------------

#!/bin/bash

asm_diff() {
	objdump \
		--disassemble \
		--demangle \
		--no-show-raw-insn \
		--no-addresses \
		--section=.text \
		--disassembler-options=intel \
	"$1" | \
	sed \
		-e 's/[0-9a-f]\+ \(<[a-zA-Z0-9_+-]\+>\)/?? \1/g' \
		-e 's/<\([a-zA-Z0-9_-]\+\)+0x[a-f0-9]\+>/<\1>/g' \
		-e '/\/[a-zA-Z0-9._-]\+\.o:/d'
}

A=linux-schar
B=linux-uchar
C=linux-schar-uchar

rm -rf "$C"
mkdir -p "$C"

while read -r obj_a; do
	obj_b="$B/${obj_a#$A/}"
	diff_c="${obj_a#$A/}"
	diff_c="$C/${diff_c//\//--}.diff"
	[[ -f $obj_b ]] || { echo "ERROR: $obj_b is missing" >&2; exit 1; }

	echo "${obj_a#$A/}" >&2

	diff --color=always --text --unified=10 \
		<(asm_diff "$obj_a") <(asm_diff "$obj_b") > "$diff_c" && \
	rm -f "$diff_c"
done < <(exec find "$A" -name '*.o')

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-26  1:50             ` Jason A. Donenfeld
@ 2022-10-26 12:58                 ` Jason A. Donenfeld
  0 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-26 12:58 UTC (permalink / raw)
  To: Kees Cook, cocci
  Cc: Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits,
	masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell

On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> The traditional objdump comparison does work, though. It produces a good

Another thing that appears to work well is just using Coccinelle
scripts. I've had some success just scrolling through the results of:

    @@
    char c;
    expression E;
    @@
    (
    * E > c
    |
    * E >= c
    |
    * E < c
    |
    * E <= c
    )

That also triggers on explicitly signed chars, and examining those
reveals that quite a bit of code in the tree already does do the right
thing, which is good.

From looking at this and objdump output, it looks like most naked-char
usage that isn't for strings is actually already assuming it's unsigned,
using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
things here and there, but all and all, I don't think this is looking as
terrible as I initially feared.

I'm CC'ing the Coccinelle people to see if they have any nice ideas on
improvements. Specifically, the thing we're trying to identify is:

  - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
    where:
  - It's not being used for characters; and
  - It's doing something that assumes it is signed, such as various
    types of comparisons or decrements.

LWN wrote a summary of the general problem, in case that helps describe
what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/

Any nice Cocci tricks for this?

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
@ 2022-10-26 12:58                 ` Jason A. Donenfeld
  0 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-10-26 12:58 UTC (permalink / raw)
  To: Kees Cook, cocci
  Cc: Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits,
	masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell

On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> The traditional objdump comparison does work, though. It produces a good

Another thing that appears to work well is just using Coccinelle
scripts. I've had some success just scrolling through the results of:

    @@
    char c;
    expression E;
    @@
    (
    * E > c
    |
    * E >= c
    |
    * E < c
    |
    * E <= c
    )

That also triggers on explicitly signed chars, and examining those
reveals that quite a bit of code in the tree already does do the right
thing, which is good.

From looking at this and objdump output, it looks like most naked-char
usage that isn't for strings is actually already assuming it's unsigned,
using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
things here and there, but all and all, I don't think this is looking as
terrible as I initially feared.

I'm CC'ing the Coccinelle people to see if they have any nice ideas on
improvements. Specifically, the thing we're trying to identify is:

  - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
    where:
  - It's not being used for characters; and
  - It's doing something that assumes it is signed, such as various
    types of comparisons or decrements.

LWN wrote a summary of the general problem, in case that helps describe
what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/

Any nice Cocci tricks for this?

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-26 12:58                 ` [cocci] " Jason A. Donenfeld
@ 2022-10-26 13:17                   ` Andy Shevchenko
  -1 siblings, 0 replies; 35+ messages in thread
From: Andy Shevchenko @ 2022-10-26 13:17 UTC (permalink / raw)
  To: Jason A. Donenfeld, Rasmus Villemoes
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, Stephen Rothwell

+Cc: Rasmus as he has done a lot regarding library stuff and optimizations and
     he knows Coccinelle (to some extent as far as I can tell).

On Wed, Oct 26, 2022 at 02:58:34PM +0200, Jason A. Donenfeld wrote:
> On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> > The traditional objdump comparison does work, though. It produces a good
> 
> Another thing that appears to work well is just using Coccinelle
> scripts. I've had some success just scrolling through the results of:
> 
>     @@
>     char c;
>     expression E;
>     @@
>     (
>     * E > c
>     |
>     * E >= c
>     |
>     * E < c
>     |
>     * E <= c
>     )
> 
> That also triggers on explicitly signed chars, and examining those
> reveals that quite a bit of code in the tree already does do the right
> thing, which is good.
> 
> From looking at this and objdump output, it looks like most naked-char
> usage that isn't for strings is actually already assuming it's unsigned,
> using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
> things here and there, but all and all, I don't think this is looking as
> terrible as I initially feared.
> 
> I'm CC'ing the Coccinelle people to see if they have any nice ideas on
> improvements. Specifically, the thing we're trying to identify is:
> 
>   - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
>     where:
>   - It's not being used for characters; and
>   - It's doing something that assumes it is signed, such as various
>     types of comparisons or decrements.
> 
> LWN wrote a summary of the general problem, in case that helps describe
> what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/
> 
> Any nice Cocci tricks for this?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
@ 2022-10-26 13:17                   ` Andy Shevchenko
  0 siblings, 0 replies; 35+ messages in thread
From: Andy Shevchenko @ 2022-10-26 13:17 UTC (permalink / raw)
  To: Jason A. Donenfeld, Rasmus Villemoes
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, Stephen Rothwell

+Cc: Rasmus as he has done a lot regarding library stuff and optimizations and
     he knows Coccinelle (to some extent as far as I can tell).

On Wed, Oct 26, 2022 at 02:58:34PM +0200, Jason A. Donenfeld wrote:
> On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> > The traditional objdump comparison does work, though. It produces a good
> 
> Another thing that appears to work well is just using Coccinelle
> scripts. I've had some success just scrolling through the results of:
> 
>     @@
>     char c;
>     expression E;
>     @@
>     (
>     * E > c
>     |
>     * E >= c
>     |
>     * E < c
>     |
>     * E <= c
>     )
> 
> That also triggers on explicitly signed chars, and examining those
> reveals that quite a bit of code in the tree already does do the right
> thing, which is good.
> 
> From looking at this and objdump output, it looks like most naked-char
> usage that isn't for strings is actually already assuming it's unsigned,
> using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
> things here and there, but all and all, I don't think this is looking as
> terrible as I initially feared.
> 
> I'm CC'ing the Coccinelle people to see if they have any nice ideas on
> improvements. Specifically, the thing we're trying to identify is:
> 
>   - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
>     where:
>   - It's not being used for characters; and
>   - It's doing something that assumes it is signed, such as various
>     types of comparisons or decrements.
> 
> LWN wrote a summary of the general problem, in case that helps describe
> what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/
> 
> Any nice Cocci tricks for this?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-10-26 12:58                 ` [cocci] " Jason A. Donenfeld
  (?)
  (?)
@ 2022-11-02 17:17                 ` Julia Lawall
  2022-11-03  0:08                   ` Jason A. Donenfeld
  -1 siblings, 1 reply; 35+ messages in thread
From: Julia Lawall @ 2022-11-02 17:17 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell



On Wed, 26 Oct 2022, Jason A. Donenfeld wrote:

> On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> > The traditional objdump comparison does work, though. It produces a good
>
> Another thing that appears to work well is just using Coccinelle
> scripts. I've had some success just scrolling through the results of:
>
>     @@
>     char c;
>     expression E;
>     @@
>     (
>     * E > c
>     |
>     * E >= c
>     |
>     * E < c
>     |
>     * E <= c
>     )
>
> That also triggers on explicitly signed chars, and examining those
> reveals that quite a bit of code in the tree already does do the right
> thing, which is good.
>
> From looking at this and objdump output, it looks like most naked-char
> usage that isn't for strings is actually already assuming it's unsigned,
> using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
> things here and there, but all and all, I don't think this is looking as
> terrible as I initially feared.
>
> I'm CC'ing the Coccinelle people to see if they have any nice ideas on
> improvements. Specifically, the thing we're trying to identify is:
>
>   - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
>     where:

Try putting

disable optional_qualifier

between the initial @@, to avoid the implicit matching of signed and
unsigned.

>   - It's not being used for characters; and
>   - It's doing something that assumes it is signed, such as various
>     types of comparisons or decrements.

I took a quick look at the article, but I'm not completely sure what you
are getting at here.  Could you give some examples of what you do and
don't want to find?

You don't want the case where c is 'x', for some x?

julia

> LWN wrote a summary of the general problem, in case that helps describe
> what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/
>
> Any nice Cocci tricks for this?
>
> Jason
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-11-02 17:17                 ` Julia Lawall
@ 2022-11-03  0:08                   ` Jason A. Donenfeld
  2022-11-03  6:31                     ` Julia Lawall
  2022-11-03 12:45                     ` Julia Lawall
  0 siblings, 2 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-11-03  0:08 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell

On Wed, Nov 02, 2022 at 06:17:04PM +0100, Julia Lawall wrote:
> 
> 
> On Wed, 26 Oct 2022, Jason A. Donenfeld wrote:
> 
> > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> > > The traditional objdump comparison does work, though. It produces a good
> >
> > Another thing that appears to work well is just using Coccinelle
> > scripts. I've had some success just scrolling through the results of:
> >
> >     @@
> >     char c;
> >     expression E;
> >     @@
> >     (
> >     * E > c
> >     |
> >     * E >= c
> >     |
> >     * E < c
> >     |
> >     * E <= c
> >     )
> >
> > That also triggers on explicitly signed chars, and examining those
> > reveals that quite a bit of code in the tree already does do the right
> > thing, which is good.
> >
> > From looking at this and objdump output, it looks like most naked-char
> > usage that isn't for strings is actually already assuming it's unsigned,
> > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
> > things here and there, but all and all, I don't think this is looking as
> > terrible as I initially feared.
> >
> > I'm CC'ing the Coccinelle people to see if they have any nice ideas on
> > improvements. Specifically, the thing we're trying to identify is:
> >
> >   - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
> >     where:
> 
> Try putting
> 
> disable optional_qualifier
> 
> between the initial @@, to avoid the implicit matching of signed and
> unsigned.

Hmm, this doesn't quite work. Here are my rules:

    @disable optional_qualifier@
    char c;
    expression E;
    @@
    (
    * E > c
    |
    * E >= c
    |
    * E < c
    |
    * E <= c
    )
    
    @disable optional_qualifier@
    char c;
    @@
    * c == -1
    
    @disable optional_qualifier@
    char c;
    @@
    * c = -1

This produces, for example:

diff -u -p ./sound/firewire/bebob/bebob_focusrite.c /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c
--- ./sound/firewire/bebob/bebob_focusrite.c
+++ /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c
@@ -192,7 +192,6 @@ saffirepro_both_clk_src_get(struct snd_b

        /* In a case that this driver cannot handle the value of register. */
        value &= SAFFIREPRO_CLOCK_SOURCE_SELECT_MASK;
-       if (value >= SAFFIREPRO_CLOCK_SOURCE_COUNT || map[value] < 0) {
                err = -EIO;
                goto end;
        }

Except map is defined as:

    const signed char *map;

So this would be one of those cases that I had hoped `disable
optional_qualifier` would exclude. (I think internally coccinelle might
be assuming `char` is signed, by the way.)

> >   - It's not being used for characters; and
> >   - It's doing something that assumes it is signed, such as various
> >     types of comparisons or decrements.
> 
> I took a quick look at the article, but I'm not completely sure what you
> are getting at here.  Could you give some examples of what you do and
> don't want to find?
> 
> You don't want the case where c is 'x', for some x?

Something I would want to find is `if (c < 0)`. Something I wouldn't
want to find is `if (c < '9')`. IOW, I'm looking for code that assumes
`c` is signed, and would become incorrect if `c` suddenly became
unsigned. Most things involving actual characters are fine. But most
things involving signed arithmetic or comparisons with numbers isn't
find.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-11-03  0:08                   ` Jason A. Donenfeld
@ 2022-11-03  6:31                     ` Julia Lawall
  2022-11-03 12:45                     ` Julia Lawall
  1 sibling, 0 replies; 35+ messages in thread
From: Julia Lawall @ 2022-11-03  6:31 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell



On Thu, 3 Nov 2022, Jason A. Donenfeld wrote:

> On Wed, Nov 02, 2022 at 06:17:04PM +0100, Julia Lawall wrote:
> >
> >
> > On Wed, 26 Oct 2022, Jason A. Donenfeld wrote:
> >
> > > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> > > > The traditional objdump comparison does work, though. It produces a good
> > >
> > > Another thing that appears to work well is just using Coccinelle
> > > scripts. I've had some success just scrolling through the results of:
> > >
> > >     @@
> > >     char c;
> > >     expression E;
> > >     @@
> > >     (
> > >     * E > c
> > >     |
> > >     * E >= c
> > >     |
> > >     * E < c
> > >     |
> > >     * E <= c
> > >     )
> > >
> > > That also triggers on explicitly signed chars, and examining those
> > > reveals that quite a bit of code in the tree already does do the right
> > > thing, which is good.
> > >
> > > From looking at this and objdump output, it looks like most naked-char
> > > usage that isn't for strings is actually already assuming it's unsigned,
> > > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
> > > things here and there, but all and all, I don't think this is looking as
> > > terrible as I initially feared.
> > >
> > > I'm CC'ing the Coccinelle people to see if they have any nice ideas on
> > > improvements. Specifically, the thing we're trying to identify is:
> > >
> > >   - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
> > >     where:
> >
> > Try putting
> >
> > disable optional_qualifier
> >
> > between the initial @@, to avoid the implicit matching of signed and
> > unsigned.
>
> Hmm, this doesn't quite work. Here are my rules:
>
>     @disable optional_qualifier@
>     char c;
>     expression E;
>     @@
>     (
>     * E > c
>     |
>     * E >= c
>     |
>     * E < c
>     |
>     * E <= c
>     )
>
>     @disable optional_qualifier@
>     char c;
>     @@
>     * c == -1
>
>     @disable optional_qualifier@
>     char c;
>     @@
>     * c = -1
>
> This produces, for example:
>
> diff -u -p ./sound/firewire/bebob/bebob_focusrite.c /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c
> --- ./sound/firewire/bebob/bebob_focusrite.c
> +++ /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c
> @@ -192,7 +192,6 @@ saffirepro_both_clk_src_get(struct snd_b
>
>         /* In a case that this driver cannot handle the value of register. */
>         value &= SAFFIREPRO_CLOCK_SOURCE_SELECT_MASK;
> -       if (value >= SAFFIREPRO_CLOCK_SOURCE_COUNT || map[value] < 0) {
>                 err = -EIO;
>                 goto end;
>         }
>
> Except map is defined as:
>
>     const signed char *map;
>
> So this would be one of those cases that I had hoped `disable
> optional_qualifier` would exclude. (I think internally coccinelle might
> be assuming `char` is signed, by the way.)

OK, I see the problem.  Coccinelle isn't taking the "disable
optional_qualifier" into account when it checks types on expressions.  It
would work if you put, eg:

char x;
... when any
* x < 0

But that would be much slower and less general.  I will fix it.

>
> > >   - It's not being used for characters; and
> > >   - It's doing something that assumes it is signed, such as various
> > >     types of comparisons or decrements.
> >
> > I took a quick look at the article, but I'm not completely sure what you
> > are getting at here.  Could you give some examples of what you do and
> > don't want to find?
> >
> > You don't want the case where c is 'x', for some x?
>
> Something I would want to find is `if (c < 0)`. Something I wouldn't
> want to find is `if (c < '9')`. IOW, I'm looking for code that assumes
> `c` is signed, and would become incorrect if `c` suddenly became
> unsigned. Most things involving actual characters are fine. But most
> things involving signed arithmetic or comparisons with numbers isn't
> find.

This seems to do what you want:

@disable optional_qualifier@
constant char cc;
expression e;
char c;
@@

(
  c < cc
|
* c < e
)

It highlights only the two return lines in:

int main () {
        char x;
        if (x < 'd')
                return x < 0;
        else return x < y;
}


julia

>
> Jason
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-11-03  0:08                   ` Jason A. Donenfeld
  2022-11-03  6:31                     ` Julia Lawall
@ 2022-11-03 12:45                     ` Julia Lawall
  2022-11-03 12:47                       ` Jason A. Donenfeld
  1 sibling, 1 reply; 35+ messages in thread
From: Julia Lawall @ 2022-11-03 12:45 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Julia Lawall, Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan,
	akpm, linux-kernel, mm-commits, masahiroy, gregkh,
	andriy.shevchenko, Stephen Rothwell



On Thu, 3 Nov 2022, Jason A. Donenfeld wrote:

> On Wed, Nov 02, 2022 at 06:17:04PM +0100, Julia Lawall wrote:
> >
> >
> > On Wed, 26 Oct 2022, Jason A. Donenfeld wrote:
> >
> > > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote:
> > > > The traditional objdump comparison does work, though. It produces a good
> > >
> > > Another thing that appears to work well is just using Coccinelle
> > > scripts. I've had some success just scrolling through the results of:
> > >
> > >     @@
> > >     char c;
> > >     expression E;
> > >     @@
> > >     (
> > >     * E > c
> > >     |
> > >     * E >= c
> > >     |
> > >     * E < c
> > >     |
> > >     * E <= c
> > >     )
> > >
> > > That also triggers on explicitly signed chars, and examining those
> > > reveals that quite a bit of code in the tree already does do the right
> > > thing, which is good.
> > >
> > > From looking at this and objdump output, it looks like most naked-char
> > > usage that isn't for strings is actually already assuming it's unsigned,
> > > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few
> > > things here and there, but all and all, I don't think this is looking as
> > > terrible as I initially feared.
> > >
> > > I'm CC'ing the Coccinelle people to see if they have any nice ideas on
> > > improvements. Specifically, the thing we're trying to identify is:
> > >
> > >   - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier,
> > >     where:
> >
> > Try putting
> >
> > disable optional_qualifier
> >
> > between the initial @@, to avoid the implicit matching of signed and
> > unsigned.
>
> Hmm, this doesn't quite work. Here are my rules:

It should work now.  However, without disable optional_qualifier, char is
still matching signed char.  If you think that should be changed, I can do
that.

julia


>
>     @disable optional_qualifier@
>     char c;
>     expression E;
>     @@
>     (
>     * E > c
>     |
>     * E >= c
>     |
>     * E < c
>     |
>     * E <= c
>     )
>
>     @disable optional_qualifier@
>     char c;
>     @@
>     * c == -1
>
>     @disable optional_qualifier@
>     char c;
>     @@
>     * c = -1
>
> This produces, for example:
>
> diff -u -p ./sound/firewire/bebob/bebob_focusrite.c /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c
> --- ./sound/firewire/bebob/bebob_focusrite.c
> +++ /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c
> @@ -192,7 +192,6 @@ saffirepro_both_clk_src_get(struct snd_b
>
>         /* In a case that this driver cannot handle the value of register. */
>         value &= SAFFIREPRO_CLOCK_SOURCE_SELECT_MASK;
> -       if (value >= SAFFIREPRO_CLOCK_SOURCE_COUNT || map[value] < 0) {
>                 err = -EIO;
>                 goto end;
>         }
>
> Except map is defined as:
>
>     const signed char *map;
>
> So this would be one of those cases that I had hoped `disable
> optional_qualifier` would exclude. (I think internally coccinelle might
> be assuming `char` is signed, by the way.)
>
> > >   - It's not being used for characters; and
> > >   - It's doing something that assumes it is signed, such as various
> > >     types of comparisons or decrements.
> >
> > I took a quick look at the article, but I'm not completely sure what you
> > are getting at here.  Could you give some examples of what you do and
> > don't want to find?
> >
> > You don't want the case where c is 'x', for some x?
>
> Something I would want to find is `if (c < 0)`. Something I wouldn't
> want to find is `if (c < '9')`. IOW, I'm looking for code that assumes
> `c` is signed, and would become incorrect if `c` suddenly became
> unsigned. Most things involving actual characters are fine. But most
> things involving signed arithmetic or comparisons with numbers isn't
> find.
>
> Jason
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-11-03 12:45                     ` Julia Lawall
@ 2022-11-03 12:47                       ` Jason A. Donenfeld
  2022-11-03 12:57                         ` Julia Lawall
  0 siblings, 1 reply; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-11-03 12:47 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell

Hi Julia,

On Thu, Nov 3, 2022 at 1:45 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> It should work now.

Thanks!

> However, without disable optional_qualifier, char is
> still matching signed char.  If you think that should be changed, I can do
> that.

Does `optional_qualifier` disable other things that might be
interesting to have? If so, maybe this is less than ideal? If not,
maybe it doesn't matter?

Though, for what it's worth, gcc treats `char` as a separate type,
even when using `-funsigned-char` or `-fsigned-char`.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-11-03 12:47                       ` Jason A. Donenfeld
@ 2022-11-03 12:57                         ` Julia Lawall
  2022-11-03 14:07                           ` Jason A. Donenfeld
  0 siblings, 1 reply; 35+ messages in thread
From: Julia Lawall @ 2022-11-03 12:57 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell



On Thu, 3 Nov 2022, Jason A. Donenfeld wrote:

> Hi Julia,
>
> On Thu, Nov 3, 2022 at 1:45 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > It should work now.
>
> Thanks!
>
> > However, without disable optional_qualifier, char is
> > still matching signed char.  If you think that should be changed, I can do
> > that.
>
> Does `optional_qualifier` disable other things that might be
> interesting to have? If so, maybe this is less than ideal? If not,
> maybe it doesn't matter?

Optional qualifier only allows a metavariable declared to have a certain
type to match an expression that has the same type with signed, const, or
verbatim in front of it.  Disabling it forces you to write our signed,
const etc explicitly when you want them.  So rules may becomes more
verbose.

julia

>
> Though, for what it's worth, gcc treats `char` as a separate type,
> even when using `-funsigned-char` or `-fsigned-char`.
>
> Jason
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array
  2022-11-03 12:57                         ` Julia Lawall
@ 2022-11-03 14:07                           ` Jason A. Donenfeld
  0 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-11-03 14:07 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm,
	linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko,
	Stephen Rothwell

On Thu, Nov 03, 2022 at 01:57:26PM +0100, Julia Lawall wrote:
> 
> 
> On Thu, 3 Nov 2022, Jason A. Donenfeld wrote:
> 
> > Hi Julia,
> >
> > On Thu, Nov 3, 2022 at 1:45 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > It should work now.
> >
> > Thanks!
> >
> > > However, without disable optional_qualifier, char is
> > > still matching signed char.  If you think that should be changed, I can do
> > > that.
> >
> > Does `optional_qualifier` disable other things that might be
> > interesting to have? If so, maybe this is less than ideal? If not,
> > maybe it doesn't matter?
> 
> Optional qualifier only allows a metavariable declared to have a certain
> type to match an expression that has the same type with signed, const, or
> verbatim in front of it.  Disabling it forces you to write our signed,
> const etc explicitly when you want them.  So rules may becomes more
> verbose.

Oh, huh. Maybe best to treat it as a different type then so that's not
required? I was also thinking that it doesn't totally make sense the way
it is now, in that `char` is *NOT* signed on many platforms, such as
arm. In 6.2, it'll be unsigned everywhere, for kernel code. So in the
general case, for coccinelle, it's a bit of a heisentype and so maybe
should be treated as distinct from `signed char` or `unsigned char`.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-11-03 14:07 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-20  0:03 + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Andrew Morton
2022-10-20  9:43 ` Alexey Dobriyan
2022-10-20  9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan
2022-10-20  9:56   ` [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common() Alexey Dobriyan
2022-10-20 16:28   ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld
2022-10-20 17:14     ` Linus Torvalds
2022-10-20 17:33       ` Jason A. Donenfeld
2022-10-20 17:42         ` Linus Torvalds
2022-10-20 18:57           ` Kees Cook
2022-10-20 19:39             ` Linus Torvalds
2022-10-20 20:17               ` Linus Torvalds
2022-10-20 21:34                 ` Andy Shevchenko
2022-10-20 22:46                   ` Jason A. Donenfeld
2022-10-21  6:48                 ` Greg KH
2022-10-21  7:24                   ` Jason A. Donenfeld
2022-10-21  7:36                     ` Greg KH
2022-10-26  1:50             ` Jason A. Donenfeld
2022-10-26 12:58               ` Jason A. Donenfeld
2022-10-26 12:58                 ` [cocci] " Jason A. Donenfeld
2022-10-26 13:17                 ` Andy Shevchenko
2022-10-26 13:17                   ` [cocci] " Andy Shevchenko
2022-11-02 17:17                 ` Julia Lawall
2022-11-03  0:08                   ` Jason A. Donenfeld
2022-11-03  6:31                     ` Julia Lawall
2022-11-03 12:45                     ` Julia Lawall
2022-11-03 12:47                       ` Jason A. Donenfeld
2022-11-03 12:57                         ` Julia Lawall
2022-11-03 14:07                           ` Jason A. Donenfeld
2022-10-24 15:44         ` Jason A. Donenfeld
2022-10-21  5:59       ` Alexey Dobriyan
2022-10-21 17:11         ` Linus Torvalds
2022-10-21 17:23           ` Linus Torvalds
2022-10-20 16:24 ` + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Jason A. Donenfeld
2022-10-20 21:12   ` Andrew Morton
2022-10-20 21:13     ` Jason A. Donenfeld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.