* Re: + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch [not found] <20221020000356.177CDC433C1@smtp.kernel.org> @ 2022-10-20 9:43 ` Alexey Dobriyan 2022-10-20 9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan 1 sibling, 0 replies; 29+ messages in thread From: Alexey Dobriyan @ 2022-10-20 9:43 UTC (permalink / raw) To: linux-kernel Cc: mm-commits, torvalds, masahiroy, keescook, gregkh, andriy.shevchenko, Jason, akpm On Wed, Oct 19, 2022 at 05:03:55PM -0700, Andrew Morton wrote: > This will break things in some places and fix things in others, so this > will likely cause a bit of churn while reconciling the type misuse. > --- a/Makefile~kbuild-treat-char-as-always-unsigned > +++ a/Makefile > @@ -562,7 +562,7 @@ KBUILD_AFLAGS := -D__ASSEMBLY__ -fno-P > KBUILD_CFLAGS := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \ > -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \ > -Werror=implicit-function-declaration -Werror=implicit-int \ > - -Werror=return-type -Wno-format-security \ > + -Werror=return-type -Wno-format-security -funsigned-char \ > -std=gnu11 > KBUILD_CPPFLAGS := -D__KERNEL__ > KBUILD_RUSTFLAGS := $(rust_common_flags) \ ACK Another reason is that characters were always some small non-negative integers which mapped to pictures so making them signed was silly from the beginning. People should use "const char *" for C strings and "u8[]" for raw buffers. Unfortunately, C can't give developer more. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array [not found] <20221020000356.177CDC433C1@smtp.kernel.org> 2022-10-20 9:43 ` + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Alexey Dobriyan @ 2022-10-20 9:49 ` Alexey Dobriyan 2022-10-20 9:56 ` [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common() Alexey Dobriyan 2022-10-20 16:28 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld 1 sibling, 2 replies; 29+ messages in thread From: Alexey Dobriyan @ 2022-10-20 9:49 UTC (permalink / raw) To: akpm, linux-kernel Cc: mm-commits, torvalds, masahiroy, keescook, gregkh, andriy.shevchenko, Jason, akpm struct p4_event_bind::cntr[][] should be signed because of the following code: int i, j; for (i = 0; i < P4_CNTR_LIMIT; i++) { ===> j = bind->cntr[thread][i]; if (j != -1 && !test_bit(j, used_mask)) return j; } Making this member unsigned will make "j" 255 and fail "j != -1" comparison. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> --- arch/x86/events/intel/p4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/events/intel/p4.c +++ b/arch/x86/events/intel/p4.c @@ -24,7 +24,7 @@ struct p4_event_bind { unsigned int escr_msr[2]; /* ESCR MSR for this event */ unsigned int escr_emask; /* valid ESCR EventMask bits */ unsigned int shared; /* event is shared across threads */ - char cntr[2][P4_CNTR_LIMIT]; /* counter index (offset), -1 on absence */ + signed char cntr[2][P4_CNTR_LIMIT]; /* counter index (offset), -1 on absence */ }; struct p4_pebs_bind { ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common() 2022-10-20 9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan @ 2022-10-20 9:56 ` Alexey Dobriyan 2022-10-20 16:28 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld 1 sibling, 0 replies; 29+ messages in thread From: Alexey Dobriyan @ 2022-10-20 9:56 UTC (permalink / raw) To: akpm; +Cc: viro, linux-fsdevel, linux-kernel Cast to unsigned int doesn't do anything because two comparisons are a) for equality, and b) both '/' and '\0' have non-negative values. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> --- fs/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -2657,7 +2657,7 @@ static int lookup_one_common(struct user_namespace *mnt_userns, } while (len--) { - unsigned int c = *(const unsigned char *)name++; + char c = *name++; if (c == '/' || c == '\0') return -EACCES; } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan 2022-10-20 9:56 ` [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common() Alexey Dobriyan @ 2022-10-20 16:28 ` Jason A. Donenfeld 2022-10-20 17:14 ` Linus Torvalds 1 sibling, 1 reply; 29+ messages in thread From: Jason A. Donenfeld @ 2022-10-20 16:28 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, linux-kernel, mm-commits, torvalds, masahiroy, keescook, gregkh, andriy.shevchenko On Thu, Oct 20, 2022 at 3:49 AM Alexey Dobriyan <adobriyan@gmail.com> wrote: > > struct p4_event_bind::cntr[][] should be signed because of > the following code: > > int i, j; > for (i = 0; i < P4_CNTR_LIMIT; i++) { > ===> j = bind->cntr[thread][i]; > if (j != -1 && !test_bit(j, used_mask)) > return j; > } > > Making this member unsigned will make "j" 255 and fail "j != -1" > comparison. > > Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Nice catch. Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> > arch/x86/events/intel/p4.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/arch/x86/events/intel/p4.c > +++ b/arch/x86/events/intel/p4.c > @@ -24,7 +24,7 @@ struct p4_event_bind { > unsigned int escr_msr[2]; /* ESCR MSR for this event */ > unsigned int escr_emask; /* valid ESCR EventMask bits */ > unsigned int shared; /* event is shared across threads */ > - char cntr[2][P4_CNTR_LIMIT]; /* counter index (offset), -1 on absence */ > + signed char cntr[2][P4_CNTR_LIMIT]; /* counter index (offset), -1 on absence */ > }; This is fine, but I wonder if this is a good occasion to use `s8` instead? Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 16:28 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld @ 2022-10-20 17:14 ` Linus Torvalds 2022-10-20 17:33 ` Jason A. Donenfeld 2022-10-21 5:59 ` Alexey Dobriyan 0 siblings, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2022-10-20 17:14 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, keescook, gregkh, andriy.shevchenko On Thu, Oct 20, 2022 at 9:28 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Nice catch. > > Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Can we please try to collect these all in one place? I see that Andrew picked up the original one for -mm, but I think it would be better if we had one specific place for all of this (one branch) to collect it all. I'm actually trying to do a "make allyesconfig" build on x86-64 with both signed and unsigned char, and trying to see if I can script something sane to show differences. Doing the build is easy, but the differences end up being huge just due to silly constants (ie the whole "one small difference ends up changing layout, which then causes hundreds of megs of diff just due to hex constants in the disassembly changing". I think the problem is that I tried to do the vmlinux file after linking to make it easier. Doing the individual object files separately would probably have been a better idea, just to avoid this kind of relocation offset issues. There's not a huge amount of pull requests (the week after -rc1 tends to be quiet for me), so I'll continue to waste my time on this. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 17:14 ` Linus Torvalds @ 2022-10-20 17:33 ` Jason A. Donenfeld 2022-10-20 17:42 ` Linus Torvalds 2022-10-24 15:44 ` Jason A. Donenfeld 2022-10-21 5:59 ` Alexey Dobriyan 1 sibling, 2 replies; 29+ messages in thread From: Jason A. Donenfeld @ 2022-10-20 17:33 UTC (permalink / raw) To: Linus Torvalds Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, keescook, gregkh, andriy.shevchenko, Stephen Rothwell Hi Linus, On Thu, Oct 20, 2022 at 11:15 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > Can we please try to collect these all in one place? > > I see that Andrew picked up the original one for -mm, but I think it > would be better if we had one specific place for all of this (one > branch) to collect it all. Sure. Andrew can drop it from -mm, and I'll collect everything in: https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=unsigned-char&r And I'll ask Stephen to add that branch to -next. > I'm actually trying to do a "make allyesconfig" build on x86-64 with > both signed and unsigned char, and trying to see if I can script > something sane to show differences. > > Doing the build is easy, but the differences end up being huge just > due to silly constants (ie the whole "one small difference ends up > changing layout, which then causes hundreds of megs of diff just due > to hex constants in the disassembly changing". If you can get a copy of IDA Pro, diaphora is quite nice: https://github.com/joxeankoret/diaphora Or sometimes with objdump, I've had more success by keeping debug symbols, and then trimming offsets from jmps. Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 17:33 ` Jason A. Donenfeld @ 2022-10-20 17:42 ` Linus Torvalds 2022-10-20 18:57 ` Kees Cook 2022-10-24 15:44 ` Jason A. Donenfeld 1 sibling, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2022-10-20 17:42 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, keescook, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, Oct 20, 2022 at 10:33 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Or sometimes with objdump, I've had more success by keeping debug > symbols, and then trimming offsets from jmps. objdump is what I'm using, and it actually seems ok on individual object files. Now I just need to script the "do all the object files" and see how massive the end result is. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 17:42 ` Linus Torvalds @ 2022-10-20 18:57 ` Kees Cook 2022-10-20 19:39 ` Linus Torvalds 2022-10-26 1:50 ` Jason A. Donenfeld 0 siblings, 2 replies; 29+ messages in thread From: Kees Cook @ 2022-10-20 18:57 UTC (permalink / raw) To: Linus Torvalds Cc: Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, Oct 20, 2022 at 10:42:25AM -0700, Linus Torvalds wrote: > On Thu, Oct 20, 2022 at 10:33 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > Or sometimes with objdump, I've had more success by keeping debug > > symbols, and then trimming offsets from jmps. > > objdump is what I'm using, and it actually seems ok on individual object files. > > Now I just need to script the "do all the object files" and see how > massive the end result is. For the a/b build, I start with all*config, then: # Stop painful noise CONFIG_KCOV=n CONFIG_GCOV_KERNEL=n CONFIG_GCC_PLUGINS=n CONFIG_IKHEADERS=n CONFIG_KASAN=n CONFIG_UBSAN=n CONFIG_KCSAN=n CONFIG_KMSAN=n # Get us source/line details CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y CONFIG_DEBUG_INFO_REDUCED=n CONFIG_DEBUG_INFO_COMPRESSED=n CONFIG_DEBUG_INFO_SPLIT=n And to keep other build-time junk stabilized[1], I build with these make options: KBUILD_BUILD_TIMESTAMP=1970-01-01 KBUILD_BUILD_USER=user KBUILD_BUILD_HOST=host KBUILD_BUILD_VERSION=1 For the code diff, I use: objdump --disassemble --demangle --no-show-raw-insn --no-addresses and when doing the manual examination: objdump --disassemble --demangle --reloc --source -l --no-show-raw-insn My not-great way to filter out the movsbl/movzbl, I added this to diff: -I '\bmov[sz]bl\b' -- Kees Cook ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 18:57 ` Kees Cook @ 2022-10-20 19:39 ` Linus Torvalds 2022-10-20 20:17 ` Linus Torvalds 2022-10-26 1:50 ` Jason A. Donenfeld 1 sibling, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2022-10-20 19:39 UTC (permalink / raw) To: Kees Cook Cc: Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, Oct 20, 2022 at 11:57 AM Kees Cook <keescook@chromium.org> wrote: > > For the a/b build, I start with all*config, then: Yes, I have that part all figured out. > For the code diff, I use: > > objdump --disassemble --demangle --no-show-raw-insn --no-addresses This part I still hate. Have you figured out any way to get objdump to actually show the relocations in-place in the assembly? Ie, instead of call <will_become_orphaned_pgrp+0xbf> R_X86_64_PLT32 debug_lockdep_rcu_enabled-0x4 just show it as call debug_lockdep_rcu_enabled to make the diff - when it exists - hugely more legible? Because now any code changes will not just show the code changes, but end up showing a lot of silly changes because the "+0xbf" changes. I guess I'll just have to remove all of those hex constants anyway, because they also show up for any jumps inside the functions. I also explored trying to compare just the generates *.s files, but that has its own set of problems, notably with gcc label numbering. Plus they are harder to generate for the full tree with our standard build rules (maybe there's some trick I haven't thought of to make gcc keep the '*.s' files as it generates the '*.o' ones). I do have something that "works", but it turns out to be very noisy, because while gcc *often* generates almost identical code, then when it doesn't it can be quite nasty. When there is a *real* difference, having a nasty diff is fine. For example, the arch/x86/events/intel/p4.c issue that Alexey found generates huge differences, because gcc can just see that "ok, that's never negative", and generates completely different code. That's good. But when there's some small change that just changes the offset, it's just annoying, even with --no-addresses. The hex numbers can be edited out, but then you have the nop padding changes etc etc. So getting rid of that kind of pointless noise is just about all the effort here. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 19:39 ` Linus Torvalds @ 2022-10-20 20:17 ` Linus Torvalds 2022-10-20 21:34 ` Andy Shevchenko 2022-10-21 6:48 ` Greg KH 0 siblings, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2022-10-20 20:17 UTC (permalink / raw) To: Kees Cook Cc: Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, Oct 20, 2022 at 12:39 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So getting rid of that kind of pointless noise is just about all the > effort here. Current status: of 22.5k object files, 971 have differences. In many cases, the differences are small and trivial. Example: - fscrypt_show_test_dummy_encryption() does that same "print a char with %c" seq_printf(seq, "%ctest_dummy_encryption=v%d", sep, vers); which is entirely harmless and exactly the same as that (but I most certainly haven't figured out how to automatically script away that "oh, %c is fine). And in other cases, there's no actual difference at all, just different register usage, so the diff looks fairly big, but doesn't seem to be real. In one case I looked at, it started with a 'movzbl', but it was that in both cases, because the type was actually 'unsigned char' to begin with. But for some reason it just used different registers. Example: - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c The reason here *seems* to be that char *buf; buf = (char *)urb->transfer_buffer; where it really probably should be 'u8 *buf', since it actually does a cast to 'u8' in one place, but there isn't even any read of that 'buf' pointer. So the difference seems to be entirely just some "different type in assignment" cast internal to gcc that then incidentally generated a random other choice in register allocation. And in some cases the differences are enormous: - drivers/net/wireless/ralink/rt2x00/rt2800lib.c generates a 220kB diff which seems to be due to entirely different inlining decisions or something, and the differences are so enormous that I didn't even start looking at the cause. There's a fair number of things in between, like fs/ext4/super.c that generates a lot of differences, some of them obvious, some of them very much not obvious that may br similar to the handle_control_request() ones. *Presumably* the ext4 super.c code is fine (since it has been used on architectures that already had unsigned char), but it actually generates a bigger diff than the p4 events driver does... And that arch/x86/events/intel/p4.c thing that Alexey found sadly does not stand out at all in that "somewhere in the middle" bunch. So I think my "hey, we can automate the comparison" is pretty much a dud, but I'm not giving up quite yet. It's annoying how *most* of the kernel files show no differences at all, but then I can't even figure our why other files do show differences with no obvious reason for them at all. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 20:17 ` Linus Torvalds @ 2022-10-20 21:34 ` Andy Shevchenko 2022-10-20 22:46 ` Jason A. Donenfeld 2022-10-21 6:48 ` Greg KH 1 sibling, 1 reply; 29+ messages in thread From: Andy Shevchenko @ 2022-10-20 21:34 UTC (permalink / raw) To: Linus Torvalds Cc: Kees Cook, Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, Stephen Rothwell On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote: > On Thu, Oct 20, 2022 at 12:39 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: ... > And in some cases the differences are enormous: > > - drivers/net/wireless/ralink/rt2x00/rt2800lib.c generates a 220kB diff > > which seems to be due to entirely different inlining decisions or > something, and the differences are so enormous that I didn't even > start looking at the cause. This one is what we start the epopee from. I think Jason handled it in his last patch against this certain driver. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 21:34 ` Andy Shevchenko @ 2022-10-20 22:46 ` Jason A. Donenfeld 0 siblings, 0 replies; 29+ messages in thread From: Jason A. Donenfeld @ 2022-10-20 22:46 UTC (permalink / raw) To: Andy Shevchenko Cc: Linus Torvalds, Kees Cook, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, Stephen Rothwell On Fri, Oct 21, 2022 at 12:34:37AM +0300, Andy Shevchenko wrote: > On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote: > > On Thu, Oct 20, 2022 at 12:39 PM Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > ... > > > And in some cases the differences are enormous: > > > > - drivers/net/wireless/ralink/rt2x00/rt2800lib.c generates a 220kB diff > > > > which seems to be due to entirely different inlining decisions or > > something, and the differences are so enormous that I didn't even > > start looking at the cause. > > This one is what we start the epopee from. I think Jason handled it in his last > patch against this certain driver. Right, and Kale is taking it for 6.1, because it fixes existing breakage on ARM. But it's not broken on x86 with -funsigned-char. Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 20:17 ` Linus Torvalds 2022-10-20 21:34 ` Andy Shevchenko @ 2022-10-21 6:48 ` Greg KH 2022-10-21 7:24 ` Jason A. Donenfeld 1 sibling, 1 reply; 29+ messages in thread From: Greg KH @ 2022-10-21 6:48 UTC (permalink / raw) To: Linus Torvalds Cc: Kees Cook, Jason A. Donenfeld, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, andriy.shevchenko, Stephen Rothwell On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote: > And in other cases, there's no actual difference at all, just > different register usage, so the diff looks fairly big, but doesn't > seem to be real. In one case I looked at, it started with a 'movzbl', > but it was that in both cases, because the type was actually 'unsigned > char' to begin with. But for some reason it just used different > registers. Example: > > - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c > > The reason here *seems* to be that > > char *buf; > buf = (char *)urb->transfer_buffer; > > where it really probably should be 'u8 *buf', since it actually > does a cast to 'u8' in one place, but there isn't even any read of > that 'buf' pointer. So the difference seems to be entirely just some > "different type in assignment" cast internal to gcc that then > incidentally generated a random other choice in register allocation. I've send a patch for this now: https://lore.kernel.org/r/20221021064453.3341050-1-gregkh@linuxfoundation.org and will take it through the USB tree, unless Jason wants to grab it through his tree. thanks, greg k-h ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-21 6:48 ` Greg KH @ 2022-10-21 7:24 ` Jason A. Donenfeld 2022-10-21 7:36 ` Greg KH 0 siblings, 1 reply; 29+ messages in thread From: Jason A. Donenfeld @ 2022-10-21 7:24 UTC (permalink / raw) To: Greg KH Cc: Linus Torvalds, Kees Cook, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, andriy.shevchenko, Stephen Rothwell Hi Greg, On Fri, Oct 21, 2022 at 2:48 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote: > > And in other cases, there's no actual difference at all, just > > different register usage, so the diff looks fairly big, but doesn't > > seem to be real. In one case I looked at, it started with a 'movzbl', > > but it was that in both cases, because the type was actually 'unsigned > > char' to begin with. But for some reason it just used different > > registers. Example: > > > > - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c > > > > The reason here *seems* to be that > > > > char *buf; > > buf = (char *)urb->transfer_buffer; > > > > where it really probably should be 'u8 *buf', since it actually > > does a cast to 'u8' in one place, but there isn't even any read of > > that 'buf' pointer. So the difference seems to be entirely just some > > "different type in assignment" cast internal to gcc that then > > incidentally generated a random other choice in register allocation. > > I've send a patch for this now: > https://lore.kernel.org/r/20221021064453.3341050-1-gregkh@linuxfoundation.org > and will take it through the USB tree, unless Jason wants to grab it > through his tree. This doesn't appear to have any actual effect, but just changes gcc's register allocation unexpectedly. So feel free to take it, as it doesn't seem like it's "one of those bad cases" that I'm keeping track of. Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-21 7:24 ` Jason A. Donenfeld @ 2022-10-21 7:36 ` Greg KH 0 siblings, 0 replies; 29+ messages in thread From: Greg KH @ 2022-10-21 7:36 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Linus Torvalds, Kees Cook, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, andriy.shevchenko, Stephen Rothwell On Fri, Oct 21, 2022 at 03:24:27AM -0400, Jason A. Donenfeld wrote: > Hi Greg, > > On Fri, Oct 21, 2022 at 2:48 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > On Thu, Oct 20, 2022 at 01:17:33PM -0700, Linus Torvalds wrote: > > > And in other cases, there's no actual difference at all, just > > > different register usage, so the diff looks fairly big, but doesn't > > > seem to be real. In one case I looked at, it started with a 'movzbl', > > > but it was that in both cases, because the type was actually 'unsigned > > > char' to begin with. But for some reason it just used different > > > registers. Example: > > > > > > - handle_control_request() in drivers/usb/gadget/udc/dummy_hcd.c > > > > > > The reason here *seems* to be that > > > > > > char *buf; > > > buf = (char *)urb->transfer_buffer; > > > > > > where it really probably should be 'u8 *buf', since it actually > > > does a cast to 'u8' in one place, but there isn't even any read of > > > that 'buf' pointer. So the difference seems to be entirely just some > > > "different type in assignment" cast internal to gcc that then > > > incidentally generated a random other choice in register allocation. > > > > I've send a patch for this now: > > https://lore.kernel.org/r/20221021064453.3341050-1-gregkh@linuxfoundation.org > > and will take it through the USB tree, unless Jason wants to grab it > > through his tree. > > This doesn't appear to have any actual effect, but just changes gcc's > register allocation unexpectedly. So feel free to take it, as it > doesn't seem like it's "one of those bad cases" that I'm keeping track > of. Great, will take it through my tree, thanks! greg k-h ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 18:57 ` Kees Cook 2022-10-20 19:39 ` Linus Torvalds @ 2022-10-26 1:50 ` Jason A. Donenfeld 2022-10-26 12:58 ` Jason A. Donenfeld 1 sibling, 1 reply; 29+ messages in thread From: Jason A. Donenfeld @ 2022-10-26 1:50 UTC (permalink / raw) To: Kees Cook Cc: Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, Oct 20, 2022 at 11:57:12AM -0700, Kees Cook wrote: > On Thu, Oct 20, 2022 at 10:42:25AM -0700, Linus Torvalds wrote: > > On Thu, Oct 20, 2022 at 10:33 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > > > Or sometimes with objdump, I've had more success by keeping debug > > > symbols, and then trimming offsets from jmps. > > > > objdump is what I'm using, and it actually seems ok on individual object files. > > > > Now I just need to script the "do all the object files" and see how > > massive the end result is. > > For the a/b build, I start with all*config, then: > > # Stop painful noise > CONFIG_KCOV=n > CONFIG_GCOV_KERNEL=n > CONFIG_GCC_PLUGINS=n > CONFIG_IKHEADERS=n > CONFIG_KASAN=n > CONFIG_UBSAN=n > CONFIG_KCSAN=n > CONFIG_KMSAN=n > # Get us source/line details > CONFIG_DEBUG_KERNEL=y > CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y > CONFIG_DEBUG_INFO_REDUCED=n CONFIG_DEBUG_INFO_COMPRESSED=n > CONFIG_DEBUG_INFO_SPLIT=n > > And to keep other build-time junk stabilized[1], I build with these make > options: > > KBUILD_BUILD_TIMESTAMP=1970-01-01 > KBUILD_BUILD_USER=user > KBUILD_BUILD_HOST=host > KBUILD_BUILD_VERSION=1 The LLVM `.ll` file thing I tried turned out to be a disaster. Too much noise, as this is too early of a stage. The traditional objdump comparison does work, though. It produces a good amount of noise, but still yields a manageable amount of diffs -- 882 -- which can then be paired down more with heuristics. I've been using this script below to compare `linux-schar/` with `linux-uchar/`, which creates a directory `linux-schar-uchar/` filled with color diffs that I can then flip through using `less -R linux-schar-uchar/*.diff`. Seems to work okay, so I'll post it here in case others are curious about looking through these. Jason ------8<-------------------------------- #!/bin/bash asm_diff() { objdump \ --disassemble \ --demangle \ --no-show-raw-insn \ --no-addresses \ --section=.text \ --disassembler-options=intel \ "$1" | \ sed \ -e 's/[0-9a-f]\+ \(<[a-zA-Z0-9_+-]\+>\)/?? \1/g' \ -e 's/<\([a-zA-Z0-9_-]\+\)+0x[a-f0-9]\+>/<\1>/g' \ -e '/\/[a-zA-Z0-9._-]\+\.o:/d' } A=linux-schar B=linux-uchar C=linux-schar-uchar rm -rf "$C" mkdir -p "$C" while read -r obj_a; do obj_b="$B/${obj_a#$A/}" diff_c="${obj_a#$A/}" diff_c="$C/${diff_c//\//--}.diff" [[ -f $obj_b ]] || { echo "ERROR: $obj_b is missing" >&2; exit 1; } echo "${obj_a#$A/}" >&2 diff --color=always --text --unified=10 \ <(asm_diff "$obj_a") <(asm_diff "$obj_b") > "$diff_c" && \ rm -f "$diff_c" done < <(exec find "$A" -name '*.o') ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-26 1:50 ` Jason A. Donenfeld @ 2022-10-26 12:58 ` Jason A. Donenfeld 2022-10-26 13:17 ` Andy Shevchenko 2022-11-02 17:17 ` [cocci] " Julia Lawall 0 siblings, 2 replies; 29+ messages in thread From: Jason A. Donenfeld @ 2022-10-26 12:58 UTC (permalink / raw) To: Kees Cook, cocci Cc: Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote: > The traditional objdump comparison does work, though. It produces a good Another thing that appears to work well is just using Coccinelle scripts. I've had some success just scrolling through the results of: @@ char c; expression E; @@ ( * E > c | * E >= c | * E < c | * E <= c ) That also triggers on explicitly signed chars, and examining those reveals that quite a bit of code in the tree already does do the right thing, which is good. From looking at this and objdump output, it looks like most naked-char usage that isn't for strings is actually already assuming it's unsigned, using it as a byte. I'll continue to churn, and I'm sure I'll miss a few things here and there, but all and all, I don't think this is looking as terrible as I initially feared. I'm CC'ing the Coccinelle people to see if they have any nice ideas on improvements. Specifically, the thing we're trying to identify is: - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier, where: - It's not being used for characters; and - It's doing something that assumes it is signed, such as various types of comparisons or decrements. LWN wrote a summary of the general problem, in case that helps describe what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/ Any nice Cocci tricks for this? Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-26 12:58 ` Jason A. Donenfeld @ 2022-10-26 13:17 ` Andy Shevchenko 2022-11-02 17:17 ` [cocci] " Julia Lawall 1 sibling, 0 replies; 29+ messages in thread From: Andy Shevchenko @ 2022-10-26 13:17 UTC (permalink / raw) To: Jason A. Donenfeld, Rasmus Villemoes Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, Stephen Rothwell +Cc: Rasmus as he has done a lot regarding library stuff and optimizations and he knows Coccinelle (to some extent as far as I can tell). On Wed, Oct 26, 2022 at 02:58:34PM +0200, Jason A. Donenfeld wrote: > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote: > > The traditional objdump comparison does work, though. It produces a good > > Another thing that appears to work well is just using Coccinelle > scripts. I've had some success just scrolling through the results of: > > @@ > char c; > expression E; > @@ > ( > * E > c > | > * E >= c > | > * E < c > | > * E <= c > ) > > That also triggers on explicitly signed chars, and examining those > reveals that quite a bit of code in the tree already does do the right > thing, which is good. > > From looking at this and objdump output, it looks like most naked-char > usage that isn't for strings is actually already assuming it's unsigned, > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few > things here and there, but all and all, I don't think this is looking as > terrible as I initially feared. > > I'm CC'ing the Coccinelle people to see if they have any nice ideas on > improvements. Specifically, the thing we're trying to identify is: > > - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier, > where: > - It's not being used for characters; and > - It's doing something that assumes it is signed, such as various > types of comparisons or decrements. > > LWN wrote a summary of the general problem, in case that helps describe > what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/ > > Any nice Cocci tricks for this? -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-26 12:58 ` Jason A. Donenfeld 2022-10-26 13:17 ` Andy Shevchenko @ 2022-11-02 17:17 ` Julia Lawall 2022-11-03 0:08 ` Jason A. Donenfeld 1 sibling, 1 reply; 29+ messages in thread From: Julia Lawall @ 2022-11-02 17:17 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Wed, 26 Oct 2022, Jason A. Donenfeld wrote: > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote: > > The traditional objdump comparison does work, though. It produces a good > > Another thing that appears to work well is just using Coccinelle > scripts. I've had some success just scrolling through the results of: > > @@ > char c; > expression E; > @@ > ( > * E > c > | > * E >= c > | > * E < c > | > * E <= c > ) > > That also triggers on explicitly signed chars, and examining those > reveals that quite a bit of code in the tree already does do the right > thing, which is good. > > From looking at this and objdump output, it looks like most naked-char > usage that isn't for strings is actually already assuming it's unsigned, > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few > things here and there, but all and all, I don't think this is looking as > terrible as I initially feared. > > I'm CC'ing the Coccinelle people to see if they have any nice ideas on > improvements. Specifically, the thing we're trying to identify is: > > - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier, > where: Try putting disable optional_qualifier between the initial @@, to avoid the implicit matching of signed and unsigned. > - It's not being used for characters; and > - It's doing something that assumes it is signed, such as various > types of comparisons or decrements. I took a quick look at the article, but I'm not completely sure what you are getting at here. Could you give some examples of what you do and don't want to find? You don't want the case where c is 'x', for some x? julia > LWN wrote a summary of the general problem, in case that helps describe > what would be useful: https://lwn.net/SubscriberLink/911914/f90c2ed1af23cbc4/ > > Any nice Cocci tricks for this? > > Jason > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-11-02 17:17 ` [cocci] " Julia Lawall @ 2022-11-03 0:08 ` Jason A. Donenfeld 2022-11-03 6:31 ` Julia Lawall 2022-11-03 12:45 ` Julia Lawall 0 siblings, 2 replies; 29+ messages in thread From: Jason A. Donenfeld @ 2022-11-03 0:08 UTC (permalink / raw) To: Julia Lawall Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Wed, Nov 02, 2022 at 06:17:04PM +0100, Julia Lawall wrote: > > > On Wed, 26 Oct 2022, Jason A. Donenfeld wrote: > > > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote: > > > The traditional objdump comparison does work, though. It produces a good > > > > Another thing that appears to work well is just using Coccinelle > > scripts. I've had some success just scrolling through the results of: > > > > @@ > > char c; > > expression E; > > @@ > > ( > > * E > c > > | > > * E >= c > > | > > * E < c > > | > > * E <= c > > ) > > > > That also triggers on explicitly signed chars, and examining those > > reveals that quite a bit of code in the tree already does do the right > > thing, which is good. > > > > From looking at this and objdump output, it looks like most naked-char > > usage that isn't for strings is actually already assuming it's unsigned, > > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few > > things here and there, but all and all, I don't think this is looking as > > terrible as I initially feared. > > > > I'm CC'ing the Coccinelle people to see if they have any nice ideas on > > improvements. Specifically, the thing we're trying to identify is: > > > > - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier, > > where: > > Try putting > > disable optional_qualifier > > between the initial @@, to avoid the implicit matching of signed and > unsigned. Hmm, this doesn't quite work. Here are my rules: @disable optional_qualifier@ char c; expression E; @@ ( * E > c | * E >= c | * E < c | * E <= c ) @disable optional_qualifier@ char c; @@ * c == -1 @disable optional_qualifier@ char c; @@ * c = -1 This produces, for example: diff -u -p ./sound/firewire/bebob/bebob_focusrite.c /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c --- ./sound/firewire/bebob/bebob_focusrite.c +++ /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c @@ -192,7 +192,6 @@ saffirepro_both_clk_src_get(struct snd_b /* In a case that this driver cannot handle the value of register. */ value &= SAFFIREPRO_CLOCK_SOURCE_SELECT_MASK; - if (value >= SAFFIREPRO_CLOCK_SOURCE_COUNT || map[value] < 0) { err = -EIO; goto end; } Except map is defined as: const signed char *map; So this would be one of those cases that I had hoped `disable optional_qualifier` would exclude. (I think internally coccinelle might be assuming `char` is signed, by the way.) > > - It's not being used for characters; and > > - It's doing something that assumes it is signed, such as various > > types of comparisons or decrements. > > I took a quick look at the article, but I'm not completely sure what you > are getting at here. Could you give some examples of what you do and > don't want to find? > > You don't want the case where c is 'x', for some x? Something I would want to find is `if (c < 0)`. Something I wouldn't want to find is `if (c < '9')`. IOW, I'm looking for code that assumes `c` is signed, and would become incorrect if `c` suddenly became unsigned. Most things involving actual characters are fine. But most things involving signed arithmetic or comparisons with numbers isn't find. Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-11-03 0:08 ` Jason A. Donenfeld @ 2022-11-03 6:31 ` Julia Lawall 2022-11-03 12:45 ` Julia Lawall 1 sibling, 0 replies; 29+ messages in thread From: Julia Lawall @ 2022-11-03 6:31 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, 3 Nov 2022, Jason A. Donenfeld wrote: > On Wed, Nov 02, 2022 at 06:17:04PM +0100, Julia Lawall wrote: > > > > > > On Wed, 26 Oct 2022, Jason A. Donenfeld wrote: > > > > > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote: > > > > The traditional objdump comparison does work, though. It produces a good > > > > > > Another thing that appears to work well is just using Coccinelle > > > scripts. I've had some success just scrolling through the results of: > > > > > > @@ > > > char c; > > > expression E; > > > @@ > > > ( > > > * E > c > > > | > > > * E >= c > > > | > > > * E < c > > > | > > > * E <= c > > > ) > > > > > > That also triggers on explicitly signed chars, and examining those > > > reveals that quite a bit of code in the tree already does do the right > > > thing, which is good. > > > > > > From looking at this and objdump output, it looks like most naked-char > > > usage that isn't for strings is actually already assuming it's unsigned, > > > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few > > > things here and there, but all and all, I don't think this is looking as > > > terrible as I initially feared. > > > > > > I'm CC'ing the Coccinelle people to see if they have any nice ideas on > > > improvements. Specifically, the thing we're trying to identify is: > > > > > > - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier, > > > where: > > > > Try putting > > > > disable optional_qualifier > > > > between the initial @@, to avoid the implicit matching of signed and > > unsigned. > > Hmm, this doesn't quite work. Here are my rules: > > @disable optional_qualifier@ > char c; > expression E; > @@ > ( > * E > c > | > * E >= c > | > * E < c > | > * E <= c > ) > > @disable optional_qualifier@ > char c; > @@ > * c == -1 > > @disable optional_qualifier@ > char c; > @@ > * c = -1 > > This produces, for example: > > diff -u -p ./sound/firewire/bebob/bebob_focusrite.c /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c > --- ./sound/firewire/bebob/bebob_focusrite.c > +++ /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c > @@ -192,7 +192,6 @@ saffirepro_both_clk_src_get(struct snd_b > > /* In a case that this driver cannot handle the value of register. */ > value &= SAFFIREPRO_CLOCK_SOURCE_SELECT_MASK; > - if (value >= SAFFIREPRO_CLOCK_SOURCE_COUNT || map[value] < 0) { > err = -EIO; > goto end; > } > > Except map is defined as: > > const signed char *map; > > So this would be one of those cases that I had hoped `disable > optional_qualifier` would exclude. (I think internally coccinelle might > be assuming `char` is signed, by the way.) OK, I see the problem. Coccinelle isn't taking the "disable optional_qualifier" into account when it checks types on expressions. It would work if you put, eg: char x; ... when any * x < 0 But that would be much slower and less general. I will fix it. > > > > - It's not being used for characters; and > > > - It's doing something that assumes it is signed, such as various > > > types of comparisons or decrements. > > > > I took a quick look at the article, but I'm not completely sure what you > > are getting at here. Could you give some examples of what you do and > > don't want to find? > > > > You don't want the case where c is 'x', for some x? > > Something I would want to find is `if (c < 0)`. Something I wouldn't > want to find is `if (c < '9')`. IOW, I'm looking for code that assumes > `c` is signed, and would become incorrect if `c` suddenly became > unsigned. Most things involving actual characters are fine. But most > things involving signed arithmetic or comparisons with numbers isn't > find. This seems to do what you want: @disable optional_qualifier@ constant char cc; expression e; char c; @@ ( c < cc | * c < e ) It highlights only the two return lines in: int main () { char x; if (x < 'd') return x < 0; else return x < y; } julia > > Jason > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-11-03 0:08 ` Jason A. Donenfeld 2022-11-03 6:31 ` Julia Lawall @ 2022-11-03 12:45 ` Julia Lawall 2022-11-03 12:47 ` Jason A. Donenfeld 1 sibling, 1 reply; 29+ messages in thread From: Julia Lawall @ 2022-11-03 12:45 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Julia Lawall, Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, 3 Nov 2022, Jason A. Donenfeld wrote: > On Wed, Nov 02, 2022 at 06:17:04PM +0100, Julia Lawall wrote: > > > > > > On Wed, 26 Oct 2022, Jason A. Donenfeld wrote: > > > > > On Wed, Oct 26, 2022 at 03:50:25AM +0200, Jason A. Donenfeld wrote: > > > > The traditional objdump comparison does work, though. It produces a good > > > > > > Another thing that appears to work well is just using Coccinelle > > > scripts. I've had some success just scrolling through the results of: > > > > > > @@ > > > char c; > > > expression E; > > > @@ > > > ( > > > * E > c > > > | > > > * E >= c > > > | > > > * E < c > > > | > > > * E <= c > > > ) > > > > > > That also triggers on explicitly signed chars, and examining those > > > reveals that quite a bit of code in the tree already does do the right > > > thing, which is good. > > > > > > From looking at this and objdump output, it looks like most naked-char > > > usage that isn't for strings is actually already assuming it's unsigned, > > > using it as a byte. I'll continue to churn, and I'm sure I'll miss a few > > > things here and there, but all and all, I don't think this is looking as > > > terrible as I initially feared. > > > > > > I'm CC'ing the Coccinelle people to see if they have any nice ideas on > > > improvements. Specifically, the thing we're trying to identify is: > > > > > > - Usage of vanilla `char`, without a `signed` or `unsigned` qualifier, > > > where: > > > > Try putting > > > > disable optional_qualifier > > > > between the initial @@, to avoid the implicit matching of signed and > > unsigned. > > Hmm, this doesn't quite work. Here are my rules: It should work now. However, without disable optional_qualifier, char is still matching signed char. If you think that should be changed, I can do that. julia > > @disable optional_qualifier@ > char c; > expression E; > @@ > ( > * E > c > | > * E >= c > | > * E < c > | > * E <= c > ) > > @disable optional_qualifier@ > char c; > @@ > * c == -1 > > @disable optional_qualifier@ > char c; > @@ > * c = -1 > > This produces, for example: > > diff -u -p ./sound/firewire/bebob/bebob_focusrite.c /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c > --- ./sound/firewire/bebob/bebob_focusrite.c > +++ /tmp/nothing/sound/firewire/bebob/bebob_focusrite.c > @@ -192,7 +192,6 @@ saffirepro_both_clk_src_get(struct snd_b > > /* In a case that this driver cannot handle the value of register. */ > value &= SAFFIREPRO_CLOCK_SOURCE_SELECT_MASK; > - if (value >= SAFFIREPRO_CLOCK_SOURCE_COUNT || map[value] < 0) { > err = -EIO; > goto end; > } > > Except map is defined as: > > const signed char *map; > > So this would be one of those cases that I had hoped `disable > optional_qualifier` would exclude. (I think internally coccinelle might > be assuming `char` is signed, by the way.) > > > > - It's not being used for characters; and > > > - It's doing something that assumes it is signed, such as various > > > types of comparisons or decrements. > > > > I took a quick look at the article, but I'm not completely sure what you > > are getting at here. Could you give some examples of what you do and > > don't want to find? > > > > You don't want the case where c is 'x', for some x? > > Something I would want to find is `if (c < 0)`. Something I wouldn't > want to find is `if (c < '9')`. IOW, I'm looking for code that assumes > `c` is signed, and would become incorrect if `c` suddenly became > unsigned. Most things involving actual characters are fine. But most > things involving signed arithmetic or comparisons with numbers isn't > find. > > Jason > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-11-03 12:45 ` Julia Lawall @ 2022-11-03 12:47 ` Jason A. Donenfeld 2022-11-03 12:57 ` Julia Lawall 0 siblings, 1 reply; 29+ messages in thread From: Jason A. Donenfeld @ 2022-11-03 12:47 UTC (permalink / raw) To: Julia Lawall Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell Hi Julia, On Thu, Nov 3, 2022 at 1:45 PM Julia Lawall <julia.lawall@inria.fr> wrote: > It should work now. Thanks! > However, without disable optional_qualifier, char is > still matching signed char. If you think that should be changed, I can do > that. Does `optional_qualifier` disable other things that might be interesting to have? If so, maybe this is less than ideal? If not, maybe it doesn't matter? Though, for what it's worth, gcc treats `char` as a separate type, even when using `-funsigned-char` or `-fsigned-char`. Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-11-03 12:47 ` Jason A. Donenfeld @ 2022-11-03 12:57 ` Julia Lawall 2022-11-03 14:07 ` Jason A. Donenfeld 0 siblings, 1 reply; 29+ messages in thread From: Julia Lawall @ 2022-11-03 12:57 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, 3 Nov 2022, Jason A. Donenfeld wrote: > Hi Julia, > > On Thu, Nov 3, 2022 at 1:45 PM Julia Lawall <julia.lawall@inria.fr> wrote: > > It should work now. > > Thanks! > > > However, without disable optional_qualifier, char is > > still matching signed char. If you think that should be changed, I can do > > that. > > Does `optional_qualifier` disable other things that might be > interesting to have? If so, maybe this is less than ideal? If not, > maybe it doesn't matter? Optional qualifier only allows a metavariable declared to have a certain type to match an expression that has the same type with signed, const, or verbatim in front of it. Disabling it forces you to write our signed, const etc explicitly when you want them. So rules may becomes more verbose. julia > > Though, for what it's worth, gcc treats `char` as a separate type, > even when using `-funsigned-char` or `-fsigned-char`. > > Jason > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [cocci] [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-11-03 12:57 ` Julia Lawall @ 2022-11-03 14:07 ` Jason A. Donenfeld 0 siblings, 0 replies; 29+ messages in thread From: Jason A. Donenfeld @ 2022-11-03 14:07 UTC (permalink / raw) To: Julia Lawall Cc: Kees Cook, cocci, Linus Torvalds, Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, Nov 03, 2022 at 01:57:26PM +0100, Julia Lawall wrote: > > > On Thu, 3 Nov 2022, Jason A. Donenfeld wrote: > > > Hi Julia, > > > > On Thu, Nov 3, 2022 at 1:45 PM Julia Lawall <julia.lawall@inria.fr> wrote: > > > It should work now. > > > > Thanks! > > > > > However, without disable optional_qualifier, char is > > > still matching signed char. If you think that should be changed, I can do > > > that. > > > > Does `optional_qualifier` disable other things that might be > > interesting to have? If so, maybe this is less than ideal? If not, > > maybe it doesn't matter? > > Optional qualifier only allows a metavariable declared to have a certain > type to match an expression that has the same type with signed, const, or > verbatim in front of it. Disabling it forces you to write our signed, > const etc explicitly when you want them. So rules may becomes more > verbose. Oh, huh. Maybe best to treat it as a different type then so that's not required? I was also thinking that it doesn't totally make sense the way it is now, in that `char` is *NOT* signed on many platforms, such as arm. In 6.2, it'll be unsigned everywhere, for kernel code. So in the general case, for coccinelle, it's a bit of a heisentype and so maybe should be treated as distinct from `signed char` or `unsigned char`. Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 17:33 ` Jason A. Donenfeld 2022-10-20 17:42 ` Linus Torvalds @ 2022-10-24 15:44 ` Jason A. Donenfeld 1 sibling, 0 replies; 29+ messages in thread From: Jason A. Donenfeld @ 2022-10-24 15:44 UTC (permalink / raw) To: Linus Torvalds Cc: Alexey Dobriyan, akpm, linux-kernel, mm-commits, masahiroy, keescook, gregkh, andriy.shevchenko, Stephen Rothwell On Thu, Oct 20, 2022 at 11:33:39AM -0600, Jason A. Donenfeld wrote: > Hi Linus, > > On Thu, Oct 20, 2022 at 11:15 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > Can we please try to collect these all in one place? > > > > I see that Andrew picked up the original one for -mm, but I think it > > would be better if we had one specific place for all of this (one > > branch) to collect it all. > > Sure. Andrew can drop it from -mm, and I'll collect everything in: > > https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=unsigned-char&r > > And I'll ask Stephen to add that branch to -next. So, as discussed, I'm doing this, and it's in -next now. And as a result, we're getting warnings and such that I am fixing one by one. Progress, good. But it occurs to me that most of these bugs are not in architecture-specific code like the x86 p4_event_bind one from last week. That means I'll be submitting these as *fixes* during 6.1, since they're broken on some architecture already, rather than waiting to submit them to you via my unsigned-char tree in 6.2. Just FYI. Jason ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-20 17:14 ` Linus Torvalds 2022-10-20 17:33 ` Jason A. Donenfeld @ 2022-10-21 5:59 ` Alexey Dobriyan 2022-10-21 17:11 ` Linus Torvalds 1 sibling, 1 reply; 29+ messages in thread From: Alexey Dobriyan @ 2022-10-21 5:59 UTC (permalink / raw) To: Linus Torvalds Cc: Jason A. Donenfeld, akpm, linux-kernel, mm-commits, masahiroy, keescook, gregkh, andriy.shevchenko On Thu, Oct 20, 2022 at 10:14:54AM -0700, Linus Torvalds wrote: > On Thu, Oct 20, 2022 at 9:28 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > Nice catch. > > > > Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> > > Can we please try to collect these all in one place? > > I see that Andrew picked up the original one for -mm, but I think it > would be better if we had one specific place for all of this (one > branch) to collect it all. > > I'm actually trying to do a "make allyesconfig" build on x86-64 with > both signed and unsigned char, and trying to see if I can script > something sane to show differences. It is very entertaining, i've given up and started patching sparse but it needs more because char constants are ints: diff --git a/evaluate.c b/evaluate.c index 61f59ee3..ab607581 100644 --- a/evaluate.c +++ b/evaluate.c @@ -321,6 +321,10 @@ static struct expression * cast_to(struct expression *old, struct symbol *type) if (old->ctype != &null_ctype && is_same_type(old, type)) return old; + if (is_char_type(old->ctype)) { + sparse_error(old->pos, "XXX char"); + } + expr = alloc_expression(old->pos, EXPR_IMPLIED_CAST); expr->ctype = type; expr->cast_type = type; diff --git a/symbol.h b/symbol.h index 5270fcd7..8e62aca2 100644 --- a/symbol.h +++ b/symbol.h @@ -455,6 +455,14 @@ static inline int is_byte_type(struct symbol *type) return type->bit_size == bits_in_char && type->type != SYM_BITFIELD; } +static inline int is_char_type(const struct symbol *type) +{ + if (type->type == SYM_NODE) { + type = type->ctype.base_type; + } + return type == &char_ctype; +} + static inline int is_wchar_type(struct symbol *type) { if (type->type == SYM_NODE) ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-21 5:59 ` Alexey Dobriyan @ 2022-10-21 17:11 ` Linus Torvalds 2022-10-21 17:23 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2022-10-21 17:11 UTC (permalink / raw) To: Alexey Dobriyan Cc: Jason A. Donenfeld, akpm, linux-kernel, mm-commits, masahiroy, keescook, gregkh, andriy.shevchenko [-- Attachment #1: Type: text/plain, Size: 339 bytes --] On Thu, Oct 20, 2022 at 10:59 PM Alexey Dobriyan <adobriyan@gmail.com> wrote: > > It is very entertaining, i've given up and started patching sparse > but it needs more because char constants are ints: I think you can fix that by simply warning about character constants with the high bit set. Something like this.. Linus [-- Attachment #2: patch.diff --] [-- Type: text/x-patch, Size: 472 bytes --] char.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/char.c b/char.c index 730ae3f5..3706e033 100644 --- a/char.c +++ b/char.c @@ -93,6 +93,10 @@ void get_char_constant(struct token *token, unsigned long long *val) if (p != end) warning(token->pos, "multi-character character constant"); + if (v & 0x80) { + if (type >= TOKEN_CHAR && type <= TOKEN_CHAR_EMBEDDED_3) + warning(token->pos, "character constant with sign bit set"); + } *val = v; } ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array 2022-10-21 17:11 ` Linus Torvalds @ 2022-10-21 17:23 ` Linus Torvalds 0 siblings, 0 replies; 29+ messages in thread From: Linus Torvalds @ 2022-10-21 17:23 UTC (permalink / raw) To: Alexey Dobriyan Cc: Jason A. Donenfeld, akpm, linux-kernel, mm-commits, masahiroy, keescook, gregkh, andriy.shevchenko On Fri, Oct 21, 2022 at 10:11 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I think you can fix that by simply warning about character constants > with the high bit set. > > Something like this.. This actually triggers for the kernel, and I think those (few) warnings are likely worth just fixing. Because things like put_tty_queue('\377', ldata); just isn't worth it. Or look at this horror: if (c == (unsigned char) '\377' && I_PARMRK(tty)) where somebody did realize that '\377' might be -1, so they added the "helpful" cast. Wouldn't that be much nicer and simpler as just if (c == 255 && I_PARMRK(tty)) instead? Or just 0377 if you really love octal in the context of characters (and really, nobody should). Or 0xff. At no point does "(unsigned char) '\377' " strike me as a really readable way to write things. There's two of those things. We also have memset(stack, '\xff', 64); which really isn't helpful either. Why not just use 0xff, or even just -1. We also have a lot of those in lib/hexdump.c, for no good reason, particularly as those arrays are 'unsigned char[]' to begin with, not 'char'. I really don't understand why people would use '\xAA' instead of just using 0xAA. We also have static char sample_rate_buffer[4] = { '\x80', '\xbb', '\x00', '\x00' }; in sound/usb/mixer_scarlett.c, and that should be a byte array, so the 'char' should probably be 'unsigned char' or 'u8' in the first place, and again it would be just simpler and clearer to use plain hex constants. So that sparse warning looks simple enough, and I think every single case was just the kernel being odd. Now, in *string* constants, you sometimes do want to use that format, ie u8 array[] = "abcd\377"; is a reasonable way to avoid having to use a more complex initializer. But that sparse patch of mine only complains about (non-wide) character constants unless I screwed something up. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2022-11-03 14:07 UTC | newest] Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20221020000356.177CDC433C1@smtp.kernel.org> 2022-10-20 9:43 ` + kbuild-treat-char-as-always-unsigned.patch added to mm-nonmm-unstable branch Alexey Dobriyan 2022-10-20 9:49 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Alexey Dobriyan 2022-10-20 9:56 ` [PATCH -mm] -funsigned-char, namei: delete cast in lookup_one_common() Alexey Dobriyan 2022-10-20 16:28 ` [PATCH -mm] -funsigned-char, x86: make struct p4_event_bind::cntr signed array Jason A. Donenfeld 2022-10-20 17:14 ` Linus Torvalds 2022-10-20 17:33 ` Jason A. Donenfeld 2022-10-20 17:42 ` Linus Torvalds 2022-10-20 18:57 ` Kees Cook 2022-10-20 19:39 ` Linus Torvalds 2022-10-20 20:17 ` Linus Torvalds 2022-10-20 21:34 ` Andy Shevchenko 2022-10-20 22:46 ` Jason A. Donenfeld 2022-10-21 6:48 ` Greg KH 2022-10-21 7:24 ` Jason A. Donenfeld 2022-10-21 7:36 ` Greg KH 2022-10-26 1:50 ` Jason A. Donenfeld 2022-10-26 12:58 ` Jason A. Donenfeld 2022-10-26 13:17 ` Andy Shevchenko 2022-11-02 17:17 ` [cocci] " Julia Lawall 2022-11-03 0:08 ` Jason A. Donenfeld 2022-11-03 6:31 ` Julia Lawall 2022-11-03 12:45 ` Julia Lawall 2022-11-03 12:47 ` Jason A. Donenfeld 2022-11-03 12:57 ` Julia Lawall 2022-11-03 14:07 ` Jason A. Donenfeld 2022-10-24 15:44 ` Jason A. Donenfeld 2022-10-21 5:59 ` Alexey Dobriyan 2022-10-21 17:11 ` Linus Torvalds 2022-10-21 17:23 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).