checkpatch: Check for .byte-spelled insn opcodes documentation on x86
diff mbox series

Message ID 20201009161423.14583-1-bp@alien8.de
State New, archived
Headers show
Series
  • checkpatch: Check for .byte-spelled insn opcodes documentation on x86
Related show

Commit Message

Borislav Petkov Oct. 9, 2020, 4:14 p.m. UTC
From: Borislav Petkov <bp@suse.de>

Instruction opcode bytes spelled using the gas directive .byte should
carry a comment above them stating which binutils version has added
support for the instruction mnemonic so that they can be replaced with
the mnemonic when that binutils version is equal or less than the
minimum-supported version by the kernel.

Add a check for that.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 scripts/checkpatch.pl | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Joe Perches Oct. 9, 2020, 6:01 p.m. UTC | #1
On Fri, 2020-10-09 at 18:14 +0200, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> Instruction opcode bytes spelled using the gas directive .byte should
> carry a comment above them stating which binutils version has added
> support for the instruction mnemonic so that they can be replaced with
> the mnemonic when that binutils version is equal or less than the
> minimum-supported version by the kernel.
> 
> Add a check for that.

OK but several notes:

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -6858,6 +6858,18 @@ sub process {
>  			WARN("DUPLICATED_SYSCTL_CONST",
>  				"duplicated sysctl range checking value '$1', consider using the shared one in include/linux/sysctl.h\n" . $herecurr);
>  		}
> +
> +# document which binutils version supports the actual insn mnemonic so that the naked opcode bytes can be replaced.
> +# x86-only.
> +		if ($rawline =~ /(\.byte(?:0[xX][0-9a-fA-F]+0-9a-fx,]+)/ && $realfile =~ "^arch/x86/") {

Given the location, this only works on .c and .h files.
It does not work on .S files.  Should it?

No need for a capture group.

Please use @ not " as all the other $realfile comparisons
use that form when expecting a /

So it looks like the regex would be more complete as:

	if ($realfile =~ m@^arch/x86/@ &&
	    $rawline =~ /\.byte\s+(?:$Constant|(?:\\)?$Ident|"\s*$Ident)\b/) {

etc...

> +			my $comment = ctx_locate_comment(0, $linenr);

A patch can modify any number of files.

This should use ctx_locate_comment($file ? 0 : $first_line, $linenr)
as checkpatch tests work on patch contexts not the entire
file before this line.

> +			if (! $comment || ($comment !~ /binutils version [0-9.]+/(ms)) {

No need for the $!comment test

> +				WARN("MISSING_BINUTILS_VERSION",
> +				     "Please document which binutils version supports these .byte-spelled\n" .
> +				     "\tinsn opcodes by adding \"binutils version <num>\" in a comment" .
> +				     " above them.\n" . $herecurr);

checkpatch uses only a single line output only before $herecurr
Output line length doesn't matter.
Borislav Petkov Oct. 10, 2020, 10:54 a.m. UTC | #2
On Fri, Oct 09, 2020 at 11:01:18AM -0700, Joe Perches wrote:
> Given the location, this only works on .c and .h files.
> It does not work on .S files.  Should it?

Probably not because there will be too many false positives. .byte is
used not only to spell instruction opcodes in .S files. And the main
case we're addressing here is using those .byte spelled opcodes in asm
volatile constructs so...

> No need for a capture group.

Debugging leftover, gone now.

> Please use @ not " as all the other $realfile comparisons
> use that form when expecting a /

Done.

> So it looks like the regex would be more complete as:
> 
> 	if ($realfile =~ m@^arch/x86/@ &&
> 	    $rawline =~ /\.byte\s+(?:$Constant|(?:\\)?$Ident|"\s*$Ident)\b/) {

This is much less readable than what I have now (yes, realfile test
should come first):

	if ($realfile =~ m@^arch/x86/@ && $rawline =~ /\.byte[\s0-9a-fx,]+/) {

Also, this

	$rawline =~ /\.byte\s+(?:$Constant|(?:\\)?$Ident|"\s*$Ident)\b/) {

matches

	".byte 0x66"

This

	$rawline =~ /\.byte[\s0-9a-fx,]+)) {

matches

	".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"

which is what it needs to match.

> A patch can modify any number of files.
> 
> This should use ctx_locate_comment($file ? 0 : $first_line, $linenr)
> as checkpatch tests work on patch contexts not the entire
> file before this line.

Done.

> No need for the $!comment test

Done.

> checkpatch uses only a single line output only before $herecurr
> Output line length doesn't matter.

Well, this:

WARNING: Please document which binutils version supports these .byte-spelled
        insn opcodes by adding "binutils version <num>" in a comment above them.
#90: FILE: arch/x86/include/asm/special_insns.h:254:
+       asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"


is easier readable than this:

WARNING: Please document which binutils version supports these .byte-spelledinsn opcodes by adding "binutils version <num>" in a comment above them.
#90: FILE: arch/x86/include/asm/special_insns.h:254:
+       asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"

Thx.
Joe Perches Oct. 10, 2020, 3:27 p.m. UTC | #3
On Sat, 2020-10-10 at 12:54 +0200, Borislav Petkov wrote:
> On Fri, Oct 09, 2020 at 11:01:18AM -0700, Joe Perches wrote:
> > Given the location, this only works on .c and .h files.
> > It does not work on .S files.  Should it?
> 
> Probably not because there will be too many false positives. .byte is
> used not only to spell instruction opcodes in .S files. And the main
> case we're addressing here is using those .byte spelled opcodes in asm
> volatile constructs so...

[]

> > So it looks like the regex would be more complete as:
> > 
> > 	if ($realfile =~ m@^arch/x86/@ &&
> > 	    $rawline =~ /\.byte\s+(?:$Constant|(?:\\)?$Ident|"\s*$Ident)\b/) {
> 
> This is much less readable than what I have now (yes, realfile test
> should come first):

I care less about readability than completeness, correctness,
and minimal false positives.

> 	if ($realfile =~ m@^arch/x86/@ && $rawline =~ /\.byte[\s0-9a-fx,]+/) {
> 
> Also, this
> 
> 	$rawline =~ /\.byte\s+(?:$Constant|(?:\\)?$Ident|"\s*$Ident)\b/) {
> 
> matches
> 
> 	".byte 0x66"

And? So does the regex you propose.

> This
> 
> 	$rawline =~ /\.byte[\s0-9a-fx,]+)) {
> 
> matches
> 
> 	".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> 
> which is what it needs to match.

Is that the only case you are trying to match?
two or more "0x<hex>," ?

Then this could use:

/"\s*\.byte\s+(?:0x[0-9a-fA-F]{1,2}\s*,\s*){2,4}/

$ git grep -P '"\s*\.byte\s+(?:0x[0-9a-fA-F]{1,2}\s*,\s*){2,4}' arch/x86
arch/x86/include/asm/intel_pconfig.h:#define PCONFIG ".byte 0x0f, 0x01, 0xc5"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xc8;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xfa;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xc9;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xfb;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x66, 0x0f, 0xae, 0xf1\t\n"
arch/x86/include/asm/segment.h:                 ".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
arch/x86/include/asm/smap.h:#define __ASM_CLAC  ".byte 0x0f,0x01,0xca"
arch/x86/include/asm/smap.h:#define __ASM_STAC  ".byte 0x0f,0x01,0xcb"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0x0f,0x01,0xee\n\t"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0x0f,0x01,0xef\n\t"
arch/x86/include/asm/special_insns.h:           ".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0xf3, 0x0f, 0x38, 0xf8, 0x02, 0x66, 0x90"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");

vs:

$ git grep -P '\.byte[\s0-9a-fx,]+' -- 'arch/x86/*.[ch]' | cat
arch/x86/include/asm/alternative-asm.h:	.byte \orig_len
arch/x86/include/asm/alternative-asm.h:	.byte \alt_len
arch/x86/include/asm/alternative-asm.h:	.byte \pad_len
arch/x86/include/asm/alternative.h:	" .byte " alt_total_slen "\n"			/* source len      */ \
arch/x86/include/asm/alternative.h:	" .byte " alt_rlen(num) "\n"			/* replacement len */ \
arch/x86/include/asm/alternative.h:	" .byte " alt_pad_len "\n"			/* pad len */
arch/x86/include/asm/bug.h:#define ASM_UD0		".byte 0x0f, 0xff" /* + ModRM (for Intel) */
arch/x86/include/asm/bug.h:#define ASM_UD1		".byte 0x0f, 0xb9" /* + ModRM */
arch/x86/include/asm/bug.h:#define ASM_UD2		".byte 0x0f, 0x0b"
arch/x86/include/asm/cpufeature.h:		 " .byte 3b - 1b\n"		/* src len */
arch/x86/include/asm/cpufeature.h:		 " .byte 5f - 4f\n"		/* repl len */
arch/x86/include/asm/cpufeature.h:		 " .byte 3b - 2b\n"		/* pad len */
arch/x86/include/asm/cpufeature.h:		 " .byte 3b - 1b\n"		/* src len */
arch/x86/include/asm/cpufeature.h:		 " .byte 0\n"			/* repl len */
arch/x86/include/asm/cpufeature.h:		 " .byte 0\n"			/* pad len */
arch/x86/include/asm/fpu/internal.h:#define XSAVE		".byte " REX_PREFIX "0x0f,0xae,0x27"
arch/x86/include/asm/fpu/internal.h:#define XSAVEOPT	".byte " REX_PREFIX "0x0f,0xae,0x37"
arch/x86/include/asm/fpu/internal.h:#define XSAVES		".byte " REX_PREFIX "0x0f,0xc7,0x2f"
arch/x86/include/asm/fpu/internal.h:#define XRSTOR		".byte " REX_PREFIX "0x0f,0xae,0x2f"
arch/x86/include/asm/fpu/internal.h:#define XRSTORS		".byte " REX_PREFIX "0x0f,0xc7,0x1f"
arch/x86/include/asm/idtentry.h: * Note, that the 'pushq imm8' is emitted via '.byte 0x6a, vector' because
arch/x86/include/asm/idtentry.h: * .byte achieves the same thing and the only fixup needed in the C entry
arch/x86/include/asm/idtentry.h:	.byte	0x6a, vector
arch/x86/include/asm/idtentry.h:	.byte	0x6a, vector
arch/x86/include/asm/inst.h: * Generate .byte code for some instructions not supported by old
arch/x86/include/asm/inst.h:	.byte 0x40 | ((\opd1 & 8) >> 3) | ((\opd2 & 8) >> 1) | (\W << 3)
arch/x86/include/asm/inst.h:	.byte \mod | (\opd1 & 7) | ((\opd2 & 7) << 3)
arch/x86/include/asm/inst.h:	.byte 0xf3
arch/x86/include/asm/inst.h:	.byte 0x0f, 0xc7
arch/x86/include/asm/intel_pconfig.h:#define PCONFIG ".byte 0x0f, 0x01, 0xc5"
arch/x86/include/asm/jump_label.h:		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
arch/x86/include/asm/jump_label.h:		".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
arch/x86/include/asm/jump_label.h:	.byte		0xe9
arch/x86/include/asm/jump_label.h:	.byte		STATIC_KEY_INIT_NOP
arch/x86/include/asm/jump_label.h:	.byte		STATIC_KEY_INIT_NOP
arch/x86/include/asm/jump_label.h:	.byte		0xe9
arch/x86/include/asm/mwait.h:	asm volatile(".byte 0x0f, 0x01, 0xc8;"
arch/x86/include/asm/mwait.h:	asm volatile(".byte 0x0f, 0x01, 0xfa;"
arch/x86/include/asm/mwait.h:	asm volatile(".byte 0x0f, 0x01, 0xc9;"
arch/x86/include/asm/mwait.h:	asm volatile(".byte 0x0f, 0x01, 0xfb;"
arch/x86/include/asm/mwait.h:	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
arch/x86/include/asm/mwait.h:	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf1\t\n"
arch/x86/include/asm/nops.h:#define _ASM_MK_NOP(x) .byte x
arch/x86/include/asm/nops.h:#define _ASM_MK_NOP(x) ".byte " __stringify(x) "\n"
arch/x86/include/asm/paravirt.h:	 .byte ptype;				\
arch/x86/include/asm/paravirt.h:	 .byte 772b-771b;			\
arch/x86/include/asm/paravirt_types.h:	"  .byte " type "\n"				\
arch/x86/include/asm/paravirt_types.h:	"  .byte 772b-771b\n"				\
arch/x86/include/asm/segment.h:			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
arch/x86/include/asm/smap.h:#define __ASM_CLAC	".byte 0x0f,0x01,0xca"
arch/x86/include/asm/smap.h:#define __ASM_STAC	".byte 0x0f,0x01,0xcb"
arch/x86/include/asm/special_insns.h:	asm volatile(".byte 0x0f,0x01,0xee\n\t"
arch/x86/include/asm/special_insns.h:	asm volatile(".byte 0x0f,0x01,0xef\n\t"
arch/x86/include/asm/special_insns.h:	alternative_io(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
arch/x86/include/asm/special_insns.h:		       ".byte 0x66; clflush %P0",
arch/x86/include/asm/special_insns.h:		".byte " __stringify(NOP_DS_PREFIX) "; clflush (%[pax])",
arch/x86/include/asm/special_insns.h:		".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
arch/x86/include/asm/special_insns.h:		".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
arch/x86/include/asm/special_insns.h:	asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
arch/x86/include/asm/special_insns.h:	asm volatile(".byte 0xf3, 0x0f, 0x38, 0xf8, 0x02, 0x66, 0x90"
arch/x86/include/asm/special_insns.h:	asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");
arch/x86/include/asm/static_call.h:	__ARCH_DEFINE_STATIC_CALL_TRAMP(name, ".byte 0xe9; .long " #func " - (. + 4)")
arch/x86/include/asm/xen/interface.h:#define XEN_EMULATE_PREFIX __ASM_FORM(.byte __XEN_EMULATE_PREFIX ;)
arch/x86/kvm/vmx/ops.h:		     ".byte 0x3e\n\t" /* branch taken hint */
arch/x86/kvm/vmx/ops.h:			  ".byte 0x2e\n\t" /* branch not taken hint */	\
arch/x86/kvm/vmx/ops.h:			  ".byte 0x2e\n\t" /* branch not taken hint */	\
arch/x86/realmode/rm/realmode.h:#define LJMPW_RM(to)	.byte 0xea ; .word (to), real_mode_seg

and if lines like:

arch/x86/include/asm/inst.h: * Generate .byte code for some instructions not supported by old
arch/x86/include/asm/paravirt_types.h:	"  .byte " type "\n"				\

are false positives then perhaps change the regex
to add required spaces after the \.byte before the
match.
Joe Perches Oct. 10, 2020, 3:38 p.m. UTC | #4
On Sat, 2020-10-10 at 12:54 +0200, Borislav Petkov wrote:
> > checkpatch uses only a single line output only before $herecurr
> > Output line length doesn't matter.
[]
> WARNING: Please document which binutils version supports these .byte-spelled
>         insn opcodes by adding "binutils version <num>" in a comment above them.
> #90: FILE: arch/x86/include/asm/special_insns.h:254:
> +       asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> 
> 
> is easier readable than this:
> 
> WARNING: Please document which binutils version supports these .byte-spelledinsn opcodes by adding "binutils version <num>" in a comment above them.
> #90: FILE: arch/x86/include/asm/special_insns.h:254:
> +       asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"

Readability is a consideration but it still must be a single line.

using --terse requires single line error output

Perhaps:
			if ($comment !~ /\bbinutils version [0-9.]+/ms) {
				WARN("MISSING_BINUTILS_VERSION",
				     "Please add a comment for .byte-spelled insn opcodes with \"binutils version <minimum_required_version>\"\n" . $herecurr);
Borislav Petkov Oct. 10, 2020, 4:11 p.m. UTC | #5
On Sat, Oct 10, 2020 at 08:27:20AM -0700, Joe Perches wrote:
> Then this could use:
> 
> /"\s*\.byte\s+(?:0x[0-9a-fA-F]{1,2}\s*,\s*){2,4}/

Yes, this is getting close.

I've tweaked it a bit to:

'/\s*\.byte\s+(?:0x[0-9a-f]{1,2}[\s,]*){2,}/i'

which assumes at least 2 opcode bytes; upper limit can be more than 4.
It still has some false positives in crypto but I'd say that's good
enough. I'll play more with it later.

Thx.
Joe Perches Oct. 10, 2020, 4:47 p.m. UTC | #6
On Sat, 2020-10-10 at 18:11 +0200, Borislav Petkov wrote:
> On Sat, Oct 10, 2020 at 08:27:20AM -0700, Joe Perches wrote:
> > Then this could use:
> > 
> > /"\s*\.byte\s+(?:0x[0-9a-fA-F]{1,2}\s*,\s*){2,4}/
> 
> Yes, this is getting close.
> 
> I've tweaked it a bit to:
> 
> '/\s*\.byte\s+(?:0x[0-9a-f]{1,2}[\s,]*){2,}/i'
    ^^^                                       ^
now useless without the "                     matches .BYTE

you probably want (?i:0x[etc...]

I'd prefer to add an upper bound to the {m,n} use.
Unbounded multiple
matches {m,} can cause perl aborts.

This regex would also match

.byte 0x020x02

(which admittedly wouldn't compile, but I've seen really
 bad patches submitted too)

> which assumes at least 2 opcode bytes; upper limit can be more than 4.
> It still has some false positives in crypto but I'd say that's good
> enough. I'll play more with it later

A readability convenience would be to add and use:

our $Hex_byte	= qr{(?i)0x[0-9a-f]{1,2}\b};

So if the minimum length if the isns .byte block is 2,
with a separating comma then the regex could be:

/\.byte\s+$Hex_byte\s*,\s*$Hex_byte\b/

which I think is pretty readable.

$ git grep -P '\.byte\s+(?i:0x[0-9a-f]{1,2}\s*,\s*0x[0-9a-f]{1,2})\b' -- 'arch/x86/*.[ch]'
arch/x86/include/asm/bug.h:#define ASM_UD0              ".byte 0x0f, 0xff" /* + ModRM (for Intel) */
arch/x86/include/asm/bug.h:#define ASM_UD1              ".byte 0x0f, 0xb9" /* + ModRM */
arch/x86/include/asm/bug.h:#define ASM_UD2              ".byte 0x0f, 0x0b"
arch/x86/include/asm/inst.h:    .byte 0x0f, 0xc7
arch/x86/include/asm/intel_pconfig.h:#define PCONFIG ".byte 0x0f, 0x01, 0xc5"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xc8;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xfa;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xc9;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x0f, 0x01, 0xfb;"
arch/x86/include/asm/mwait.h:   asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
arch/x86/include/asm/mwait.h:   asm volatile(".byte 0x66, 0x0f, 0xae, 0xf1\t\n"
arch/x86/include/asm/segment.h:                 ".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
arch/x86/include/asm/smap.h:#define __ASM_CLAC  ".byte 0x0f,0x01,0xca"
arch/x86/include/asm/smap.h:#define __ASM_STAC  ".byte 0x0f,0x01,0xcb"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0x0f,0x01,0xee\n\t"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0x0f,0x01,0xef\n\t"
arch/x86/include/asm/special_insns.h:           ".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0xf3, 0x0f, 0x38, 0xf8, 0x02, 0x66, 0x90"
arch/x86/include/asm/special_insns.h:   asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");
Borislav Petkov Oct. 12, 2020, 2:21 p.m. UTC | #7
On Sat, Oct 10, 2020 at 09:47:59AM -0700, Joe Perches wrote:
> > '/\s*\.byte\s+(?:0x[0-9a-f]{1,2}[\s,]*){2,}/i'
>     ^^^                                       ^
> now useless without the "

There are \.byte specifications without " so not useless.

> matches .BYTE

so what. It would have failed before, when trying to compile it.

> you probably want (?i:0x[etc...]
> 
> I'd prefer to add an upper bound to the {m,n} use.
> Unbounded multiple
> matches {m,} can cause perl aborts.

Ok, we can make that 15. Max insn length on x86 is 15 bytes and that
is unrealistically high for this use case so we should be good. And the
range must be {1,15} because you can have single-byte instructions.

And that's fine if there are *some* false positives. And whatever we do,
it won't match everything. For example:

arch/x86/include/asm/fpu/internal.h:208:#define XSAVE           ".byte " REX_PREFIX "0x0f,0xae,0x27"
arch/x86/include/asm/fpu/internal.h:209:#define XSAVEOPT        ".byte " REX_PREFIX "0x0f,0xae,0x37"
arch/x86/include/asm/fpu/internal.h:210:#define XSAVES          ".byte " REX_PREFIX "0x0f,0xc7,0x2f"
arch/x86/include/asm/fpu/internal.h:211:#define XRSTOR          ".byte " REX_PREFIX "0x0f,0xae,0x2f"
arch/x86/include/asm/fpu/internal.h:212:#define XRSTORS         ".byte " REX_PREFIX "0x0f,0xc7,0x1f"

but that's fine. I prefer for the regex to remain readable and single
outliers like those are caught in manual review.

As another example, sometimes it would be a false positive for another
reason:

arch/x86/include/asm/idtentry.h:500: * Note, that the 'pushq imm8' is emitted via '.byte 0x6a, vector' because

that's why I've changed the text to say "Please consider..." implying
thatdocumenting binutils version might not always be necessary/needed.

All in all, it's fine if there are some false positives and it can make
reviewers have a second look.

> This regex would also match
>
> .byte 0x020x02
>
> (which admittedly wouldn't compile, but I've seen really bad patches
> submitted too)

That's fine - I love reviewing !compiled patches. They will never send
!compiled again.

> A readability convenience would be to add and use:
> 
> our $Hex_byte	= qr{(?i)0x[0-9a-f]{1,2}\b};
> 
> So if the minimum length if the isns .byte block is 2,
> with a separating comma then the regex could be:
> 
> /\.byte\s+$Hex_byte\s*,\s*$Hex_byte\b/
> 
> which I think is pretty readable.

Yap, makes sense. v3 coming up...

Thx.
Joe Perches Oct. 12, 2020, 3:02 p.m. UTC | #8
On Mon, 2020-10-12 at 16:21 +0200, Borislav Petkov wrote:
> On Sat, Oct 10, 2020 at 09:47:59AM -0700, Joe Perches wrote:
> > > '/\s*\.byte\s+(?:0x[0-9a-f]{1,2}[\s,]*){2,}/i'
> >     ^^^                                       ^
> > now useless without the "
>
> There are \.byte specifications without " so not useless.

Matching optional leading spaces is useless.

Patch
diff mbox series

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 504d2e431c60..cfa81f1640cd 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -6858,6 +6858,18 @@  sub process {
 			WARN("DUPLICATED_SYSCTL_CONST",
 				"duplicated sysctl range checking value '$1', consider using the shared one in include/linux/sysctl.h\n" . $herecurr);
 		}
+
+# document which binutils version supports the actual insn mnemonic so that the naked opcode bytes can be replaced.
+# x86-only.
+		if ($rawline =~ /(\.byte[\s0-9a-fx,]+)/ && $realfile =~ "^arch/x86/") {
+			my $comment = ctx_locate_comment(0, $linenr);
+			if (! $comment || ($comment !~ /binutils version [0-9.]+/ms)) {
+				WARN("MISSING_BINUTILS_VERSION",
+				     "Please document which binutils version supports these .byte-spelled\n" .
+				     "\tinsn opcodes by adding \"binutils version <num>\" in a comment" .
+				     " above them.\n" . $herecurr);
+			}
+		}
 	}
 
 	# If we have no input at all, then there is nothing to report on