[RFC,v0,00/19] x86/insn: Add an insn_decode() API
diff mbox

Message ID 20201124101952.7909-1-bp@alien8.de
State New, archived
Headers show

Commit Message

Borislav Petkov Nov. 24, 2020, 10:19 a.m. UTC
From: Borislav Petkov <bp@suse.de>

Hi,

here's what I had in mind, finally split into proper patches. The final
goal is for all users of the decoder to simply call insn_decode() and
look at its retval. Simple.

As to amluto's question about detecting partial insns, see the diff
below.

Running that gives:

insn buffer:
0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 
supplied buf size: 15, ret 0
supplied buf size: 2, ret 0
supplied buf size: 3, ret 0
supplied buf size: 4, ret 0

and the return value is always success.

Which means that that buf_len that gets supplied to the decoder
functions doesn't really work and I need to look into it.

That is, provided this is how we want to control what the instruction
decoder decodes - by supplying the length of the buffer it should look
at.

We could also say that probably there should be a way to say "decode
only the first insn in the buffer and ignore the rest". That is all up
to the use cases so I'm looking for suggestions here.

In any case, at least the case where I give it

0x48 0xcf 0x48 0x83

and say that buf size is 4, should return an error because the second
insn is incomplete. So I need to go look at that now.

Thx.

---


Borislav Petkov (19):
  x86/insn: Rename insn_decode() to insn_decode_regs()
  x86/insn: Add @buf_len param to insn_init() kernel-doc comment
  x86/insn: Add an insn_decode() API
  x86/insn-eval: Handle return values from the decoder
  x86/boot/compressed/sev-es: Convert to insn_decode()
  perf/x86/intel/ds: Check insn_get_length() retval
  perf/x86/intel/ds: Check return values of insn decoder functions
  x86/alternative: Use insn_decode()
  x86/mce: Convert to insn_decode()
  x86/kprobes: Convert to insn_decode()
  x86/sev-es: Convert to insn_decode()
  x86/traps: Convert to insn_decode()
  x86/uprobes: Convert to insn_decode()
  x86/tools/insn_decoder_test: Convert to insn_decode()
  tools/objtool: Convert to insn_decode()
  x86/tools/insn_sanity: Convert to insn_decode()
  tools/perf: Convert to insn_decode()
  x86/insn: Remove kernel_insn_init()
  x86/insn: Make insn_complete() static

 arch/x86/boot/compressed/sev-es.c             |  11 +-
 arch/x86/events/intel/ds.c                    |   4 +-
 arch/x86/events/intel/lbr.c                   |  10 +-
 arch/x86/include/asm/insn-eval.h              |   4 +-
 arch/x86/include/asm/insn.h                   |  42 ++--
 arch/x86/kernel/alternative.c                 |   6 +-
 arch/x86/kernel/cpu/mce/severity.c            |  12 +-
 arch/x86/kernel/kprobes/core.c                |  17 +-
 arch/x86/kernel/kprobes/opt.c                 |   9 +-
 arch/x86/kernel/sev-es.c                      |  15 +-
 arch/x86/kernel/traps.c                       |   7 +-
 arch/x86/kernel/umip.c                        |   2 +-
 arch/x86/kernel/uprobes.c                     |   8 +-
 arch/x86/lib/insn-eval.c                      |  20 +-
 arch/x86/lib/insn.c                           | 190 ++++++++++++++----
 arch/x86/tools/insn_decoder_test.c            |  10 +-
 arch/x86/tools/insn_sanity.c                  |   8 +-
 tools/arch/x86/include/asm/insn.h             |  42 ++--
 tools/arch/x86/lib/insn.c                     | 190 ++++++++++++++----
 tools/include/linux/kconfig.h                 |  73 +++++++
 tools/objtool/arch/x86/decode.c               |   9 +-
 tools/perf/arch/x86/tests/insn-x86.c          |   9 +-
 tools/perf/arch/x86/util/archinsn.c           |   9 +-
 .../intel-pt-decoder/intel-pt-insn-decoder.c  |  17 +-
 24 files changed, 507 insertions(+), 217 deletions(-)
 create mode 100644 tools/include/linux/kconfig.h

Comments

Borislav Petkov Nov. 24, 2020, 5:46 p.m. UTC | #1
On Tue, Nov 24, 2020 at 11:19:33AM +0100, Borislav Petkov wrote:
> In any case, at least the case where I give it
> 
> 0x48 0xcf 0x48 0x83
> 
> and say that buf size is 4, should return an error because the second
> insn is incomplete. So I need to go look at that now.

Ok, got it:

./arch/x86/tools/insn_sanity: Success: decoded and checked 10000 random instructions with 0 errors (seed:0x826fdf9c)
insn buffer:
0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 
supplied buf size: 15, ret 0
supplied buf size: 2, ret 0
supplied buf size: 3, ret 0
supplied buf size: 4, ret 0
supplied buf size: 1, ret -22

the current decoder simply decodes the *first* insn in the buffer it
encounters and that's it.

When you give it a buffer of size smaller than the first instruction:

supplied buf size: 1, ret -22

while the first insn is 2 bytes long:

0x48 0xcf (IRETQ)

then it signals an error.

Andy, does that work for your use cases?
Masami Hiramatsu Nov. 25, 2020, 8:03 a.m. UTC | #2
On Tue, 24 Nov 2020 18:46:47 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Tue, Nov 24, 2020 at 11:19:33AM +0100, Borislav Petkov wrote:
> > In any case, at least the case where I give it
> > 
> > 0x48 0xcf 0x48 0x83
> > 
> > and say that buf size is 4, should return an error because the second
> > insn is incomplete. So I need to go look at that now.
> 
> Ok, got it:
> 
> ./arch/x86/tools/insn_sanity: Success: decoded and checked 10000 random instructions with 0 errors (seed:0x826fdf9c)
> insn buffer:
> 0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 
> supplied buf size: 15, ret 0
> supplied buf size: 2, ret 0
> supplied buf size: 3, ret 0
> supplied buf size: 4, ret 0
> supplied buf size: 1, ret -22
> 
> the current decoder simply decodes the *first* insn in the buffer it
> encounters and that's it.

Yes, currently the buf_size is only for checking the maximum length of
the buffer, because we expect the user doesn't know the actual length of
the instruction before calling insn_get_length().
But yes, for the insn_sanity.c, the return length should be compared.

Thank you,

> 
> When you give it a buffer of size smaller than the first instruction:
> 
> supplied buf size: 1, ret -22
> 
> while the first insn is 2 bytes long:
> 
> 0x48 0xcf (IRETQ)
> 
> then it signals an error.
> 
> Andy, does that work for your use cases?
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Andy Lutomirski Nov. 27, 2020, 5:45 p.m. UTC | #3
On Tue, Nov 24, 2020 at 9:46 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Nov 24, 2020 at 11:19:33AM +0100, Borislav Petkov wrote:
> > In any case, at least the case where I give it
> >
> > 0x48 0xcf 0x48 0x83
> >
> > and say that buf size is 4, should return an error because the second
> > insn is incomplete. So I need to go look at that now.
>
> Ok, got it:
>
> ./arch/x86/tools/insn_sanity: Success: decoded and checked 10000 random instructions with 0 errors (seed:0x826fdf9c)
> insn buffer:
> 0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
> supplied buf size: 15, ret 0
> supplied buf size: 2, ret 0
> supplied buf size: 3, ret 0
> supplied buf size: 4, ret 0
> supplied buf size: 1, ret -22
>
> the current decoder simply decodes the *first* insn in the buffer it
> encounters and that's it.
>
> When you give it a buffer of size smaller than the first instruction:
>
> supplied buf size: 1, ret -22
>
> while the first insn is 2 bytes long:
>
> 0x48 0xcf (IRETQ)
>
> then it signals an error.
>
> Andy, does that work for your use cases?

Is -22 (-EINVAL) the same error it returns if you pass in garbage?
How hard would it be to teach it to return a different error code when
the buffer is too small?

>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Nov. 27, 2020, 10:35 p.m. UTC | #4
On Fri, Nov 27, 2020 at 09:45:39AM -0800, Andy Lutomirski wrote:
> Is -22 (-EINVAL) the same error it returns if you pass in garbage?

Define garbage.

Yes, if you have a sequence of bytes which you can unambiguously
determine to be

 - an invalid instruction in some of the tables

 - REX prefix with the wrong bits set

 - a byte says that some insn part like ModRM or SIB is following but
 the buffer falls short

 - ... other error condition

then maybe you can say, yes, I'm looking at garbage and can error out
right then and there. But you need to have enough bytes of information
to determine that.

For example (those are random bytes):

00000000000011ff <.asm_start>:
    11ff:       95                      xchg   %eax,%ebp
    1200:       14 60                   adc    $0x60,%al
    1202:       77 74                   ja     1278 <__libc_csu_init+0x28>
    1204:       82                      (bad)  
    1205:       67 dc 55 35             fcoml  0x35(%ebp)

The 0x82 is usually in opcode group 1 but that opcode is invalid in
64-bit mode. So if this is a 64-bit executable, you know that that is an
invalid insn.

Another example:

              18:       a0                      .byte 0xa0
              19:       17                       (bad)
              1a:       27                       (bad)
              1b:       ea                       (bad)
              1c:       90                      nop
              1d:       90                      nop
              1e:       90                      nop
              1f:       90                      nop

0xa0 is the opcode for

	MOV AL, moffset8

where moffset8 is an address-sized memory offset, which in 64-bit mode
is 64-bit. But we have only 7 bytes after the 0xa0 thus we know that the
buffer is truncated.

If it had one byte more, it would be a valid insn:

               18:       a0 17 27 ea 90 90 90    movabs 0x9090909090ea2717,%al
               20:       90 90 


I'm sure you get the idea: if you have enough unambiguous bits which
tell you that this cannot be a valid insn, then you can return early
from the decoder and signal that fact.

I'm not sure that you can do that for all possible byte combinations
and also I'm not sure that it won't ever happen that it per chance
misinterprets garbage data as valid instructions.

> How hard would it be to teach it to return a different error code when
> the buffer is too small?

Yap, see above. Unambiguous cases are clear but I don't know it would
work in all cases. For example, let's say you give it a zeroed out
buffer of 8 bytes which doesn't contain anything - you've just zeroed it
out and haven't put any insns in there yet.

But those are perfectly valid insns:

               0:       00 00                   add %al,(%rax)
               2:       00 00                   add %al,(%rax)
               4:       00 00                   add %al,(%rax)
               6:       00 00                   add %al,(%rax)

So now you go about your merry way working with those although they're
not real instructions which some tool generated.

See what I mean?
Masami Hiramatsu Nov. 29, 2020, 8:50 a.m. UTC | #5
On Fri, 27 Nov 2020 09:45:39 -0800
Andy Lutomirski <luto@amacapital.net> wrote:

> On Tue, Nov 24, 2020 at 9:46 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Tue, Nov 24, 2020 at 11:19:33AM +0100, Borislav Petkov wrote:
> > > In any case, at least the case where I give it
> > >
> > > 0x48 0xcf 0x48 0x83
> > >
> > > and say that buf size is 4, should return an error because the second
> > > insn is incomplete. So I need to go look at that now.
> >
> > Ok, got it:
> >
> > ./arch/x86/tools/insn_sanity: Success: decoded and checked 10000 random instructions with 0 errors (seed:0x826fdf9c)
> > insn buffer:
> > 0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
> > supplied buf size: 15, ret 0
> > supplied buf size: 2, ret 0
> > supplied buf size: 3, ret 0
> > supplied buf size: 4, ret 0
> > supplied buf size: 1, ret -22
> >
> > the current decoder simply decodes the *first* insn in the buffer it
> > encounters and that's it.
> >
> > When you give it a buffer of size smaller than the first instruction:
> >
> > supplied buf size: 1, ret -22
> >
> > while the first insn is 2 bytes long:
> >
> > 0x48 0xcf (IRETQ)
> >
> > then it signals an error.
> >
> > Andy, does that work for your use cases?
> 
> Is -22 (-EINVAL) the same error it returns if you pass in garbage?
> How hard would it be to teach it to return a different error code when
> the buffer is too small?
> 

Good point. I think we can return, e.g. -EFAULT if we failed in get_next(). Then, we can read out next page, for example.

Thank you,
Borislav Petkov Nov. 30, 2020, 1:44 p.m. UTC | #6
On Sun, Nov 29, 2020 at 05:50:05PM +0900, Masami Hiramatsu wrote:
> Good point. I think we can return, e.g. -EFAULT if we failed in
> get_next(). Then, we can read out next page, for example.

Why -EFAULT?

Running this

        size = 1;
        ret = insn_decode(&insn, b, size, INSN_MODE_64)

i.e., buffer size is 1:

./arch/x86/tools/insn_sanity: Success: decoded and checked 10000 random instructions with 0 errors (seed:0x9994a137)
insn buffer:
0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 
supplied buf size: 15, ret 0
supplied buf size: 2, ret 0
supplied buf size: 3, ret 0
supplied buf size: 4, ret 0
insn_decode: entry
insn_decode: will get_length
insn_get_immediate: getting immediate
insn_get_displacement: getting displacement
insn_get_sib: getting sib
insn_get_modrm: entry
insn_get_opcode: entry
insn_get_prefixes: entry, prefixes->got: 0
insn_get_prefixes: 1
insn_get_prefixes: REX
insn_get_prefixes: VEX
insn_get_prefixes: validate_next: 0
insn_get_prefixes: insn->next_byte: 0x7ffec297c3e1, insn->end_kaddr: 0x7ffec297c3e1
insn_get_prefixes: errored out
supplied buf size: 1, ret -22

is caught in validate_next() where ->next_byte == ->end_kaddr.

I'm thinking we should return EOF here, to denote that we're reached the
end and then propagate that error up the callchain.

We don't have "define EOF" in the kernel but we can designate one for
the insn decoder, perhaps

#define EOF      -1

as stdio.h does:

/* The value returned by fgetc and similar functions to indicate the
   end of the file.  */
#define EOF (-1)

Hmm, but then the callers would need to know EOF too so maybe EIO or
something.

In any case, it should be a value which callers should be able to use to
get told that input buffer is truncated...

Thx.
Masami Hiramatsu Nov. 30, 2020, 5:21 p.m. UTC | #7
On Mon, 30 Nov 2020 14:44:42 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Sun, Nov 29, 2020 at 05:50:05PM +0900, Masami Hiramatsu wrote:
> > Good point. I think we can return, e.g. -EFAULT if we failed in
> > get_next(). Then, we can read out next page, for example.
> 
> Why -EFAULT?

Because it overruns the buffer. Maybe -E2BIG/ENODATA or any other
error (except for -EINVAL) is OK :)
 
> Running this
> 
>         size = 1;
>         ret = insn_decode(&insn, b, size, INSN_MODE_64)
> 
> i.e., buffer size is 1:
> 
> ./arch/x86/tools/insn_sanity: Success: decoded and checked 10000 random instructions with 0 errors (seed:0x9994a137)
> insn buffer:
> 0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 
> supplied buf size: 15, ret 0
> supplied buf size: 2, ret 0
> supplied buf size: 3, ret 0
> supplied buf size: 4, ret 0
> insn_decode: entry
> insn_decode: will get_length
> insn_get_immediate: getting immediate
> insn_get_displacement: getting displacement
> insn_get_sib: getting sib
> insn_get_modrm: entry
> insn_get_opcode: entry
> insn_get_prefixes: entry, prefixes->got: 0
> insn_get_prefixes: 1
> insn_get_prefixes: REX
> insn_get_prefixes: VEX
> insn_get_prefixes: validate_next: 0
> insn_get_prefixes: insn->next_byte: 0x7ffec297c3e1, insn->end_kaddr: 0x7ffec297c3e1
> insn_get_prefixes: errored out
> supplied buf size: 1, ret -22
> 
> is caught in validate_next() where ->next_byte == ->end_kaddr.
> 
> I'm thinking we should return EOF here, to denote that we're reached the
> end and then propagate that error up the callchain.

EOF means the end of file and it is not an error. Here, since the
decoder fails to decode the data, so it should return an error code.

> 
> We don't have "define EOF" in the kernel but we can designate one for
> the insn decoder, perhaps
> 
> #define EOF      -1
> 
> as stdio.h does:
> 
> /* The value returned by fgetc and similar functions to indicate the
>    end of the file.  */
> #define EOF (-1)

It is because libc fgetc() returns -1 in any error case...

> 
> Hmm, but then the callers would need to know EOF too so maybe EIO or
> something.
> 
> In any case, it should be a value which callers should be able to use to
> get told that input buffer is truncated...

Yes!

Thank you,

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Dec. 2, 2020, 6:04 p.m. UTC | #8
On Tue, Dec 01, 2020 at 02:21:45AM +0900, Masami Hiramatsu wrote:
> Because it overruns the buffer. Maybe -E2BIG/ENODATA or any other
> error (except for -EINVAL) is OK :)

ENODATA it is. :)

And propagating that error value is easy because the err_out: labels are
all coming from the validate_next() error path so we basically know that
the buffer has ended.

./insn_sanity: Success: decoded and checked 10000 random instructions with 0 errors (seed:0x7bdfa56e)
insn buffer:
0x48 0xcf 0x48 0x83 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90 
supplied buf size: 15, ret 0
supplied buf size: 2, ret 0
supplied buf size: 3, ret 0
supplied buf size: 4, ret 0
insn_decode: entry
insn_decode: will get_length
insn_get_immediate: getting immediate
insn_get_displacement: getting displacement
insn_get_sib: getting sib
insn_get_modrm: entry
insn_get_opcode: entry
insn_get_prefixes: entry, prefixes->got: 0
insn_get_prefixes: 1
insn_get_prefixes: REX
insn_get_prefixes: VEX
insn_get_prefixes: validate_next: 0
insn_get_prefixes: insn->next_byte: 0x7ffc211eb661, insn->end_kaddr: 0x7ffc211eb661
insn_get_prefixes: errored out
supplied buf size: 1, ret -61

Patch
diff mbox

diff --git a/arch/x86/tools/insn_sanity.c b/arch/x86/tools/insn_sanity.c
index 51309df285b4..41e1ae0cd833 100644
--- a/arch/x86/tools/insn_sanity.c
+++ b/arch/x86/tools/insn_sanity.c
@@ -220,6 +220,45 @@  static void parse_args(int argc, char **argv)
 	}
 }
 
+static void run_insn_limits_test(void)
+{
+	unsigned char b[MAX_INSN_SIZE];
+	struct insn insn;
+	int ret, i, size;
+
+	memset(b, INSN_NOP, MAX_INSN_SIZE);
+
+	/* IRETQ */
+	b[0] = 0x48;
+	b[1] = 0xcf;
+
+	/* partial add $0x8, %rsp */
+	b[2] = 0x48;
+	b[3] = 0x83;
+
+	printf("insn buffer:\n");
+
+	for (i = 0; i < MAX_INSN_SIZE; i++)
+		printf("0x%hhx ", b[i]);
+	printf("\n");
+
+	size = MAX_INSN_SIZE;
+	ret = insn_decode(&insn, b, size, INSN_MODE_64);
+	printf("supplied buf size: %d, ret %d\n", size, ret);
+
+	size = 2;
+	ret = insn_decode(&insn, b, size, INSN_MODE_64);
+	printf("supplied buf size: %d, ret %d\n", size, ret);
+
+	size = 3;
+	ret = insn_decode(&insn, b, size, INSN_MODE_64);
+	printf("supplied buf size: %d, ret %d\n", size, ret);
+
+	size = 4;
+	ret = insn_decode(&insn, b, size, INSN_MODE_64);
+	printf("supplied buf size: %d, ret %d\n", size, ret);
+}
+
 int main(int argc, char **argv)
 {
 	int insns = 0, ret;
@@ -265,5 +304,7 @@  int main(int argc, char **argv)
 		errors,
 		seed);
 
+	run_insn_limits_test();
+
 	return errors ? 1 : 0;
 }