From: Yonghong Song <yhs@fb.com>
To: "Daniel T. Lee" <danieltimlee@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: <bpf@vger.kernel.org>, <netdev@vger.kernel.org>
Subject: Re: [PATCH] samples: bpf: fix tracex2 due to empty sys_write count argument
Date: Fri, 10 Dec 2021 10:28:12 -0800
Message-ID: <e23dbf45-d75b-8411-bc62-d0e26cccab8a@fb.com>
In-Reply-To: <20211210111918.4904-1-danieltimlee@gmail.com>



On 12/10/21 3:19 AM, Daniel T. Lee wrote:
> Currently from syscall entry, argument can't be fetched correctly as a
> result of register cleanup.
> 
>      commit 6b8cf5cc9965 ("x86/entry/64/compat: Clear registers for compat syscalls, to reduce speculation attack surface")
> 
> For example in upper commit, registers are cleaned prior to syscall.
> To be more specific, sys_write syscall has count size as a third argument.
> But this can't be fetched from __x64_sys_enter/__s390x_sys_enter due to
> register cleanup. (e.g. [x86] xorl %r8d, %r8d / [s390x] xgr %r7, %r7)

Is this the real reason? Did you build a 32-bit user space application?
Note that the above commit only affects compat syscalls.

> 
> This commit fix this problem by modifying the trace event to ksys_write
> instead of sys_write syscall entry.
> 
>      # Wrong example of 'write()' syscall argument fetching
>      # ./tracex2
>      ...
>      pid 50909 cmd dd uid 0
>             syscall write() stats
>       byte_size       : count     distribution
>         1 -> 1        : 4968837  |************************************* |
> 
>      # Successful example of 'write()' syscall argument fetching
>      # (dd's write bytes at a time defaults to 512)
>      # ./tracex2
>      ...
>      pid 3095 cmd dd uid 0
>             syscall write() stats
>       byte_size       : count     distribution
>      ...
>       256 -> 511      : 0        |                                      |
>       512 -> 1023     : 4968844  |************************************* |
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
>   samples/bpf/tracex2_kern.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
> index 5bc696bac27d..96dff3bea227 100644
> --- a/samples/bpf/tracex2_kern.c
> +++ b/samples/bpf/tracex2_kern.c
> @@ -78,7 +78,7 @@ struct {
>   	__uint(max_entries, 1024);
>   } my_hist_map SEC(".maps");
>   
> -SEC("kprobe/" SYSCALL(sys_write))
> +SEC("kprobe/ksys_write")
>   int bpf_prog3(struct pt_regs *ctx)
>   {
>   	long write_size = PT_REGS_PARM3(ctx);

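I think the real cause of the failure is SYSCALL_WRAPPER. With
CONFIG_ARCH_HAS_SYSCALL_WRAPPER, __x64_sys_write() takes a single
struct pt_regs * argument, so PT_REGS_PARM3(ctx) does not hold count;
the syscall arguments have to be read from that inner pt_regs.
Please take a look at test_probe_write_user_kern.c.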
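
To illustrate the point, here is a small user-space model of the
argument fetch (not BPF code, and not the sample itself). The struct
and function names are made up for this sketch, and it assumes the
x86-64 convention where the wrapper receives the saved user registers
through its first argument register:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct pt_regs: only the
 * three registers that carry write()'s arguments on x86-64. */
struct model_regs {
	uint64_t dx;	/* 3rd argument: count */
	uint64_t si;	/* 2nd argument: buf */
	uint64_t di;	/* 1st argument: fd, or the wrapper's regs pointer */
};

/* Treats the kprobe context as if it held write()'s own arguments.
 * Under the syscall wrapper this returns whatever happens to be in dx. */
static uint64_t count_from_probe_ctx(const struct model_regs *ctx)
{
	return ctx->dx;
}

/* Follows the wrapper's single argument (held in di) to the saved user
 * registers and reads count from there. */
static uint64_t count_from_wrapped_regs(const struct model_regs *ctx)
{
	const struct model_regs *user_regs =
		(const struct model_regs *)(uintptr_t)ctx->di;

	assert(user_regs != 0);
	return user_regs->dx;
}
```

In the BPF program itself the second read has to go through a probe
read helper rather than a plain dereference, which is roughly what
test_probe_write_user_kern.c does.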

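The issue with ksys_write() is that it can easily be inlined into the
syscall entry, in which case a kprobe on ksys_write will not fire on the
write() path. For example, given the source code,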

ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
{
	...
}

SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
		size_t, count)
{
	return ksys_write(fd, buf, count);
}

In my 5.12 kernel, the disassembly looks like this:
ffffffff81053b00 <__x64_sys_write>:
ffffffff81053b00: 0f 1f 44 00 00        nopl    (%rax,%rax)
ffffffff81053b05: 41 57                 pushq   %r15
ffffffff81053b07: 41 56                 pushq   %r14
ffffffff81053b09: 41 55                 pushq   %r13
ffffffff81053b0b: 41 54                 pushq   %r12
ffffffff81053b0d: 53                    pushq   %rbx
ffffffff81053b0e: 48 83 ec 10           subq    $16, %rsp
ffffffff81053b12: 65 48 8b 04 25 28 00 00 00    movq    %gs:40, %rax
ffffffff81053b1b: 48 89 44 24 08        movq    %rax, 8(%rsp)
ffffffff81053b20: 8b 47 70              movl    112(%rdi), %eax
ffffffff81053b23: 4c 8b 7f 60           movq    96(%rdi), %r15
ffffffff81053b27: 4c 8b 67 68           movq    104(%rdi), %r12
ffffffff81053b2b: 89 c7                 movl    %eax, %edi
ffffffff81053b2d: e8 6e a3 00 00        callq   0xffffffff8105dea0 <__fdget_pos>
...

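Here ksys_write() has been inlined: __x64_sys_write loads fd, count and
buf directly from the pt_regs pointer in %rdi (offsets 112, 96 and 104)
and calls __fdget_pos itself.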
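
For reference, the offsets in the listing can be checked against the
register order of x86-64 struct pt_regs. The following is a user-space
sketch only: the struct name and helper functions are hypothetical, and
the field order is copied by hand under the assumption that every field
is 8 bytes wide:

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-copied field order of x86-64 struct pt_regs (assumed layout,
 * hypothetical struct name; every field is 8 bytes). */
struct x86_64_regs_layout {
	uint64_t r15, r14, r13, r12, bp, bx;
	uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
	uint64_t orig_ax, ip, cs, flags, sp, ss;
};

/* Offset of the register holding write()'s 1st argument (fd). */
static size_t fd_offset(void)
{
	return offsetof(struct x86_64_regs_layout, di);
}

/* Offset of the register holding write()'s 2nd argument (buf). */
static size_t buf_offset(void)
{
	return offsetof(struct x86_64_regs_layout, si);
}

/* Offset of the register holding write()'s 3rd argument (count). */
static size_t count_offset(void)
{
	return offsetof(struct x86_64_regs_layout, dx);
}
```

These match the three loads from 112(%rdi), 104(%rdi) and 96(%rdi) in
the disassembly, i.e. the wrapper is pulling fd, buf and count out of
the saved registers rather than receiving them as its own arguments.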
