All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Marchevsky <davemarchevsky@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Rik van Riel <riel@surriel.com>,
	Ilya Leoshkevich <iii@linux.ibm.com>, Yonghong Song <yhs@fb.com>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads
Date: Wed, 18 May 2022 04:20:21 -0400	[thread overview]
Message-ID: <9b7cefd5-4906-35d6-ad61-bf7d2ee06033@fb.com> (raw)
In-Reply-To: <CAEf4BzaM0SC3D66NC3djt1fsEQcJ-af0-EgPx5UV8YLDLu8ibg@mail.gmail.com>

On 5/16/22 7:26 PM, Andrii Nakryiko wrote:   
> On Thu, May 12, 2022 at 12:43 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>
>> Handle xmm0,...,xmm15 registers when parsing USDT arguments. Currently
>> only the first 64 bits of the fetched value are returned as I haven't
>> seen the rest of the register used in practice.
>>
>> This patch also handles floats in USDT arg spec by ignoring the fact
>> that they're floats and considering them scalar. Currently we can't do
>> float math in BPF programs anyways, so might as well support passing to
>> userspace and converting there.
>>
>> We can use existing ARG_REG sscanf + logic, adding XMM-specific logic
>> when calc_pt_regs_off fails. If the reg is xmm, arg_spec's reg_off is
>> repurposed to hold reg_no, which is passed to bpf_get_reg_val. Since the
>> helper does the digging around in fxregs_state it's not necessary to
>> calculate offset in bpf code for these regs.
>>
>> NOTE: Changes here cause verification failure for existing USDT tests.
>> Specifically, BPF_USDT prog 'usdt12' fails to verify due to too many
>> insns despite not having its insn count significantly changed.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  tools/lib/bpf/usdt.bpf.h | 36 ++++++++++++++++++++--------
>>  tools/lib/bpf/usdt.c     | 51 ++++++++++++++++++++++++++++++++++++----
>>  2 files changed, 73 insertions(+), 14 deletions(-)
>>
>> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
>> index 4181fddb3687..7b5ed4cbaa2f 100644
>> --- a/tools/lib/bpf/usdt.bpf.h
>> +++ b/tools/lib/bpf/usdt.bpf.h
>> @@ -43,6 +43,7 @@ enum __bpf_usdt_arg_type {
>>         BPF_USDT_ARG_CONST,
>>         BPF_USDT_ARG_REG,
>>         BPF_USDT_ARG_REG_DEREF,
>> +       BPF_USDT_ARG_XMM_REG,
>>  };
>>
>>  struct __bpf_usdt_arg_spec {
>> @@ -129,7 +130,9 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>  {
>>         struct __bpf_usdt_spec *spec;
>>         struct __bpf_usdt_arg_spec *arg_spec;
>> -       unsigned long val;
>> +       struct pt_regs *btf_regs;
>> +       struct task_struct *btf_task;
>> +       struct { __u64 a; __u64 unused; } val = {};
>>         int err, spec_id;
>>
>>         *res = 0;
>> @@ -151,7 +154,7 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>                 /* Arg is just a constant ("-4@$-9" in USDT arg spec).
>>                  * value is recorded in arg_spec->val_off directly.
>>                  */
>> -               val = arg_spec->val_off;
>> +               val.a = arg_spec->val_off;
>>                 break;
>>         case BPF_USDT_ARG_REG:
>>                 /* Arg is in a register (e.g, "8@%rax" in USDT arg spec),
>> @@ -159,7 +162,20 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>                  * struct pt_regs. To keep things simple user-space parts
>>                  * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
>>                  */
>> -               err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
>> +               err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
>> +               if (err)
>> +                       return err;
>> +               break;
>> +       case BPF_USDT_ARG_XMM_REG:
> 
> nit: a bit too XMM-specific name here, we probably want to keep it a bit

Agreed.

> 
>> +               /* Same as above, but arg is an xmm reg, so can't look
>> +                * in pt_regs, need to use special helper.
>> +                * reg_off is the regno ("xmm0" -> regno 0, etc)
>> +                */
>> +               btf_task = bpf_get_current_task_btf();
>> +               btf_regs = (struct pt_regs *)bpf_task_pt_regs(btf_task);
> 
> I'd like to avoid taking dependency on bpf_get_current_task_btf() for
> rare case of XMM register, which makes it impossible to do USDT on
> older kernels. It seems like supporting reading registers from current
> (and maybe current pt_regs context) should cover a lot of practical
> uses.
> 

Yep. We talked about this today. Will remove.

>> +               err = bpf_get_reg_val(&val, sizeof(val),
> 
> But regardless of the above, we'll need to use CO-RE to detect support
> for this new BPF helper (probably using bpf_core_enum_value_exists()?)
> to allow using USDTs on older kernels.
> 

Will add.

> 
>> +                                    ((u64)arg_spec->reg_off + BPF_GETREG_X86_XMM0) << 32,
>> +                                    btf_regs, btf_task);
>>                 if (err)
>>                         return err;
>>                 break;
>> @@ -171,14 +187,14 @@ int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
>>                  * from pt_regs, then do another user-space probe read to
>>                  * fetch argument value itself.
>>                  */
>> -               err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
>> +               err = bpf_probe_read_kernel(&val.a, sizeof(val.a), (void *)ctx + arg_spec->reg_off);
>>                 if (err)
>>                         return err;
>> -               err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
>> +               err = bpf_probe_read_user(&val.a, sizeof(val.a), (void *)val.a + arg_spec->val_off);
> 
> is the useful value in xmm register normally in lower 64-bits of it?
> is it possible to just request reading just the first 64 bits from
> bpf_get_reg_val() and avoid this ugly union?

For USDT usecase, I've only seen lower 64 bits used. Should be possible to just
grab those, will see if there's a clean way to integrate such an option.

> 
>>                 if (err)
>>                         return err;
>>  #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
>> -               val >>= arg_spec->arg_bitshift;
>> +               val.a >>= arg_spec->arg_bitshift;
>>  #endif
>>                 break;
>>         default:
> 
> [...]
> 
>> +static int calc_xmm_regno(const char *reg_name)
>> +{
>> +       static struct {
>> +               const char *name;
>> +               __u16 regno;
>> +       } xmm_reg_map[] = {
>> +               { "xmm0",  0 },
>> +               { "xmm1",  1 },
>> +               { "xmm2",  2 },
>> +               { "xmm3",  3 },
>> +               { "xmm4",  4 },
>> +               { "xmm5",  5 },
>> +               { "xmm6",  6 },
>> +               { "xmm7",  7 },
>> +#ifdef __x86_64__
>> +               { "xmm8",  8 },
>> +               { "xmm9",  9 },
>> +               { "xmm10",  10 },
>> +               { "xmm11",  11 },
>> +               { "xmm12",  12 },
>> +               { "xmm13",  13 },
>> +               { "xmm14",  14 },
>> +               { "xmm15",  15 },
> 
> no-x86 arches parse this generically with sscanf(), seems like we can
> do this simple approach here as well?
> 

Agreed.

> 
>> +#endif
>> +       };
>> +       int i;
>> +
>> +       for (i = 0; i < ARRAY_SIZE(xmm_reg_map); i++) {
>> +               if (strcmp(reg_name, xmm_reg_map[i].name) == 0)
>> +                       return xmm_reg_map[i].regno;
>> +       }
>> +
>>         return -ENOENT;
>>  }
>>
> 
> [...]

  reply	other threads:[~2022-05-18  8:23 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12  7:43 [RFC PATCH bpf-next 0/5] bpf: add get_reg_val helper Dave Marchevsky
2022-05-12  7:43 ` [RFC PATCH bpf-next 1/5] x86/fpu: Move context.h to include/asm Dave Marchevsky
2022-05-12 13:56   ` David Vernet
2022-05-14  0:44   ` Alexei Starovoitov
2022-05-12  7:43 ` [RFC PATCH bpf-next 2/5] bpf: add get_reg_val helper Dave Marchevsky
2022-05-12 15:29   ` David Vernet
2022-05-18  8:07     ` Dave Marchevsky
2022-05-14  0:41   ` Alexei Starovoitov
2022-05-18  7:35     ` Dave Marchevsky
2022-05-12  7:43 ` [RFC PATCH bpf-next 3/5] libbpf: usdt lib wiring of xmm reads Dave Marchevsky
2022-05-14  0:43   ` Alexei Starovoitov
2022-05-16 23:26   ` Andrii Nakryiko
2022-05-18  8:20     ` Dave Marchevsky [this message]
2022-05-12  7:43 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add test for USDT parse of xmm reg Dave Marchevsky
2022-05-16 23:31   ` Andrii Nakryiko
2022-05-17  1:17     ` Alexei Starovoitov
2022-05-18 23:56       ` Andrii Nakryiko
2022-05-12  7:43 ` [RFC PATCH bpf-next 5/5] selftests/bpf: get_reg_val test exercising fxsave fetch Dave Marchevsky
2022-05-12 17:47   ` Dave Marchevsky
2022-05-16 23:28   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b7cefd5-4906-35d6-ad61-bf7d2ee06033@fb.com \
    --to=davemarchevsky@fb.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=iii@linux.ibm.com \
    --cc=kernel-team@fb.com \
    --cc=riel@surriel.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.