netdev.vger.kernel.org archive mirror
* Re: [PATCH bpf-next v2] Update perf ring buffer to prevent corruption
       [not found] <VI1PR8303MB00802FE5D289E0D7BA95B7DDFBEE0@VI1PR8303MB0080.EURPRD83.prod.outlook.com>
@ 2020-11-06  4:19 ` Alexei Starovoitov
  2020-11-09 11:29   ` Peter Zijlstra
  2020-11-09 18:22   ` Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2020-11-06  4:19 UTC (permalink / raw)
  To: Kevin Sheldrake, Peter Zijlstra, Ingo Molnar, Daniel Borkmann,
	Network Development
  Cc: bpf, Andrii Nakryiko, KP Singh

On Thu, Nov 5, 2020 at 7:18 AM Kevin Sheldrake
<Kevin.Sheldrake@microsoft.com> wrote:
>
> Resent due to some failure at my end.  Apologies if it arrives twice.
>
> From 63e34d4106b4dd767f9bfce951f8a35f14b52072 Mon Sep 17 00:00:00 2001
> From: Kevin Sheldrake <kevin.sheldrake@microsoft.com>
> Date: Thu, 5 Nov 2020 12:18:53 +0000
> Subject: [PATCH] Update perf ring buffer to prevent corruption from
>  bpf_perf_output_event()
>
> The bpf_perf_event_output() helper takes a u64 sample size parameter, but
> the underlying perf ring buffer stores each record's size in a u16. This
> 64KB maximum also has to accommodate a variable-sized header. Failure to
> observe this restriction can corrupt the perf ring buffer as samples
> overlap.
>
> Track the sample size in a wider variable and return -E2BIG if it is too
> big to fit into the u16 size field.
>
> Signed-off-by: Kevin Sheldrake <kevin.sheldrake@microsoft.com>

The fix makes sense to me.
Peter, Ingo,
should I take it through the bpf tree, or do you want to route it via tip?

> ---
>  include/linux/perf_event.h |  2 +-
>  kernel/events/core.c       | 40 ++++++++++++++++++++++++++--------------
>  2 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0c19d27..b9802e5 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1060,7 +1060,7 @@ extern void perf_output_sample(struct perf_output_handle *handle,
>                                struct perf_event_header *header,
>                                struct perf_sample_data *data,
>                                struct perf_event *event);
> -extern void perf_prepare_sample(struct perf_event_header *header,
> +extern int perf_prepare_sample(struct perf_event_header *header,
>                                 struct perf_sample_data *data,
>                                 struct perf_event *event,
>                                 struct pt_regs *regs);
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index da467e1..c6c4a3c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7016,15 +7016,17 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
>         return callchain ?: &__empty_callchain;
>  }
>
> -void perf_prepare_sample(struct perf_event_header *header,
> +int perf_prepare_sample(struct perf_event_header *header,
>                          struct perf_sample_data *data,
>                          struct perf_event *event,
>                          struct pt_regs *regs)
>  {
>         u64 sample_type = event->attr.sample_type;
> +       u32 header_size = header->size;
> +
>
>         header->type = PERF_RECORD_SAMPLE;
> -       header->size = sizeof(*header) + event->header_size;
> +       header_size = sizeof(*header) + event->header_size;
>
>         header->misc = 0;
>         header->misc |= perf_misc_flags(regs);
> @@ -7042,7 +7044,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>
>                 size += data->callchain->nr;
>
> -               header->size += size * sizeof(u64);
> +               header_size += size * sizeof(u64);
>         }
>
>         if (sample_type & PERF_SAMPLE_RAW) {
> @@ -7067,7 +7069,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>                         size = sizeof(u64);
>                 }
>
> -               header->size += size;
> +               header_size += size;
>         }
>
>         if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
> @@ -7079,7 +7081,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>                         size += data->br_stack->nr
>                               * sizeof(struct perf_branch_entry);
>                 }
> -               header->size += size;
> +               header_size += size;
>         }
>
>         if (sample_type & (PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER))
> @@ -7095,7 +7097,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>                         size += hweight64(mask) * sizeof(u64);
>                 }
>
> -               header->size += size;
> +               header_size += size;
>         }
>
>         if (sample_type & PERF_SAMPLE_STACK_USER) {
> @@ -7108,7 +7110,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>                 u16 stack_size = event->attr.sample_stack_user;
>                 u16 size = sizeof(u64);
>
> -               stack_size = perf_sample_ustack_size(stack_size, header->size,
> +               stack_size = perf_sample_ustack_size(stack_size, header_size,
>                                                      data->regs_user.regs);
>
>                 /*
> @@ -7120,7 +7122,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>                         size += sizeof(u64) + stack_size;
>
>                 data->stack_user_size = stack_size;
> -               header->size += size;
> +               header_size += size;
>         }
>
>         if (sample_type & PERF_SAMPLE_REGS_INTR) {
> @@ -7135,7 +7137,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>                         size += hweight64(mask) * sizeof(u64);
>                 }
>
> -               header->size += size;
> +               header_size += size;
>         }
>
>         if (sample_type & PERF_SAMPLE_PHYS_ADDR)
> @@ -7154,7 +7156,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>         if (sample_type & PERF_SAMPLE_AUX) {
>                 u64 size;
>
> -               header->size += sizeof(u64); /* size */
> +               header_size += sizeof(u64); /* size */
>
>                 /*
>                  * Given the 16bit nature of header::size, an AUX sample can
> @@ -7162,14 +7164,20 @@ void perf_prepare_sample(struct perf_event_header *header,
>                  * Make sure this doesn't happen by using up to U16_MAX bytes
>                  * per sample in total (rounded down to 8 byte boundary).
>                  */
> -               size = min_t(size_t, U16_MAX - header->size,
> +               size = min_t(size_t, U16_MAX - header_size,
>                              event->attr.aux_sample_size);
>                 size = rounddown(size, 8);
>                 size = perf_prepare_sample_aux(event, data, size);
>
> -               WARN_ON_ONCE(size + header->size > U16_MAX);
> -               header->size += size;
> +               WARN_ON_ONCE(size + header_size > U16_MAX);
> +               header_size += size;
>         }
> +
> +       if (header_size > U16_MAX)
> +               return -E2BIG;
> +
> +       header->size = header_size;
> +
>         /*
>          * If you're adding more sample types here, you likely need to do
>          * something about the overflowing header::size, like repurpose the
> @@ -7179,6 +7187,8 @@ void perf_prepare_sample(struct perf_event_header *header,
>          * do here next.
>          */
>         WARN_ON_ONCE(header->size & 7);
> +
> +       return 0;
>  }
>
>  static __always_inline int
> @@ -7196,7 +7206,9 @@ __perf_event_output(struct perf_event *event,
>         /* protect the callchain buffers */
>         rcu_read_lock();
>
> -       perf_prepare_sample(&header, data, event, regs);
> +       err = perf_prepare_sample(&header, data, event, regs);
> +       if (err)
> +               goto exit;
>
>         err = output_begin(&handle, event, header.size);
>         if (err)
> --
> 2.7.4
>


* Re: [PATCH bpf-next v2] Update perf ring buffer to prevent corruption
  2020-11-06  4:19 ` [PATCH bpf-next v2] Update perf ring buffer to prevent corruption Alexei Starovoitov
@ 2020-11-09 11:29   ` Peter Zijlstra
  2020-11-09 14:22     ` [EXTERNAL] " Kevin Sheldrake
  2020-11-09 18:22   ` Peter Zijlstra
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2020-11-09 11:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kevin Sheldrake, Ingo Molnar, Daniel Borkmann,
	Network Development, bpf, Andrii Nakryiko, KP Singh

On Thu, Nov 05, 2020 at 08:19:47PM -0800, Alexei Starovoitov wrote:
> On Thu, Nov 5, 2020 at 7:18 AM Kevin Sheldrake
> <Kevin.Sheldrake@microsoft.com> wrote:
> >
> > Resent due to some failure at my end.  Apologies if it arrives twice.
> >
> > From 63e34d4106b4dd767f9bfce951f8a35f14b52072 Mon Sep 17 00:00:00 2001
> > From: Kevin Sheldrake <kevin.sheldrake@microsoft.com>
> > Date: Thu, 5 Nov 2020 12:18:53 +0000
> > Subject: [PATCH] Update perf ring buffer to prevent corruption from
> >  bpf_perf_output_event()
> >
> > The bpf_perf_event_output() helper takes a u64 sample size parameter, but
> > the underlying perf ring buffer stores each record's size in a u16. This
> > 64KB maximum also has to accommodate a variable-sized header. Failure to
> > observe this restriction can corrupt the perf ring buffer as samples
> > overlap.
> >
> > Track the sample size in a wider variable and return -E2BIG if it is too
> > big to fit into the u16 size field.
> >
> > Signed-off-by: Kevin Sheldrake <kevin.sheldrake@microsoft.com>
> 
> The fix makes sense to me.
> Peter, Ingo,
> should I take it through the bpf tree, or do you want to route it via tip?

What are you doing to trigger this? The Changelog is devoid of much
useful information?


* RE: [EXTERNAL] Re: [PATCH bpf-next v2] Update perf ring buffer to prevent corruption
  2020-11-09 11:29   ` Peter Zijlstra
@ 2020-11-09 14:22     ` Kevin Sheldrake
  2020-11-09 17:55       ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Kevin Sheldrake @ 2020-11-09 14:22 UTC (permalink / raw)
  To: Peter Zijlstra, Alexei Starovoitov
  Cc: Ingo Molnar, Daniel Borkmann, Network Development, bpf,
	Andrii Nakryiko, KP Singh



> -----Original Message-----
> From: Peter Zijlstra <peterz@infradead.org>
> Sent: 09 November 2020 11:29
> To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: Kevin Sheldrake <Kevin.Sheldrake@microsoft.com>; Ingo Molnar
> <mingo@kernel.org>; Daniel Borkmann <daniel@iogearbox.net>; Network
> Development <netdev@vger.kernel.org>; bpf@vger.kernel.org; Andrii
> Nakryiko <andrii.nakryiko@gmail.com>; KP Singh <kpsingh@google.com>
> Subject: [EXTERNAL] Re: [PATCH bpf-next v2] Update perf ring buffer to
> prevent corruption
> 
> What are you doing to trigger this? The Changelog is devoid of much
> useful information?

Hello

I triggered the corruption by sending samples larger than 64KB-24 bytes
to a perf ring buffer from eBPF using bpf_perf_event_output().  The u16
that holds the size in struct perf_event_header overflows, and the
distance between adjacent samples in the perf ring buffer is set by this
overflowed value.  Hence, if 64KB samples are sent, adjacent samples are
placed only 24 bytes apart in the ring buffer, with later ones
overwriting parts of earlier ones.  If samples aren't read as quickly as
they are received, they are corrupted by the time they are read.
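To make the wraparound concrete, here is a small C model of the arithmetic described above. It assumes a fixed 24 bytes of per-record overhead, the figure reported for this particular setup (the real overhead depends on the event's sample_type), and the helpers stored_record_size() and next_record_offset() are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed per-record overhead (header plus other sampled fields), as
 * reported for this setup; not a kernel constant.
 */
#define ASSUMED_OVERHEAD 24u

/* Value that lands in the u16 perf_event_header::size field (hypothetical helper). */
static uint16_t stored_record_size(uint32_t payload)
{
	uint32_t total = payload + ASSUMED_OVERHEAD; /* true record length */

	return (uint16_t)total; /* conversion keeps only the low 16 bits */
}

/* Where the next record starts if the buffer advances by the stored size. */
static uint64_t next_record_offset(uint64_t head, uint32_t payload)
{
	return head + stored_record_size(payload);
}
```

With a 64KB payload, the true length is 65560 bytes, but only 24 survives the conversion, so next_record_offset(0, 65536) is 24: the next record starts inside the previous one.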

Attempts to fix this in the eBPF verifier failed because the actual sample
is constructed from a variable-sized header in addition to the raw data
supplied from eBPF.  The sample is assembled in perf_prepare_sample(),
outside of the eBPF engine.

My proposed fix is to check that the constructed size does not exceed
U16_MAX before committing it to struct perf_event_header::size.
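A minimal sketch of that check in isolation, assuming the total has already been accumulated in a type wider than u16. struct record_header only mirrors the field layout of struct perf_event_header (u32 type, u16 misc, u16 size); it and commit_record_size() are hypothetical names, not the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define U16_MAX 0xFFFFu /* same value as the kernel's U16_MAX */

/* Hypothetical stand-in with the same layout as struct perf_event_header. */
struct record_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/*
 * Store a size accumulated in a wider type. On overflow, return -E2BIG and
 * leave hdr->size untouched, so no truncated value is ever committed.
 */
static int commit_record_size(struct record_header *hdr, uint64_t total)
{
	if (total > U16_MAX)
		return -E2BIG;

	hdr->size = (uint16_t)total;
	return 0;
}
```

Returning before the store means an oversized total never reaches the header, so the ring buffer is never asked to advance by a truncated amount.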

A reproduction of the bug can be found at:
https://github.com/microsoft/OMS-Auditd-Plugin/tree/MSTIC-Research/ebpf_perf_output_poc

Thanks

Kevin Sheldrake



* Re: [EXTERNAL] Re: [PATCH bpf-next v2] Update perf ring buffer to prevent corruption
  2020-11-09 14:22     ` [EXTERNAL] " Kevin Sheldrake
@ 2020-11-09 17:55       ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2020-11-09 17:55 UTC (permalink / raw)
  To: Kevin Sheldrake
  Cc: Alexei Starovoitov, Ingo Molnar, Daniel Borkmann,
	Network Development, bpf, Andrii Nakryiko, KP Singh

On Mon, Nov 09, 2020 at 02:22:28PM +0000, Kevin Sheldrake wrote:

> I triggered the corruption by sending samples larger than 64KB-24 bytes
> to a perf ring buffer from eBPF using bpf_perf_event_output().  The u16
> that holds the size in struct perf_event_header overflows, and the
> distance between adjacent samples in the perf ring buffer is set by this
> overflowed value.  Hence, if 64KB samples are sent, adjacent samples are
> placed only 24 bytes apart in the ring buffer, with later ones
> overwriting parts of earlier ones.  If samples aren't read as quickly as
> they are received, they are corrupted by the time they are read.
> 
> Attempts to fix this in the eBPF verifier failed because the actual sample
> is constructed from a variable-sized header in addition to the raw data
> supplied from eBPF.  The sample is assembled in perf_prepare_sample(),
> outside of the eBPF engine.
> 
> My proposed fix is to check that the constructed size does not exceed
> U16_MAX before committing it to struct perf_event_header::size.
> 
> A reproduction of the bug can be found at:
> https://github.com/microsoft/OMS-Auditd-Plugin/tree/MSTIC-Research/ebpf_perf_output_poc

OK, so I can't actually operate any of this fancy BPF nonsense. But if
I'm not mistaken this calls into:
kernel/trace/bpf_trace.c:BPF_CALL_5(bpf_perf_event_output) with a giant
@data.

Let me try and figure out what that code does.
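For reference, a rough model of how a raw payload handed to bpf_perf_event_output() is accounted for in the record size: the PERF_SAMPLE_RAW branch of perf_prepare_sample() (partly visible in the patch context above) adds a u32 length field to the payload and pads the sum up to an 8-byte boundary. This sketch is based on a reading of that code in kernels of this era and should be checked against the actual source; round_up_pow2() and raw_sample_bytes() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes a PERF_SAMPLE_RAW payload adds to a sample record: a u32 length
 * field plus the payload itself, padded to an 8-byte boundary.
 */
static uint64_t raw_sample_bytes(uint32_t payload)
{
	return round_up_pow2((uint64_t)payload + sizeof(uint32_t),
			     sizeof(uint64_t));
}
```

Under this model a 64KB payload contributes 65544 bytes on its own, before the 8-byte struct perf_event_header and any other sampled fields are added, which is already past U16_MAX.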


* Re: [PATCH bpf-next v2] Update perf ring buffer to prevent corruption
  2020-11-06  4:19 ` [PATCH bpf-next v2] Update perf ring buffer to prevent corruption Alexei Starovoitov
  2020-11-09 11:29   ` Peter Zijlstra
@ 2020-11-09 18:22   ` Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2020-11-09 18:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kevin Sheldrake, Ingo Molnar, Daniel Borkmann,
	Network Development, bpf, Andrii Nakryiko, KP Singh

On Thu, Nov 05, 2020 at 08:19:47PM -0800, Alexei Starovoitov wrote:

> > Subject: [PATCH] Update perf ring buffer to prevent corruption from
> >  bpf_perf_output_event()

$Subject is broken: it lacks a subsystem prefix.

> >
> > The bpf_perf_event_output() helper takes a u64 sample size parameter, but
> > the underlying perf ring buffer stores each record's size in a u16. This
> > 64KB maximum also has to accommodate a variable-sized header. Failure to
> > observe this restriction can corrupt the perf ring buffer as samples
> > overlap.
> >
> > Track the sample size in a wider variable and return -E2BIG if it is too
> > big to fit into the u16 size field.
> >
> > Signed-off-by: Kevin Sheldrake <kevin.sheldrake@microsoft.com>

> > ---
> >  include/linux/perf_event.h |  2 +-
> >  kernel/events/core.c       | 40 ++++++++++++++++++++++++++--------------
> >  2 files changed, 27 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 0c19d27..b9802e5 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -1060,7 +1060,7 @@ extern void perf_output_sample(struct perf_output_handle *handle,
> >                                struct perf_event_header *header,
> >                                struct perf_sample_data *data,
> >                                struct perf_event *event);
> > -extern void perf_prepare_sample(struct perf_event_header *header,
> > +extern int perf_prepare_sample(struct perf_event_header *header,
> >                                 struct perf_sample_data *data,
> >                                 struct perf_event *event,
> >                                 struct pt_regs *regs);
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index da467e1..c6c4a3c 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -7016,15 +7016,17 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
> >         return callchain ?: &__empty_callchain;
> >  }
> >
> > -void perf_prepare_sample(struct perf_event_header *header,
> > +int perf_prepare_sample(struct perf_event_header *header,
> >                          struct perf_sample_data *data,
> >                          struct perf_event *event,
> >                          struct pt_regs *regs)

please re-align things.

> >  {
> >         u64 sample_type = event->attr.sample_type;
> > +       u32 header_size = header->size;
> > +
> >
> >         header->type = PERF_RECORD_SAMPLE;
> > -       header->size = sizeof(*header) + event->header_size;
> > +       header_size = sizeof(*header) + event->header_size;
> >
> >         header->misc = 0;
> >         header->misc |= perf_misc_flags(regs);
> > @@ -7042,7 +7044,7 @@ void perf_prepare_sample(struct perf_event_header *header,
> >
> >                 size += data->callchain->nr;
> >
> > -               header->size += size * sizeof(u64);
> > +               header_size += size * sizeof(u64);
> >         }
> >
> >         if (sample_type & PERF_SAMPLE_RAW) {
> > @@ -7067,7 +7069,7 @@ void perf_prepare_sample(struct perf_event_header *header,
> >                         size = sizeof(u64);
> >                 }
> >
> > -               header->size += size;
> > +               header_size += size;
> >         }

AFAICT perf_raw_frag::size is a u32, so the above addition can already
overflow the u32 header_size completely. It is probably best to make
header_size a u64 and delay the range check until the final tally below.
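The overflow described above can be shown with ordinary unsigned arithmetic. In this hypothetical sketch, a 32-bit accumulator wraps around and lets an enormous total slip past the U16_MAX check, while a 64-bit accumulator keeps the true sum of two u32 terms. The function names are illustrative only and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define U16_MAX 0xFFFFu /* same value as the kernel's U16_MAX */

/* 32-bit tally: the sum wraps modulo 2^32 before the range check sees it. */
static int fits_with_u32_tally(uint32_t fixed_bytes, uint32_t raw_bytes)
{
	uint32_t total = fixed_bytes + raw_bytes;

	return total <= U16_MAX;
}

/* 64-bit tally: the sum of two u32 values always fits, so the check is sound. */
static int fits_with_u64_tally(uint32_t fixed_bytes, uint32_t raw_bytes)
{
	uint64_t total = (uint64_t)fixed_bytes + raw_bytes;

	return total <= U16_MAX;
}
```

For example, 32 fixed bytes plus a raw size of UINT32_MAX - 15 wraps to a 32-bit total of 16, which passes the check. Whether a raw size that large is actually reachable from a BPF program depends on limits enforced elsewhere; the point is only that the final check is meaningless if the tally can wrap first.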

> >
> >         if (sample_type & PERF_SAMPLE_BRANCH_STACK) {

> > @@ -7162,14 +7164,20 @@ void perf_prepare_sample(struct perf_event_header *header,
> >                  * Make sure this doesn't happen by using up to U16_MAX bytes
> >                  * per sample in total (rounded down to 8 byte boundary).
> >                  */
> > -               size = min_t(size_t, U16_MAX - header->size,
> > +               size = min_t(size_t, U16_MAX - header_size,
> >                              event->attr.aux_sample_size);
> >                 size = rounddown(size, 8);
> >                 size = perf_prepare_sample_aux(event, data, size);
> >
> > -               WARN_ON_ONCE(size + header->size > U16_MAX);
> > -               header->size += size;
> > +               WARN_ON_ONCE(size + header_size > U16_MAX);
> > +               header_size += size;
> >         }
> > +
> > +       if (header_size > U16_MAX)
> > +               return -E2BIG;
> > +
> > +       header->size = header_size;
> > +
> >         /*
> >          * If you're adding more sample types here, you likely need to do
> >          * something about the overflowing header::size, like repurpose the
> > @@ -7179,6 +7187,8 @@ void perf_prepare_sample(struct perf_event_header *header,
> >          * do here next.
> >          */
> >         WARN_ON_ONCE(header->size & 7);
> > +
> > +       return 0;
> >  }
> >
> >  static __always_inline int
> > @@ -7196,7 +7206,9 @@ __perf_event_output(struct perf_event *event,
> >         /* protect the callchain buffers */
> >         rcu_read_lock();
> >
> > -       perf_prepare_sample(&header, data, event, regs);
> > +       err = perf_prepare_sample(&header, data, event, regs);
> > +       if (err)
> > +               goto exit;

I think this is wrong. When output_begin() below returns an error,
either there is no buffer (in which case we can't do much of anything)
or it will have incremented rb->lost.

This, OTOH, will completely fail to report the loss. The right response
in this error case is to immediately try to emit a PERF_RECORD_LOST
event, but then please also consider these patches:

  https://lkml.kernel.org/r/20201030151345.540479897@infradead.org

(which I'll be pushing into tip/perf/urgent soonish)
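A toy model of the accounting being asked for: an oversized record is dropped, but the drop is counted so it can be reported later (in the real code, through a PERF_RECORD_LOST record) instead of the sample silently vanishing. struct toy_rb and toy_output() are hypothetical and leave out everything else output_begin() does, including locking, wakeups, and the changes in the patches linked above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define U16_MAX 0xFFFFu /* same value as the kernel's U16_MAX */

/* Hypothetical, heavily simplified ring-buffer bookkeeping. */
struct toy_rb {
	uint64_t head; /* bytes reserved so far */
	uint64_t lost; /* records dropped and not yet reported */
};

/*
 * Reserve space for one record. An oversized record is refused, but the
 * loss is counted so a later lost-record notification can tell the reader.
 */
static int toy_output(struct toy_rb *rb, uint64_t record_size)
{
	if (record_size > U16_MAX) {
		rb->lost++;
		return -E2BIG;
	}

	rb->head += record_size;
	return 0;
}
```

The design point is that the error path and the accounting path are the same: a caller that bails out early without touching the lost counter gives the reader no way to notice the gap.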

> >
> >         err = output_begin(&handle, event, header.size);
> >         if (err)
> > --
> > 2.7.4
> >


Thread overview: 5+ messages
     [not found] <VI1PR8303MB00802FE5D289E0D7BA95B7DDFBEE0@VI1PR8303MB0080.EURPRD83.prod.outlook.com>
2020-11-06  4:19 ` [PATCH bpf-next v2] Update perf ring buffer to prevent corruption Alexei Starovoitov
2020-11-09 11:29   ` Peter Zijlstra
2020-11-09 14:22     ` [EXTERNAL] " Kevin Sheldrake
2020-11-09 17:55       ` Peter Zijlstra
2020-11-09 18:22   ` Peter Zijlstra
