All of lore.kernel.org
 help / color / mirror / Atom feed
* Mapping local-storage maps into user space
@ 2023-01-26 19:11 David Vernet
  2023-01-27  0:42 ` Hao Luo
  0 siblings, 1 reply; 3+ messages in thread
From: David Vernet @ 2023-01-26 19:11 UTC (permalink / raw)
  To: lsf-pc; +Cc: bpf

Hi everyone,

Another proposal from me for LSF/MM/BPF, and the last one for the time
being. I'd like to discuss enabling local-storage maps (e.g.
BPF_MAP_TYPE_TASK_STORAGE and BPF_MAP_TYPE_CGRP_STORAGE) to be r/o
mapped directly into user space. This would allow for quick lookups of
per-object state from user space, similar to how we allow it for
BPF_MAP_TYPE_ARRAY, without having to do something like either of the
following:

- Allocating a statically sized BPF_MAP_TYPE_ARRAY which is >= the # of
  possible local-storage elements, which is likely wasteful in terms of
  memory, and which isn't easy to iterate over.

- Use something like https://docs.kernel.org/bpf/bpf_iterators.html to
  iterate over tasks or cgroups, and collect information for each which
  is then dumped to user space. This would probably work, but it's not
  terribly performant in that it requires copying memory, trapping into
  the kernel, and full iteration even when it's only necessary to look
  up e.g. a single element.

Designing and implementing this would be pretty non-trivial. We'd have
to probably do a few things:

1. Write an allocator that dynamically allocates statically sized
   local-storage entries for local-storage maps, and populates them into
   pages which are mapped into user space.

2. Come up with some idr-like mechanism for mapping a local-storage
   object to an index into the mapping. For example, mapping a task with
   global pid 12345 to BPF_MAP_TYPE_TASK_STORAGE index 5, and providing
   ergonomic and safe ways to update these entries in the kernel and
   communicate them to user space.

3. Related to point 1 above, come up with some way to dynamically extend
   the user space mapping as more local-storage elements are added. We
   could potentially reserve a statically sized VA range and map all
   unused VA pages to the zero page, or instead possibly just leave them
   unmapped until they're actually needed.

There are a lot of open questions, but I think it could be very useful
if we can make it work. Let me know what you all think.

Thanks,
David

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Mapping local-storage maps into user space
  2023-01-26 19:11 Mapping local-storage maps into user space David Vernet
@ 2023-01-27  0:42 ` Hao Luo
  2023-02-07 18:25   ` David Vernet
  0 siblings, 1 reply; 3+ messages in thread
From: Hao Luo @ 2023-01-27  0:42 UTC (permalink / raw)
  To: David Vernet; +Cc: lsf-pc, bpf

On Thu, Jan 26, 2023 at 11:11 AM David Vernet <void@manifault.com> wrote:
>
> Hi everyone,
>
> Another proposal from me for LSF/MM/BPF, and the last one for the time
> being. I'd like to discuss enabling local-storage maps (e.g.
> BPF_MAP_TYPE_TASK_STORAGE and BPF_MAP_TYPE_CGRP_STORAGE) to be r/o
> mapped directly into user space. This would allow for quick lookups of
> per-object state from user space, similar to how we allow it for
> BPF_MAP_TYPE_ARRAY, without having to do something like either of the
> following:
>
> - Allocating a statically sized BPF_MAP_TYPE_ARRAY which is >= the # of
>   possible local-storage elements, which is likely wasteful in terms of
>   memory, and which isn't easy to iterate over.
>
> - Use something like https://docs.kernel.org/bpf/bpf_iterators.html to
>   iterate over tasks or cgroups, and collect information for each which
>   is then dumped to user space. This would probably work, but it's not
>   terribly performant in that it requires copying memory, trapping into
>   the kernel, and full iteration even when it's only necessary to look
>   up e.g. a single element.
>
> Designing and implementing this would be pretty non-trivial. We'd have
> to probably do a few things:
>
> 1. Write an allocator that dynamically allocates statically sized
>    local-storage entries for local-storage maps, and populates them into
>    pages which are mapped into user space.
>
> 2. Come up with some idr-like mechanism for mapping a local-storage
>    object to an index into the mapping. For example, mapping a task with
>    global pid 12345 to BPF_MAP_TYPE_TASK_STORAGE index 5, and providing
>    ergonomic and safe ways to update these entries in the kernel and
>    communicate them to user space.
>
> 3. Related to point 1 above, come up with some way to dynamically extend
>    the user space mapping as more local-storage elements are added. We
>    could potentially reserve a statically sized VA range and map all
>    unused VA pages to the zero page, or instead possibly just leave them
>    unmapped until they're actually needed.
>
> There are a lot of open questions, but I think it could be very useful
> if we can make it work. Let me know what you all think.
>

Hi David,

I remember, I had a similar idea and played with it last year. I don't
recall why I needed that feature back then, probably looking for ways
to pass per-task information from userspace and read it from within
BPF. I sent an RFC to the mailing list [1]. You could take a look, see
whether it is of help to you.

[1] https://www.spinics.net/lists/bpf/msg57565.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Mapping local-storage maps into user space
  2023-01-27  0:42 ` Hao Luo
@ 2023-02-07 18:25   ` David Vernet
  0 siblings, 0 replies; 3+ messages in thread
From: David Vernet @ 2023-02-07 18:25 UTC (permalink / raw)
  To: Hao Luo; +Cc: lsf-pc, bpf

On Thu, Jan 26, 2023 at 04:42:20PM -0800, Hao Luo wrote:
> On Thu, Jan 26, 2023 at 11:11 AM David Vernet <void@manifault.com> wrote:
> >
> > Hi everyone,
> >
> > Another proposal from me for LSF/MM/BPF, and the last one for the time
> > being. I'd like to discuss enabling local-storage maps (e.g.
> > BPF_MAP_TYPE_TASK_STORAGE and BPF_MAP_TYPE_CGRP_STORAGE) to be r/o
> > mapped directly into user space. This would allow for quick lookups of
> > per-object state from user space, similar to how we allow it for
> > BPF_MAP_TYPE_ARRAY, without having to do something like either of the
> > following:
> >
> > - Allocating a statically sized BPF_MAP_TYPE_ARRAY which is >= the # of
> >   possible local-storage elements, which is likely wasteful in terms of
> >   memory, and which isn't easy to iterate over.
> >
> > - Use something like https://docs.kernel.org/bpf/bpf_iterators.html to
> >   iterate over tasks or cgroups, and collect information for each which
> >   is then dumped to user space. This would probably work, but it's not
> >   terribly performant in that it requires copying memory, trapping into
> >   the kernel, and full iteration even when it's only necessary to look
> >   up e.g. a single element.
> >
> > Designing and implementing this would be pretty non-trivial. We'd have
> > to probably do a few things:
> >
> > 1. Write an allocator that dynamically allocates statically sized
> >    local-storage entries for local-storage maps, and populates them into
> >    pages which are mapped into user space.
> >
> > 2. Come up with some idr-like mechanism for mapping a local-storage
> >    object to an index into the mapping. For example, mapping a task with
> >    global pid 12345 to BPF_MAP_TYPE_TASK_STORAGE index 5, and providing
> >    ergonomic and safe ways to update these entries in the kernel and
> >    communicate them to user space.
> >
> > 3. Related to point 1 above, come up with some way to dynamically extend
> >    the user space mapping as more local-storage elements are added. We
> >    could potentially reserve a statically sized VA range and map all
> >    unused VA pages to the zero page, or instead possibly just leave them
> >    unmapped until they're actually needed.
> >
> > There are a lot of open questions, but I think it could be very useful
> > if we can make it work. Let me know what you all think.
> >
> 
> Hi David,
> 
> I remember, I had a similar idea and played with it last year. I don't
> recall why I needed that feature back then, probably looking for ways
> to pass per-task information from userspace and read it from within
> BPF. I sent an RFC to the mailing list [1]. You could take a look, see
> whether it is of help to you.
> 
> [1] https://www.spinics.net/lists/bpf/msg57565.html

Hi Hao,

Thanks for sharing that thread, it's great to see that there is already
interest from other folks. It looks like the main use case you were
trying to enable was passing an fd from user space to a TLS element for
the current task, which the BPF prog would then pass to helpers that
take an fd. There was a need specifically to enable this for
unprivileged programs which can't e.g.  use bpf_prog_test_run to set the
fd. Alexei proposed an alternative option in [0] which it seemed like
everyone was on-board with.

[0]: https://lore.kernel.org/bpf/20220329232956.gbsr65jdbe4lw2m6@ast-mbp/

The use-case I was envisioning is a bit different in a couple ways:

- I was anticipating that user space could map an entire task (or
  cgroup, sk, etc) local-storage map, rather than a task only being able
  to mmap its own TLS entry. I think this approach would be more
  generalizable for other local-storage map types like cgroup and sk,
  and would also be useful for ghOSt, sched_ext, etc. It could also
  serve as a higher-performance alternative to bpf-iter for user space
  applications that don't want to have to trap into the kernel.

- I was envisioning this being read-only, though I expect it would be
  possible to enable writeable mappings as well. The tricky part here is
  that we will eventually certainly want to enable referenced kptrs in
  local-storage maps, so it's not always safe to let user space mutate
  local-storage entries.

- I'd like to avoid allocating an entire page for each entry. Most of
  the local-storage entries that I've used are far smaller than a page,
  so it seems prudent to have some kind of allocator that we could use
  to pack multiple entries into a single page. This would also have the
  benefit of potentially allowing an O(1) lookup for a map entry as
  well, rather than requiring us to do an O(n) iteration over a task's
  local storage entries when doing a lookup from a program. This is a
  super hand-wavey claim though -- a lot more details and validation
  need to be ironed out.

What do you think?

Thanks,
David

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-02-07 18:26 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-26 19:11 Mapping local-storage maps into user space David Vernet
2023-01-27  0:42 ` Hao Luo
2023-02-07 18:25   ` David Vernet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.