* Mapping local-storage maps into user space
From: David Vernet @ 2023-01-26 19:11 UTC (permalink / raw)
To: lsf-pc; +Cc: bpf
Hi everyone,
Another proposal from me for LSF/MM/BPF, and the last one for the time
being. I'd like to discuss enabling local-storage maps (e.g.
BPF_MAP_TYPE_TASK_STORAGE and BPF_MAP_TYPE_CGRP_STORAGE) to be r/o
mapped directly into user space. This would allow for quick lookups of
per-object state from user space, similar to how we allow it for
BPF_MAP_TYPE_ARRAY, without having to do something like either of the
following:
- Allocating a statically sized BPF_MAP_TYPE_ARRAY which is >= the # of
possible local-storage elements. This is likely wasteful in terms of
memory, and such a map isn't easy to iterate over.
- Using something like https://docs.kernel.org/bpf/bpf_iterators.html to
iterate over tasks or cgroups and collect information for each, which
is then dumped to user space. This would probably work, but it's not
terribly performant: it requires copying memory, trapping into the
kernel, and iterating over everything even when it's only necessary to
look up e.g. a single element.
Designing and implementing this would be pretty non-trivial. We'd
probably have to do a few things:
1. Write an allocator that dynamically allocates statically sized
local-storage entries for local-storage maps, and populates them into
pages which are mapped into user space.
2. Come up with some idr-like mechanism for mapping a local-storage
object to an index into the mapping. For example, mapping a task with
global pid 12345 to BPF_MAP_TYPE_TASK_STORAGE index 5, and providing
ergonomic and safe ways to update these entries in the kernel and
communicate them to user space.
3. Related to point 1 above, come up with some way to dynamically extend
the user space mapping as more local-storage elements are added. We
could potentially reserve a statically sized VA range and map all
unused VA pages to the zero page, or instead possibly just leave them
unmapped until they're actually needed.
There are a lot of open questions, but I think it could be very useful
if we can make it work. Let me know what you all think.
Thanks,
David
* Re: Mapping local-storage maps into user space
From: Hao Luo @ 2023-01-27 0:42 UTC (permalink / raw)
To: David Vernet; +Cc: lsf-pc, bpf
On Thu, Jan 26, 2023 at 11:11 AM David Vernet <void@manifault.com> wrote:
>
> Hi everyone,
>
> Another proposal from me for LSF/MM/BPF, and the last one for the time
> being. I'd like to discuss enabling local-storage maps (e.g.
> BPF_MAP_TYPE_TASK_STORAGE and BPF_MAP_TYPE_CGRP_STORAGE) to be r/o
> mapped directly into user space. This would allow for quick lookups of
> per-object state from user space, similar to how we allow it for
> BPF_MAP_TYPE_ARRAY, without having to do something like either of the
> following:
>
> - Allocating a statically sized BPF_MAP_TYPE_ARRAY which is >= the # of
> possible local-storage elements. This is likely wasteful in terms of
> memory, and such a map isn't easy to iterate over.
>
> - Using something like https://docs.kernel.org/bpf/bpf_iterators.html to
> iterate over tasks or cgroups and collect information for each, which
> is then dumped to user space. This would probably work, but it's not
> terribly performant: it requires copying memory, trapping into the
> kernel, and iterating over everything even when it's only necessary to
> look up e.g. a single element.
>
> Designing and implementing this would be pretty non-trivial. We'd
> probably have to do a few things:
>
> 1. Write an allocator that dynamically allocates statically sized
> local-storage entries for local-storage maps, and populates them into
> pages which are mapped into user space.
>
> 2. Come up with some idr-like mechanism for mapping a local-storage
> object to an index into the mapping. For example, mapping a task with
> global pid 12345 to BPF_MAP_TYPE_TASK_STORAGE index 5, and providing
> ergonomic and safe ways to update these entries in the kernel and
> communicate them to user space.
>
> 3. Related to point 1 above, come up with some way to dynamically extend
> the user space mapping as more local-storage elements are added. We
> could potentially reserve a statically sized VA range and map all
> unused VA pages to the zero page, or instead possibly just leave them
> unmapped until they're actually needed.
>
> There are a lot of open questions, but I think it could be very useful
> if we can make it work. Let me know what you all think.
>
Hi David,
I remember I had a similar idea and played with it last year. I don't
recall why I needed that feature back then; probably I was looking for
ways to pass per-task information from userspace and read it from
within BPF. I sent an RFC to the mailing list [1]. You could take a
look and see whether it is of help to you.
[1] https://www.spinics.net/lists/bpf/msg57565.html
* Re: Mapping local-storage maps into user space
From: David Vernet @ 2023-02-07 18:25 UTC (permalink / raw)
To: Hao Luo; +Cc: lsf-pc, bpf
On Thu, Jan 26, 2023 at 04:42:20PM -0800, Hao Luo wrote:
> On Thu, Jan 26, 2023 at 11:11 AM David Vernet <void@manifault.com> wrote:
> >
> > Hi everyone,
> >
> > Another proposal from me for LSF/MM/BPF, and the last one for the time
> > being. I'd like to discuss enabling local-storage maps (e.g.
> > BPF_MAP_TYPE_TASK_STORAGE and BPF_MAP_TYPE_CGRP_STORAGE) to be r/o
> > mapped directly into user space. This would allow for quick lookups of
> > per-object state from user space, similar to how we allow it for
> > BPF_MAP_TYPE_ARRAY, without having to do something like either of the
> > following:
> >
> > - Allocating a statically sized BPF_MAP_TYPE_ARRAY which is >= the # of
> > possible local-storage elements. This is likely wasteful in terms of
> > memory, and such a map isn't easy to iterate over.
> >
> > - Using something like https://docs.kernel.org/bpf/bpf_iterators.html to
> > iterate over tasks or cgroups and collect information for each, which
> > is then dumped to user space. This would probably work, but it's not
> > terribly performant: it requires copying memory, trapping into the
> > kernel, and iterating over everything even when it's only necessary to
> > look up e.g. a single element.
> >
> > Designing and implementing this would be pretty non-trivial. We'd
> > probably have to do a few things:
> >
> > 1. Write an allocator that dynamically allocates statically sized
> > local-storage entries for local-storage maps, and populates them into
> > pages which are mapped into user space.
> >
> > 2. Come up with some idr-like mechanism for mapping a local-storage
> > object to an index into the mapping. For example, mapping a task with
> > global pid 12345 to BPF_MAP_TYPE_TASK_STORAGE index 5, and providing
> > ergonomic and safe ways to update these entries in the kernel and
> > communicate them to user space.
> >
> > 3. Related to point 1 above, come up with some way to dynamically extend
> > the user space mapping as more local-storage elements are added. We
> > could potentially reserve a statically sized VA range and map all
> > unused VA pages to the zero page, or instead possibly just leave them
> > unmapped until they're actually needed.
> >
> > There are a lot of open questions, but I think it could be very useful
> > if we can make it work. Let me know what you all think.
> >
>
> Hi David,
>
> I remember I had a similar idea and played with it last year. I don't
> recall why I needed that feature back then; probably I was looking for
> ways to pass per-task information from userspace and read it from
> within BPF. I sent an RFC to the mailing list [1]. You could take a
> look and see whether it is of help to you.
>
> [1] https://www.spinics.net/lists/bpf/msg57565.html
Hi Hao,
Thanks for sharing that thread; it's great to see that there is already
interest from other folks. It looks like the main use case you were
trying to enable was passing an fd from user space to a TLS element for
the current task, which the BPF prog would then pass to helpers that
take an fd. There was a need specifically to enable this for
unprivileged programs, which can't e.g. use bpf_prog_test_run to set the
fd. Alexei proposed an alternative in [0] that it seemed like everyone
was on board with.
[0]: https://lore.kernel.org/bpf/20220329232956.gbsr65jdbe4lw2m6@ast-mbp/
The use case I was envisioning is a bit different in a couple of ways:
- I was anticipating that user space could map an entire task (or
cgroup, sk, etc.) local-storage map, rather than a task only being able
to mmap its own TLS entry. I think this approach would be more
generalizable for other local-storage map types like cgroup and sk,
and would also be useful for ghOSt, sched_ext, etc. It could also
serve as a higher-performance alternative to bpf-iter for user space
applications that don't want to have to trap into the kernel.
- I was envisioning this being read-only, though I expect it would be
possible to enable writable mappings as well. The tricky part here is
that we will almost certainly eventually want to enable referenced
kptrs in local-storage maps, so it's not always safe to let user space
mutate local-storage entries.
- I'd like to avoid allocating an entire page for each entry. Most of
the local-storage entries that I've used are far smaller than a page,
so it seems prudent to have some kind of allocator that we could use
to pack multiple entries into a single page. This would also have the
benefit of potentially allowing an O(1) lookup for a map entry, rather
than requiring us to do an O(n) iteration over a task's local-storage
entries when doing a lookup from a program. This is a super hand-wavy
claim though -- a lot more details need to be worked out and
validated.
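As a very rough illustration of the packing idea, here's what the O(1)
slot-to-address computation could look like in user-space C. The names
and sizes (packed_store, ENTRY_SZ, etc.) are invented for the sketch; a
kernel version would allocate real pages and tie their lifetime to the
map rather than using malloc-family calls:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ   4096
#define ENTRY_SZ  96                      /* sub-page entry, not a full page */
#define PER_PAGE  (PAGE_SZ / ENTRY_SZ)    /* entries packed per page */

struct packed_store {
	uint8_t **pages;    /* lazily grown page table */
	size_t npages;
};

/* O(1): derive (page, offset) from the slot index; no list walk. */
static void *packed_lookup(struct packed_store *s, size_t slot)
{
	size_t pg = slot / PER_PAGE;
	size_t off = (slot % PER_PAGE) * ENTRY_SZ;

	if (pg >= s->npages) {
		uint8_t **np = realloc(s->pages, (pg + 1) * sizeof(*np));
		if (!np)
			return NULL;
		for (size_t i = s->npages; i <= pg; i++)
			np[i] = NULL;
		s->pages = np;
		s->npages = pg + 1;
	}
	if (!s->pages[pg]) {
		/* Allocate the backing page on first touch. */
		s->pages[pg] = aligned_alloc(PAGE_SZ, PAGE_SZ);
		if (!s->pages[pg])
			return NULL;
		memset(s->pages[pg], 0, PAGE_SZ);
	}
	return s->pages[pg] + off;
}
```

The point is just that, with fixed-size entries packed densely, a lookup
is pure arithmetic on the slot index instead of a walk over the per-task
list of local-storage entries.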
What do you think?
Thanks,
David