All of lore.kernel.org
 help / color / mirror / Atom feed
* Struct_ops Questions
@ 2023-01-14  1:05 Daniel Rosenberg
  2023-01-17 19:05 ` Martin KaFai Lau
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel Rosenberg @ 2023-01-14  1:05 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

I'm currently working on switching Fuse-BPF over to using struct_ops,
and have a few questions.

I noticed that the lifespan of the [name]_struct_op struct passed in
reg, and the associated maps have a lifespan that can't be influenced
by the subsystem. Fuse-bpf keeps struct ops on an arbitrary number of
inodes which will potentially outlive the op structure. My current
plan is to have fuse treat unreg similar to if the daemon simply
stopped responding, failing all impacted calls with ENOTCONN. I'm
currently looking at two approaches.

1. Copy the struct received in reg, and have a flag to indicate if the
structure is live, with unreg just marking it as dead and some RCU
style sync to ensure we don't lose the function pointers post check.
2. Maintain an additional struct that points to the reg provided
struct. Null out that pointer in unreg, with the same sort of RCU
sync.

I'm currently leaning towards 1, but not sure what the best approach
is here, or if there should be some way for the subsystem to grab a
kref on the maps/struct_op structures. Given the analogue of the FUSE
daemon being able to disappear at any given time, I think one of the
above options should be enough.

The old fuse-bpf implementation, which had its own program type
defined, would use the program fd as an identifier, which allowed it
to increment the ref count as needed. That sort of information isn't
exposed to the register function, and you can't reach the struct_ops
structure from a map fd as is. Either of those would allow us to
directly use the map/struct objects without needing to maintain an
extra layer of duplicated data. Currently all the register function
does is add information to be able to check if the map still exists,
which wouldn't be needed if we could just grab a ref.

I'm not sure how to handle Fuse being built as a module. Currently,
bpf_struct_ops uses an include file list to define all available
struct_ops, along with externs for their bpf_struct_ops structure. If
we build fuse as a module, that either would not be available, or
would require us to build extra things into the kernel when we make
fuse as a module, which sort of seems to defeat the point.

Do we need a module interface here? At the moment I'm messing around
with just having the reg/unreg functions implemented within the FUSE
module, with all of the verification functions built in on the bpf
side. I've got a rough prototype working, but there's some messiness
around unloading the module while there are still struct_op programs
registered. If you unload and reload the module, the struct_op program
will still show up via bpftools, but would be unusable since it would
no longer be registered. All of that goes away if we can directly use
the map fd as an identifier. That wouldn't be useful for anything that
requires extra setup in their reg function of course.

It seems like a fair bit of these issues go away if we allow for a way
to grab the specific struct_op structure from the map fd. Would that
be a reasonable thing to expose to a module?

-Daniel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Struct_ops Questions
  2023-01-14  1:05 Struct_ops Questions Daniel Rosenberg
@ 2023-01-17 19:05 ` Martin KaFai Lau
  2023-01-24  1:55   ` Daniel Rosenberg
  0 siblings, 1 reply; 3+ messages in thread
From: Martin KaFai Lau @ 2023-01-17 19:05 UTC (permalink / raw)
  To: Daniel Rosenberg; +Cc: bpf, Alexei Starovoitov

On 1/13/23 5:05 PM, Daniel Rosenberg wrote:
> I'm currently working on switching Fuse-BPF over to using struct_ops,
> and have a few questions.
> 
> I noticed that the lifespan of the [name]_struct_op struct passed in
> reg, and the associated maps have a lifespan that can't be influenced
> by the subsystem. Fuse-bpf keeps struct ops on an arbitrary number of
> inodes which will potentially outlive the op structure. My current
> plan is to have fuse treat unreg similar to if the daemon simply
> stopped responding, failing all impacted calls with ENOTCONN. I'm
> currently looking at two approaches.
> 
> 1. Copy the struct received in reg, and have a flag to indicate if the
> structure is live, with unreg just marking it as dead and some RCU
> style sync to ensure we don't lose the function pointers post check.
> 2. Maintain an additional struct that points to the reg provided
> struct. Null out that pointer in unreg, with the same sort of RCU
> sync.
> 
> I'm currently leaning towards 1, but not sure what the best approach
> is here, or if there should be some way for the subsystem to grab a
> kref on the maps/struct_op structures. Given the analogue of the FUSE
> daemon being able to disappear at any given time, I think one of the
> above options should be enough.
> 
> The old fuse-bpf implementation, which had its own program type
> defined, would use the program fd as an identifier, which allowed it
> to increment the ref count as needed. That sort of information isn't
> exposed to the register function, and you can't reach the struct_ops
> structure from a map fd as is. Either of those would allow us to
> directly use the map/struct objects without needing to maintain an
> extra layer of duplicated data. Currently all the register function
> does is add information to be able to check if the map still exists,
> which wouldn't be needed if we could just grab a ref.
> 
> I'm not sure how to handle Fuse being built as a module. Currently,
> bpf_struct_ops uses an include file list to define all available
> struct_ops, along with externs for their bpf_struct_ops structure. If
> we build fuse as a module, that either would not be available, or
> would require us to build extra things into the kernel when we make
> fuse as a module, which sort of seems to defeat the point.
> 
> Do we need a module interface here? At the moment I'm messing around
> with just having the reg/unreg functions implemented within the FUSE
> module, with all of the verification functions built in on the bpf
> side. I've got a rough prototype working, but there's some messiness
> around unloading the module while there are still struct_op programs
> registered. If you unload and reload the module, the struct_op program
> will still show up via bpftools, but would be unusable since it would
> no longer be registered. All of that goes away if we can directly use
> the map fd as an identifier. That wouldn't be useful for anything that
> requires extra setup in their reg function of course.
> 
> It seems like a fair bit of these issues go away if we allow for a way
> to grab the specific struct_op structure from the map fd. Would that
> be a reasonable thing to expose to a module?

I think the first question is related to the refcnt of the struct_ops. Whenever 
a new tcp connection is established, it also needs to bump the refcnt of a (bpf) 
tcp_congestion_ops. The tcp subsystem currently does a bpf_try_module_get() 
which then calls bpf_struct_ops_get() if 'owner == BPF_MODULE_OWNER'. 
bpf_struct_ops_get() will inc the refcnt of the struct_ops.

Regarding unreg, the st_map->st_ops->unreg() will unregister the struct_ops from 
the tcp subsystem. The future tcp connection won't be able to find this (bpf) 
tcp_congestion_ops.

After unreg(), the existing tcp connections could still hold a refcnt to the 
already unregistered (bpf) tcp_congestion_ops. The refcnt will eventually be 
released when all these old connections are closed. This is similar to all other 
kernel tcp-cc modules (eg. when tcp_dctcp.c is built as a module).  Note that 
once unreg, the struct_ops will still be shown in 'bpftools struct_ops dump 
id...' but the status will be in BPF_STRUCT_OPS_STATE_TOBEFREE and it won't be 
usable by new connection.

Does the above behavior work for FUSE also? If not, how is FUSE different?

I think the next set of question is about when FUSE itself is built as a module. 
The first is how to register 'struct bpf_struct_ops bpf_fuse_struct_ops'. You 
are correct that right now it is done during compile time in 
bpf_struct_ops_types.h. To make FUSE itself built as a module and 
bpf_fuse_struct_ops defined in the FUSE module, some work is needed in 
bpf_struct_ops.c to allow registering struct_ops in runtime during module_init. 
For module reference counting, I think the bpf_fuse_struct_ops should be able to 
hold one refcnt of the fuse module. Not sure how bpf_fuse_struct_ops looks like 
and I also don't know how grabbing the map fd will help. It seems you already 
have something working, so it may be easier to discuss base on that.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Struct_ops Questions
  2023-01-17 19:05 ` Martin KaFai Lau
@ 2023-01-24  1:55   ` Daniel Rosenberg
  0 siblings, 0 replies; 3+ messages in thread
From: Daniel Rosenberg @ 2023-01-24  1:55 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: bpf, Alexei Starovoitov

On Tue, Jan 17, 2023 at 11:06 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/13/23 5:05 PM, Daniel Rosenberg wrote:
>
> I think the first question is related to the refcnt of the struct_ops. Whenever
> a new tcp connection is established, it also needs to bump the refcnt of a (bpf)
> tcp_congestion_ops. The tcp subsystem currently does a bpf_try_module_get()
> which then calls bpf_struct_ops_get() if 'owner == BPF_MODULE_OWNER'.
> bpf_struct_ops_get() will inc the refcnt of the struct_ops.
>

Thanks! I hadn't noticed the 'owner == BPF_MODULE_OWNER' case. That
fits what I need exactly. The only issue there is that those functions
aren't exported for module use, which is easy enough to change.

>
> I think the next set of question is about when FUSE itself is built as a module.
> The first is how to register 'struct bpf_struct_ops bpf_fuse_struct_ops'. You
> are correct that right now it is done during compile time in
> bpf_struct_ops_types.h. To make FUSE itself built as a module and
> bpf_fuse_struct_ops defined in the FUSE module, some work is needed in
> bpf_struct_ops.c to allow registering struct_ops in runtime during module_init.
> For module reference counting, I think the bpf_fuse_struct_ops should be able to
> hold one refcnt of the fuse module. Not sure how bpf_fuse_struct_ops looks like
> and I also don't know how grabbing the map fd will help. It seems you already
> have something working, so it may be easier to discuss base on that.

For the moment, I'm just building the verification side into the
kernel and exporting a function to register the reg/unreg functions.
Adjusting bpf_struct_ops.c for registering struct_ops at runtime is
probably a better path. I'll look into that after I've got more
working under the new system. I still need to figure out how to get
dynptrs working with struct_ops, which I think is going to involve a
new dynptr type... Something like DYNPTR_TYPE_KERN, where the memory
is managed by the kernel, and only fully initialized dynptrs are
passed in. It's possible that BPF_DYNPTR_TYPE_RINGBUF might mostly
cover that already, but I imagine there's some ringbuf specific things
to it... At the moment, I've just got a toy setup for
bpf_fuse_struct_ops to figure out the verification I need, so there's
not all that much to see there yet.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-01-24  1:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-14  1:05 Struct_ops Questions Daniel Rosenberg
2023-01-17 19:05 ` Martin KaFai Lau
2023-01-24  1:55   ` Daniel Rosenberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.