* Do I really need to add mount2 and umount3 syscalls for some crazy experiment
From: Anadon @ 2023-01-09  4:46 UTC
  To: linux-fsdevel

I never post, be gentle.

I am looking into implementing a distributed RAFT filesystem, for
reasons.  Before that, I want what is in effect a simple pass-through
filesystem: something which just takes calls to open, read, close,
etc. and forwards them to a specified mounted filesystem.  Hopefully
through FUSE before jumping straight into kernel development.
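
To make that concrete, here is roughly the shape I have in mind,
modeled on libfuse's passthrough example (a sketch only: libfuse 3
assumed, BACKING is a made-up backing directory, only
getattr/open/read are shown, and error handling is abbreviated):

#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

#define BACKING "/srv/backing"  /* made-up path to the mounted filesystem */

/* Remap a path in this filesystem to its backing counterpart. */
static void map_path(char *dst, size_t n, const char *path)
{
    snprintf(dst, n, "%s%s", BACKING, path);
}

static int pt_getattr(const char *path, struct stat *st,
                      struct fuse_file_info *fi)
{
    (void)fi;
    char real[PATH_MAX];
    map_path(real, sizeof(real), path);
    return lstat(real, st) ? -errno : 0;
}

static int pt_open(const char *path, struct fuse_file_info *fi)
{
    char real[PATH_MAX];
    map_path(real, sizeof(real), path);
    int fd = open(real, fi->flags);
    if (fd < 0)
        return -errno;
    fi->fh = fd;  /* stash the backing fd for later reads */
    return 0;
}

static int pt_read(const char *path, char *buf, size_t size, off_t off,
                   struct fuse_file_info *fi)
{
    (void)path;
    ssize_t n = pread(fi->fh, buf, size, off);
    return n < 0 ? -errno : (int)n;
}

static const struct fuse_operations pt_ops = {
    .getattr = pt_getattr,
    .open    = pt_open,
    .read    = pt_read,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &pt_ops, NULL);
}

Every other operation would forward the same way.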

One way to technically accomplish this is to call `mount()` and then
forward each (potentially userland) file operation to the mapped file
by rewriting its path.  This has the effect of making the files
accessible in two locations.  The problems start because the
underlying filesystem won't notify my passthrough layer when changes
are made.  Since my end goal is a robust consensus filesystem, having
all the files silently modifiable in such an easy and user-accessible
way is a problem.  What would be better is some struct exposing all
the relevant function pointers and data.  That sounds like adding
syscalls `int mount2(const char* device, ..., struct
return_fs_interface)` and `int umount3(struct return_fs_interface)`.
Adding two new syscalls which look almost nothing like existing
syscalls, all in the name of breaking "everything is a file" in favor
of "everything is an API", is a lot.  It sounds like a fight, and
work, I would like to avoid.

I have looked at `fsopen(...)` as an alternative, but it still does
not meet my use case.  Another way would be to compile in every
filesystem driver, but this just seems downright mad.  Is there a good
option I have overlooked?  Am I even asking in the right place?
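
(For reference, the shape of the new mount API I looked at -- a
sketch, assuming kernel >= 5.2, with a placeholder fs type and paths
and no error handling; there are no glibc wrappers yet, hence the raw
syscall(2) calls.  It hands back file descriptors for the mount, not
the per-filesystem function table I am after.)

#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/mount.h>

int main(void)
{
    /* Create a filesystem context, configure it, then attach it. */
    int fsfd = syscall(SYS_fsopen, "ext4", FSOPEN_CLOEXEC);
    syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING, "source", "/dev/sda1", 0);
    syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    int mfd = syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);
    syscall(SYS_move_mount, mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
    return 0;
}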


* Re: Do I really need to add mount2 and umount3 syscalls for some crazy experiment
From: Amir Goldstein @ 2023-01-09  7:01 UTC
  To: Anadon; +Cc: linux-fsdevel

On Mon, Jan 9, 2023 at 7:08 AM Anadon <joshua.r.marshall.1991@gmail.com> wrote:
>
> I never post, be gentle.
>
> I am looking into implementing a distributed RAFT filesystem, for
> reasons.  Before that, I want what is in effect a simple pass-through
> filesystem: something which just takes calls to open, read, close,
> etc. and forwards them to a specified mounted filesystem.  Hopefully
> through FUSE before jumping straight into kernel development.
>
> One way to technically accomplish this is to call `mount()` and then
> forward each (potentially userland) file operation to the mapped file
> by rewriting its path.  This has the effect of making the files
> accessible in two locations.  The problems start because the
> underlying filesystem won't notify my passthrough layer when changes
> are made.  Since my end goal is a robust consensus filesystem, having
> all the files silently modifiable in such an easy and user-accessible
> way is a problem.

Have you considered using fanotify for the FUSE implementation?

You can currently get async change notifications from fanotify.
Do you require synchronous change notifications?
Do you require the change notifications to survive system crashes?

Because that is what my HSM fanotify project is aiming to achieve:
https://github.com/amir73il/fsnotify-utils/wiki/Hierarchical-Storage-Management-API#tracking-local-modifications
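
For the async side, the shape is roughly this (a sketch only: the
path is a placeholder, FAN_MARK_FILESYSTEM needs CAP_SYS_ADMIN, and
error handling is omitted):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/fanotify.h>

int main(void)
{
    int fan = fanotify_init(FAN_CLASS_NOTIF | FAN_CLOEXEC, O_RDONLY);
    /* Watch the whole filesystem backing the passthrough for data changes. */
    fanotify_mark(fan, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
                  FAN_MODIFY | FAN_CLOSE_WRITE, AT_FDCWD, "/mnt/backing");

    for (;;) {
        char buf[4096];
        ssize_t len = read(fan, buf, sizeof(buf));
        struct fanotify_event_metadata *ev = (void *)buf;
        while (FAN_EVENT_OK(ev, len)) {
            printf("change event, mask 0x%llx\n",
                   (unsigned long long)ev->mask);
            close(ev->fd);  /* each event carries an open fd to the object */
            ev = FAN_EVENT_NEXT(ev, len);
        }
    }
}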

> What would be better is some struct exposing all
> the relevant function pointers and data.  That sounds like adding
> syscalls `int mount2(const char* device, ..., struct
> return_fs_interface)` and `int umount3(struct return_fs_interface)`.
> Adding two new syscalls which look almost nothing like existing
> syscalls, all in the name of breaking "everything is a file" in favor
> of "everything is an API", is a lot.  It sounds like a fight, and
> work, I would like to avoid.

Don't go there.

>
> I have looked at `fsopen(...)` as an alternative, but it still does
> not meet my use case.  Another way would be to compile in every
> filesystem driver, but this just seems downright mad.  Is there a good
> option I have overlooked?  Am I even asking in the right place?

If you are looking for similar code, the overlayfs filesystem driver
is probably the closest to what you are looking for in the upstream
kernel, because it takes two underlying paths and merges them
into one unified namespace.
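
For reference, mounting it looks like this (placeholder directories;
equivalent to `mount -t overlay overlay -o
lowerdir=...,upperdir=...,workdir=... /merged`):

#include <sys/mount.h>

int main(void)
{
    /* lower stays read-only; changes land in upper via the work dir. */
    return mount("overlay", "/merged", "overlay", 0,
                 "lowerdir=/lower,upperdir=/upper,workdir=/work");
}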

Somewhat similar to a leader change, some union filesystems switch
the backing file at runtime (a.k.a. copy up).

Upstream overlayfs does not watch for underlying filesystem changes;
in fact, those changes are not allowed, but it could be done.
I have another project where overlayfs driver watches the underlying
filesystem for changes:
https://github.com/amir73il/overlayfs/wiki/Overlayfs-watch

The out-of-tree aufs has had underlying change tracking for a long time.

Thanks,
Amir.


* Re: Do I really need to add mount2 and umount3 syscalls for some crazy experiment
From: Theodore Ts'o @ 2023-01-10  0:48 UTC
  To: Anadon; +Cc: linux-fsdevel

On Sun, Jan 08, 2023 at 11:46:38PM -0500, Anadon wrote:
> I am looking into implementing a distributed RAFT filesystem, for
> reasons.  Before that, I want what is in effect a simple pass-through
> filesystem: something which just takes calls to open, read, close,
> etc. and forwards them to a specified mounted filesystem.  Hopefully
> through FUSE before jumping straight into kernel development.
> 
> One way to technically accomplish this is to call `mount()` and then
> forward each (potentially userland) file operation to the mapped file
> by rewriting its path.  This has the effect of making the files
> accessible in two locations.

I don't quite understand *why* you want to implement a passthrough
filesystem, in terms of how it relates to creating a distributed RAFT
file system.

Files (and indeed, entire directory hierarchies) being accessible in
two locations is not a big deal; this can be done using a bind mount
without needing any kernel code.
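
For example (placeholder paths; this is all a bind mount is, the
equivalent of `mount --bind /orig /alias`):

#include <sys/mount.h>

int main(void)
{
    /* After this, /orig and /alias name the *same* files. */
    return mount("/orig", "/alias", NULL, MS_BIND, NULL);
}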

And if the question is how to deal with files that can be modified
externally, from a system elsewhere in the local network (or
cluster/cell), well, all networked file systems such as NFS et al.
do this.  If a networked file system knows that a particular file has
been modified, it can easily invalidate the local page cache copies of
the file.  Now, if that file is currently being used, and is being
mmap'ed into some process's address space, perhaps as the read-only
text (code) segment, what the semantics should be if it is modified
remotely out from under that process --- or whether the file should be
allowed to be modified at all if it is being used in certain ways, is
a semantic/design/policy question you need to answer before we can
talk about how this might be implemented.

> The problems start because the underlying filesystem won't
> notify my passthrough layer when changes are made.

Now, how an underlying file system might notify your passthrough layer
will no doubt be completely different from how a distributed/networked
file system will notify the node-level implementation of that file
system.  So I'm not at all sure how this is relevant for your use
case.  (And if you want a file to appear in two places at the same
time, just make that file *be* the same file in two places via a bind
mount.)

> What would be better is some struct exposing all
> the relevant function pointers and data.  That sounds like adding
> syscalls `int mount2(const char* device, ..., struct
> return_fs_interface)` and `int umount3(struct return_fs_interface)`.

I don't understand what you want to do here.  What is going to be in
this "struct return_fs_interface"?  Function pointers?  Do you really
want to have kernel code calling userspace functions?  Uh, that's a
really bad idea.  You should take a closer look at how the FUSE
kernel/userspace interface works, if the goal is to do something like
this via a userspace FUSE-like scheme.

> I have looked at `fsopen(...)` as an alternative, but it still does
> not meet my use case.  Another way would be to compile in every
> filesystem driver, but this just seems downright mad.  Is there a good
> option I have overlooked?  Am I even asking in the right place?

I'm not sure what "compiling in every filesystem driver" is trying to
accomplish.  Compiling into *what*?  For what purpose?

I'm not sure you are asking the right questions.  It might be better
to say in more detail what the functional requirements are for what
you are trying to achieve, before asking people to evaluate potential
implementation approaches.

Best regards,

						- Ted


* Re: Do I really need to add mount2 and umount3 syscalls for some crazy experiment
From: Anadon @ 2023-01-11  0:45 UTC
  To: Theodore Ts'o; +Cc: linux-fsdevel

Good evening Ted,

You bring up good points, and I'll clarify.  The original prompt for
me was that there is a ton of network queue, distributed lock, and
network file sharing software.  On a single system, all of these
problems have been solved through simple file idioms like named
pipes, file locking, and basic file operations.  Wouldn't it be nice
to have all those idioms work as transparently in a distributed
setting as they do on a single system?  I'd also like to avoid a
single point of failure while better understanding and addressing the
concerns system administrators have with NFS.  You're right, I don't
have many details about this prompt.  There's not much I can do about
my ignorance without trying.
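
(To illustrate the kind of single-system idiom I mean -- a sketch
with a placeholder path; the hope is that this exact code keeps
working when the directory is backed by consensus across machines:)

#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>

int main(void)
{
    int fd = open("/shared/lock", O_CREAT | O_RDWR, 0644);
    flock(fd, LOCK_EX);  /* today: excludes other local processes */
    /* ... critical section ... */
    flock(fd, LOCK_UN);  /* the goal: exclude processes on other nodes too */
    close(fd);
    return 0;
}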

There is a set of characteristics which I would like to preserve.  I
would like connected mount points, using a consensus algorithm, to
preserve the user-facing guarantees specified by POSIX.1, such that
an operation applied to one mount point is applied synchronously to
all connected mount points.  Separate operations are asynchronous,
and it is the responsibility of the user or program to handle these
cases appropriately, as would be the case if operating strictly
locally under similar concurrency conditions.  It should be
accessible in some way without special permissions, and it should
require secure operation via key exchange and encrypted messaging.
This description is still malformed.  I want to do a PhD on the
subject, but I haven't done that yet.

Before I step into that thicket, it looks like my trivial case is a
"pass-through" file system, so I need to learn and implement that.
It seems like a pass-through would just forward all function calls,
but that's non-trivial, and I need to investigate security hooks
(they might meet my needs), overlayfs (it may allow locking the
underlying directory in a serviceable way), and much of the work
referenced by Amir.

On Mon, Jan 9, 2023 at 7:48 PM Theodore Ts'o <tytso@mit.edu> wrote:
> [snip]

