* Re: [fuse-devel] Reconnect to FUSE session
From: Miklos Szeredi @ 2021-12-14 14:04 UTC (permalink / raw)
  To: Robert Vasek; +Cc: fuse-devel, Hao Peng, linux-fsdevel

On Tue, 14 Dec 2021 at 13:58, Robert Vasek <rvasek01@gmail.com> wrote:
>
> Hello fuse-devel,
>
> I'd like to ask about the feasibility of having a reconnect feature added into the FUSE kernel module.
>
> The idea is that when a FUSE driver disconnects (process exited due to a bug, signal, etc.), all pending and future ops for that session would wait for that driver to appear again, and then continue as normal. Waiting would be on a timer, with ENOTCONN returned in case it times out. Obviously, "continue as normal" isn't possible for all FUSE drivers, as it depends on what they do and how they implement things -- they would have to opt-in for this feature.
>
> Use cases span basically anything where the lifecycle of a FUSE driver is managed by some external component (e.g. systemd, container orchestrators). This is especially true in containerized environments: FUSE drivers that run in containers and provide volume mounts may get killed or rescheduled by the orchestrator, or they may crash due to bugs, memory pressure, and so on, which can easily lead to data corruption and severed mounts. Having the ability to recover from such situations would greatly improve the reliability of these systems.
>
> I haven't looked at how this would be implemented yet though. I'm just wondering if this makes sense at all and if you folks would be interested in such a feature?
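
For illustration only, the wait-then-ENOTCONN behaviour described in the quoted
proposal can be modelled in a few lines of userspace Python. This is a toy
sketch, not the kernel patch referenced below and not any real FUSE API: the
Session class, its method names and the reconnect_timeout parameter are all
hypothetical, and a threading.Event stands in for "the driver is attached".

```python
import errno
import threading


class Session:
    """Toy model of a session whose driver may disconnect and come back.

    Hypothetical names throughout; this only mimics the proposed semantics.
    """

    def __init__(self, reconnect_timeout):
        # Seconds an operation may wait for the driver to reappear.
        self.reconnect_timeout = reconnect_timeout
        self._connected = threading.Event()
        self._connected.set()  # the driver starts out attached

    def disconnect(self):
        # Driver exited (bug, signal, rescheduling, ...).
        self._connected.clear()

    def reconnect(self):
        # A new driver instance attached to the same session.
        self._connected.set()

    def submit(self, op):
        # Pending and future ops wait for the driver, bounded by the timer.
        if not self._connected.wait(self.reconnect_timeout):
            raise OSError(errno.ENOTCONN, "driver did not reconnect in time")
        return op()
```

An operation submitted while the driver is away blocks in submit() until
reconnect() is called, or fails with ENOTCONN once reconnect_timeout elapses.
The sketch says nothing about the hard part, which is whether the restarted
driver can actually resume the in-flight state.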

A kernel patch[1] as well as example userspace code[2] has already
been proposed.

[1] https://lore.kernel.org/linux-fsdevel/CAPm50a+j8UL9g3UwpRsye5e+a=M0Hy7Tf1FdfwOrUUBWMyosNg@mail.gmail.com/

[2] https://lore.kernel.org/linux-fsdevel/CAPm50aLuK8Smy4NzdytUPmGM1vpzokKJdRuwxawUDA4jnJg=Fg@mail.gmail.com/

The example recovery is not very practical, but I can see how it would
be possible to extend to a read-only fs.

Is this what you had in mind?

Thanks,
Miklos

* Re: [fuse-devel] Reconnect to FUSE session
From: Andreas Gnau @ 2021-12-16 12:59 UTC (permalink / raw)
  To: Miklos Szeredi, Robert Vasek
  Cc: fuse-devel, linux-fsdevel, Hao Peng, swami, laxmanv, dusseau, remzi

On 14/12/2021 15:04, Miklos Szeredi wrote:
> On Tue, 14 Dec 2021 at 13:58, Robert Vasek <rvasek01@gmail.com> wrote:
>>
>> Hello fuse-devel,
>>
>> I'd like to ask about the feasibility of having a reconnect feature added into the FUSE kernel module.
>>
>> The idea is that when a FUSE driver disconnects (process exited due to a bug, signal, etc.), all pending and future ops for that session would wait for that driver to appear again, and then continue as normal. Waiting would be on a timer, with ENOTCONN returned in case it times out. Obviously, "continue as normal" isn't possible for all FUSE drivers, as it depends on what they do and how they implement things -- they would have to opt-in for this feature.
> 
> A kernel patch[1] as well as example userspace code[2] has already
> been proposed.
> 
> [1] https://lore.kernel.org/linux-fsdevel/CAPm50a+j8UL9g3UwpRsye5e+a=M0Hy7Tf1FdfwOrUUBWMyosNg@mail.gmail.com/
> 
> [2] https://lore.kernel.org/linux-fsdevel/CAPm50aLuK8Smy4NzdytUPmGM1vpzokKJdRuwxawUDA4jnJg=Fg@mail.gmail.com/
> 
> The example recovery is not very practical, but I can see how it would
> be possible to extend to a read-only fs.
> 

There has also been some related work in the paper
"Refuse to Crash with Re-FUSE"

https://research.cs.wisc.edu/wind/Publications/refuse-eurosys11.pdf
https://eurosys2011.cs.uni-salzburg.at/pdf/eurosys2011-sundararaman.pdf

The paper gives some insight into the challenges associated with
restarting, and it seems to have worked better for them than I would
have thought. I am not sure whether any source code for their work is
available to reproduce their findings, though.
