From: Jia Zhu <zhujia.zj@bytedance.com>
To: dhowells@redhat.com, xiang@kernel.org, jefflexu@linux.alibaba.com
Cc: linux-cachefs@redhat.com, linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, yinxin.x@bytedance.com, Jia Zhu <zhujia.zj@bytedance.com>
Subject: [PATCH V2 0/5] Introduce daemon failover mechanism to recover from crashing
Date: Fri, 14 Oct 2022 11:07:40 +0800
Message-ID: <20221014030745.25748-1-zhujia.zj@bytedance.com>

Changes since V1:
1. Extract cachefiles_ondemand_select_req() from
   cachefiles_ondemand_daemon_read() to make the code more readable.
2. Fix a UAF bug reported by JeffleXu.
3. Modify some code comments.

[Background]
============
In ondemand read mode, if the user daemon closes an anonymous fd (e.g. the
daemon crashes), subsequent read and inflight requests based on these fds
will return -EIO.
Even if this case is tolerable for some individual users, when it happens
in a real cloud service production environment, such IO errors are passed
to cloud service users and impact their running jobs. That is terrible for
cloud service stability.

[Design]
========
This patchset introduces three states for an ondemand object:
CLOSE: Object which has just been allocated or closed by the user daemon.
OPEN: Object whose related OPEN request has been processed correctly.
REOPENING: Object which has been closed, and is driven to open by a read
request.

[Flow Path]
===========
[Daemon Crash]
0. The daemon uses UDS (Unix domain socket) fd send/receive to keep and
   pass the fd reference of "/dev/cachefiles".
1. The user daemon crashes -> restarts and recovers the dev fd's reference.
2. The user daemon writes "restore" to the device.
   2.1 Reset the object's state from CLOSE to REOPENING.
   2.2 Init a work item which reinits the object and add it to the wq
       (so the daemon can leave kernel space and handle that open request).
3. The user of the upper filesystem won't notice that the daemon ever
   crashed, since the inflight IO is restored and handled correctly.

[Daemon Close fd]
1. The user daemon closes an anonymous fd.
2. The user daemon reads a READ request whose associated anonymous fd was
   closed, and inits a work item which re-opens the object.
3. The user daemon handles the above open request normally.
4. The user of the upper filesystem won't notice that the daemon ever
   closed any fd, since the closed object is re-opened and the related
   request is handled correctly.

[Test]
======
There is a testcase for the above scenario.
A user process reads the file by fscache ondemand reading.
At the same time, we kill the daemon repeatedly.
The expected result is that the file read by the user is consistent with
the original, and the user doesn't notice that the daemon has ever been
killed.

https://github.com/userzj/demand-read-cachefilesd/commits/failover-test

[GitWeb]
========
https://github.com/userzj/linux/tree/fscache-failover-v3

Jia Zhu (5):
  cachefiles: introduce object ondemand state
  cachefiles: extract ondemand info field from cachefiles_object
  cachefiles: resend an open request if the read request's object is closed
  cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
  cachefiles: add restore command to recover inflight ondemand read requests

 fs/cachefiles/daemon.c    |  14 +++-
 fs/cachefiles/interface.c |   6 ++
 fs/cachefiles/internal.h  |  58 +++++++++++++-
 fs/cachefiles/ondemand.c  | 156 ++++++++++++++++++++++++++++----------
 4 files changed, 188 insertions(+), 46 deletions(-)

-- 
2.20.1
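
The three object states from [Design] and the transitions described in
[Flow Path] can be sketched as a small user-space state machine. This is
only an illustration under stated assumptions, not the code from the
patches: every identifier below (enum ondemand_state, struct
ondemand_object, the ondemand_*() helpers, the reopen_queued counter that
stands in for the workqueue) is hypothetical, and the rule that a read on
an object already in REOPENING does not queue a second re-open is an
assumption, not something the cover letter states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the three states mirror the [Design] section. */
enum ondemand_state {
	ONDEMAND_CLOSE,		/* just allocated, or closed by the user daemon */
	ONDEMAND_OPEN,		/* related OPEN request processed correctly */
	ONDEMAND_REOPENING,	/* closed, and driven to open by a read request */
};

struct ondemand_object {
	enum ondemand_state state;
	int reopen_queued;	/* counts queued re-open work (stand-in for a wq) */
};

void ondemand_init(struct ondemand_object *obj)
{
	assert(obj != NULL);
	obj->state = ONDEMAND_CLOSE;
	obj->reopen_queued = 0;
}

/* The daemon has handled an OPEN request for this object. */
void ondemand_open_done(struct ondemand_object *obj)
{
	obj->state = ONDEMAND_OPEN;
}

/* The daemon closed the anonymous fd (explicitly, or by crashing). */
void ondemand_fd_closed(struct ondemand_object *obj)
{
	obj->state = ONDEMAND_CLOSE;
}

/* CLOSE -> REOPENING, plus one unit of queued re-open work. */
static void ondemand_start_reopen(struct ondemand_object *obj)
{
	obj->state = ONDEMAND_REOPENING;
	obj->reopen_queued++;
}

/* "restore" path: only closed objects are driven back to open. */
void ondemand_restore(struct ondemand_object *obj)
{
	if (obj->state == ONDEMAND_CLOSE)
		ondemand_start_reopen(obj);
}

/*
 * A READ request arrives. Returns true if the object is open and the
 * request can go to the daemon now, false if it has to wait for a re-open.
 */
bool ondemand_read(struct ondemand_object *obj)
{
	switch (obj->state) {
	case ONDEMAND_OPEN:
		return true;
	case ONDEMAND_CLOSE:
		ondemand_start_reopen(obj);
		return false;
	case ONDEMAND_REOPENING:
		/* Assumption: a re-open is already pending, do not queue twice. */
		return false;
	}
	return false;
}
```

In this sketch a read never fails because the fd was closed: it either
proceeds (OPEN) or waits while a re-open is pending, which is the behaviour
the series aims for in place of returning -EIO.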