* [RESEND1, PATCH 0/2] fuse: allow filesystems to have precise control over data cache
From: Kirill Smelkov @ 2019-03-27 9:15 UTC
To: Miklos Szeredi, Miklos Szeredi
Cc: Brian Foster, Maxim Patlasov, Anatol Pomozov, Pavel Emelyanov,
    Andrew Gallagher, Anand V . Avati, Alexey Kuznetsov, Andrey Ryabinin,
    Kirill Tkhai, Constantine Shulyupin, Chad Austin, Dan Schatzberg,
    linux-fsdevel, fuse-devel, linux-kernel, Han-Wen Nienhuys,
    Andrew Morton, Kirill Smelkov

Miklos,

This is a resend of the patches that teach fs/fuse/ to give filesystems
full control over the data cache if the filesystem server indicates to the
kernel that it is fully responsible for data cache invalidation. This
functionality is essential when the cached data are relatively big and
it is very desirable to avoid automatically clearing the data cache of
an inode on file size change. The second patch of the series describes
the problem in detail, as well as the fix for it.

I sent the change initially ~ 2 weeks ago

    https://lwn.net/ml/linux-fsdevel/20190315212556.9315-1-kirr@nexedi.com/

but have not heard from you at all. Could you please have a look?

Thanks beforehand,
Kirill

Kirill Smelkov (2):
  fuse: convert printk -> pr_*
  fuse: allow filesystems to have precise control over data cache

 fs/fuse/cuse.c            | 13 +++++++------
 fs/fuse/dev.c             |  4 ++--
 fs/fuse/fuse_i.h          |  7 +++++++
 fs/fuse/inode.c           | 18 +++++++++++++-----
 include/uapi/linux/fuse.h |  7 ++++++-
 5 files changed, 35 insertions(+), 14 deletions(-)

-- 
2.21.0.392.gf8f6787159

* [RESEND1, PATCH 1/2] fuse: convert printk -> pr_*
From: Kirill Smelkov @ 2019-03-27 9:15 UTC
To: Miklos Szeredi, Miklos Szeredi
Cc: Brian Foster, Maxim Patlasov, Anatol Pomozov, Pavel Emelyanov,
    Andrew Gallagher, Anand V . Avati, Alexey Kuznetsov, Andrey Ryabinin,
    Kirill Tkhai, Constantine Shulyupin, Chad Austin, Dan Schatzberg,
    linux-fsdevel, fuse-devel, linux-kernel, Han-Wen Nienhuys,
    Andrew Morton, Kirill Smelkov

Functions, like pr_err, are a more modern variant of printing compared to
printk. They could be used to denoise sources by using needed level in
the print function name, and by automatically inserting per-driver /
function / ... print prefix as defined by pr_fmt macro. pr_* are also
said to be used in Documentation/process/coding-style.rst and more
recent code - for example overlayfs - uses them instead of printk.

Convert CUSE and FUSE to use the new pr_* functions.

CUSE output stays completely unchanged, while FUSE output is amended a
bit for "trying to steal weird page" warning - the second line now comes
also with "fuse:" prefix. I hope it is ok.

Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 fs/fuse/cuse.c   | 13 +++++++------
 fs/fuse/dev.c    |  4 ++--
 fs/fuse/fuse_i.h |  4 ++++
 fs/fuse/inode.c  |  6 +++---
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 55a26f351467..4b41df1d4642 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -33,6 +33,8 @@
  * closed.
  */

+#define pr_fmt(fmt) "CUSE: " fmt
+
 #include <linux/fuse.h>
 #include <linux/cdev.h>
 #include <linux/device.h>
@@ -225,7 +227,7 @@ static int cuse_parse_one(char **pp, char *end, char **keyp, char **valp)
 		return 0;

 	if (end[-1] != '\0') {
-		printk(KERN_ERR "CUSE: info not properly terminated\n");
+		pr_err("info not properly terminated\n");
 		return -EINVAL;
 	}

@@ -242,7 +244,7 @@ static int cuse_parse_one(char **pp, char *end, char **keyp, char **valp)
 		key = strstrip(key);

 	if (!strlen(key)) {
-		printk(KERN_ERR "CUSE: zero length info key specified\n");
+		pr_err("zero length info key specified\n");
 		return -EINVAL;
 	}

@@ -282,12 +284,11 @@ static int cuse_parse_devinfo(char *p, size_t len, struct cuse_devinfo *devinfo)
 		if (strcmp(key, "DEVNAME") == 0)
 			devinfo->name = val;
 		else
-			printk(KERN_WARNING "CUSE: unknown device info \"%s\"\n",
-			       key);
+			pr_warn("unknown device info \"%s\"\n", key);
 	}

 	if (!devinfo->name || !strlen(devinfo->name)) {
-		printk(KERN_ERR "CUSE: DEVNAME unspecified\n");
+		pr_err("DEVNAME unspecified\n");
 		return -EINVAL;
 	}

@@ -341,7 +342,7 @@ static void cuse_process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
 	else
 		rc = register_chrdev_region(devt, 1, devinfo.name);
 	if (rc) {
-		printk(KERN_ERR "CUSE: failed to register chrdev region\n");
+		pr_err("failed to register chrdev region\n");
 		goto err;
 	}

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8a63e52785e9..ccb4c3980829 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -906,8 +906,8 @@ static int fuse_check_page(struct page *page)
 	       1 << PG_lru |
 	       1 << PG_active |
 	       1 << PG_reclaim))) {
-		printk(KERN_WARNING "fuse: trying to steal weird page\n");
-		printk(KERN_WARNING " page=%p index=%li flags=%08lx, count=%i, mapcount=%i, mapping=%p\n", page, page->index, page->flags, page_count(page), page_mapcount(page), page->mapping);
+		pr_warn("trying to steal weird page\n");
+		pr_warn(" page=%p index=%li flags=%08lx, count=%i, mapcount=%i, mapping=%p\n", page, page->index, page->flags, page_count(page), page_mapcount(page), page->mapping);
 		return 1;
 	}
 	return 0;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0920c0c032a0..e6195bc8f836 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -9,6 +9,10 @@
 #ifndef _FS_FUSE_I_H
 #define _FS_FUSE_I_H

+#ifndef pr_fmt
+# define pr_fmt(fmt) "fuse: " fmt
+#endif
+
 #include <linux/fuse.h>
 #include <linux/fs.h>
 #include <linux/mount.h>
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 1b3f3b67d9f0..1bca5023bcc5 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1397,8 +1397,8 @@ static int __init fuse_init(void)
 {
 	int res;

-	printk(KERN_INFO "fuse init (API version %i.%i)\n",
-	       FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION);
+	pr_info("init (API version %i.%i)\n",
+		FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION);

 	INIT_LIST_HEAD(&fuse_conn_list);
 	res = fuse_fs_init();
@@ -1434,7 +1434,7 @@ static int __init fuse_init(void)

 static void __exit fuse_exit(void)
 {
-	printk(KERN_DEBUG "fuse exit\n");
+	pr_debug("exit\n");

 	fuse_ctl_cleanup();
 	fuse_sysfs_cleanup();
-- 
2.21.0.392.gf8f6787159
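
The prefix mechanism this patch relies on can be sketched in userspace. This is only an illustration under stated assumptions: `demo_pr_err` and `demo_selfcheck` are hypothetical names, and `snprintf` into a buffer stands in for the kernel log. What it tries to show is why the `#ifndef pr_fmt` guard in fuse_i.h leaves the "CUSE: " prefix defined earlier in cuse.c untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* As in cuse.c: a file may define its own prefix before the shared header. */
#define pr_fmt(fmt) "CUSE: " fmt

/* As in fuse_i.h: the default prefix applies only if none was defined yet. */
#ifndef pr_fmt
# define pr_fmt(fmt) "fuse: " fmt
#endif

/*
 * Hypothetical userspace stand-in for pr_err(): it formats into a buffer
 * instead of the kernel log. pr_fmt() is expanded at compile time, so the
 * prefix becomes part of the format string literal.
 * Note: ", ##__VA_ARGS__" is a GNU extension (gcc, clang).
 */
#define demo_pr_err(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)

/* Formats one of the converted CUSE messages and checks the resulting text. */
void demo_selfcheck(void)
{
	char buf[64];

	demo_pr_err(buf, sizeof(buf), "unknown device info \"%s\"\n", "FOO");
	assert(strcmp(buf, "CUSE: unknown device info \"FOO\"\n") == 0);
}
```

Calling `demo_selfcheck()` from a `main()` exercises the macro. Removing the first `#define` makes the guarded default take over, and the same call would then produce a "fuse: " prefix instead.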

* Re: [RESEND1, PATCH 1/2] fuse: convert printk -> pr_*
From: Miklos Szeredi @ 2019-04-23 14:57 UTC
To: Kirill Smelkov
Cc: Miklos Szeredi, Brian Foster, Maxim Patlasov, Anatol Pomozov,
    Pavel Emelyanov, Andrew Gallagher, Anand V . Avati, Alexey Kuznetsov,
    Andrey Ryabinin, Kirill Tkhai, Constantine Shulyupin, Chad Austin,
    Dan Schatzberg, linux-fsdevel, fuse-devel, linux-kernel,
    Han-Wen Nienhuys, Andrew Morton

On Wed, Mar 27, 2019 at 10:15 AM Kirill Smelkov <kirr@nexedi.com> wrote:
>
> Functions, like pr_err, are a more modern variant of printing compared to
> printk. They could be used to denoise sources by using needed level in
> the print function name, and by automatically inserting per-driver /
> function / ... print prefix as defined by pr_fmt macro. pr_* are also
> said to be used in Documentation/process/coding-style.rst and more
> recent code - for example overlayfs - uses them instead of printk.
>
> Convert CUSE and FUSE to use the new pr_* functions.
>
> CUSE output stays completely unchanged, while FUSE output is amended a
> bit for "trying to steal weird page" warning - the second line now comes
> also with "fuse:" prefix. I hope it is ok.

Yep. Applied, thanks.

Miklos

* FUSE workflow=? (Re: [RESEND1, PATCH 1/2] fuse: convert printk -> pr_*)
From: Kirill Smelkov @ 2019-04-24 8:38 UTC
To: Miklos Szeredi
Cc: Miklos Szeredi, Brian Foster, Maxim Patlasov, Anatol Pomozov,
    Pavel Emelyanov, Andrew Gallagher, Anand V . Avati, Alexey Kuznetsov,
    Andrey Ryabinin, Kirill Tkhai, Constantine Shulyupin, Chad Austin,
    Dan Schatzberg, linux-fsdevel, fuse-devel, linux-kernel,
    Han-Wen Nienhuys, Andrew Morton, Linus Torvalds

+torvalds

On Tue, Apr 23, 2019 at 04:57:58PM +0200, Miklos Szeredi wrote:
> On Wed, Mar 27, 2019 at 10:15 AM Kirill Smelkov <kirr@nexedi.com> wrote:
> >
> > Functions, like pr_err, are a more modern variant of printing compared to
> > printk. They could be used to denoise sources by using needed level in
> > the print function name, and by automatically inserting per-driver /
> > function / ... print prefix as defined by pr_fmt macro. pr_* are also
> > said to be used in Documentation/process/coding-style.rst and more
> > recent code - for example overlayfs - uses them instead of printk.
> >
> > Convert CUSE and FUSE to use the new pr_* functions.
> >
> > CUSE output stays completely unchanged, while FUSE output is amended a
> > bit for "trying to steal weird page" warning - the second line now comes
> > also with "fuse:" prefix. I hope it is ok.
>
> Yep. Applied, thanks.

Miklos, thanks for feedback. Could you please clarify where the patch is
applied? Here is what linux/MAINTAINERS says

    FUSE: FILESYSTEM IN USERSPACE
    M: Miklos Szeredi <miklos@szeredi.hu>
    L: linux-fsdevel@vger.kernel.org
    W: http://fuse.sourceforge.net/
    T: git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
    S: Maintained
    F: fs/fuse/
    F: include/uapi/linux/fuse.h
    F: Documentation/filesystems/fuse.txt

but git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git was
not updated for ~ 2 months. I see other "Applied, thanks" replies from
you on linux-fsdevel in recent days and it suggests that patches are
indeed applied, but where they are integrated is the question.

Linux-next also has no post-5.1 fuse patches at all, so I'm really
puzzled about what is going on.

Is there any reason not to keep for-next fuse branch publicly available?
Or am I missing something?

Could you please also have a look at other posted patches? I'm
struggling for months sending them to you and not getting feedback. It
is kind of frustrating to work in this mode. Here they are:

- FOPEN_STREAM to fix read/write deadlock on stream-like files:

  https://lore.kernel.org/linux-fsdevel/20190424071316.11967-1-kirr@nexedi.com/

  The basis for this patch has already landed in master:
  git.kernel.org/linus/10dce8af3422

- FUSE_PRECISE_INVAL_DATA to allow filesystems to have precise control
  over the data cache, and in particular not to lose the whole data cache
  on file size change:

  https://lore.kernel.org/linux-fsdevel/e0b43507976d6ea9010f1bacaef067f18de49f1f.1553677194.git.kirr@nexedi.com/

  cover letter:
  https://lore.kernel.org/linux-fsdevel/cover.1553677194.git.kirr@nexedi.com/

  This patch is essential for my filesystem, which cares very deeply
  about not losing the local file cache.

  ( "fuse: convert printk -> pr_*" was only a preparatory patch in that
    series, suggested by Kirill Tkhai )

- don't get clients stuck on retrieve_notify with size > max_write

  https://lore.kernel.org/linux-fsdevel/cover.1553680185.git.kirr@nexedi.com/
  https://lore.kernel.org/linux-fsdevel/12f7d0d98555ee0d174d04bb47644f65c07f035a.1553680185.git.kirr@nexedi.com/
  https://lore.kernel.org/linux-fsdevel/d74b17b9d33c3dcc7a1f2fa2914fb3c4e7cda127.1553680185.git.kirr@nexedi.com/

  This is kind of a no-op if the server behaves sanely, but for a slightly
  misbehaving server it changes the kernel to return a regular error
  instead of promising userspace that it will send a reply and then not
  doing so, which gets userspace stuck.

  When my filesystem initially got stuck, it took a lot of digging to
  understand what was going on

  https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
  (starting from "I've hit this bug for real ...")

  Even though go-fuse (the fuse library that was slightly misbehaving)
  is now fixed

  https://github.com/hanwen/go-fuse/commit/58dcd77a24,

  it makes a big difference whether userspace gets an error, or gets an
  "ok" return and is then stuck waiting for the promised message. Besides
  libfuse and go-fuse there are several other fuse libraries, and by
  fixing the kernel behaviour here we take care of all fuse users.

  In February you set a 10-line budget for this "non-bug fix", and this
  budget is met: the patches cumulatively are 2 lines of code change and
  7 lines of comments.

Thanks beforehand,
Kirill

* Re: FUSE workflow=? (Re: [RESEND1, PATCH 1/2] fuse: convert printk -> pr_*)
From: Miklos Szeredi @ 2019-04-24 8:57 UTC
To: Kirill Smelkov
Cc: Miklos Szeredi, Brian Foster, Maxim Patlasov, Anatol Pomozov,
    Pavel Emelyanov, Andrew Gallagher, Anand V . Avati, Alexey Kuznetsov,
    Andrey Ryabinin, Kirill Tkhai, Constantine Shulyupin, Chad Austin,
    Dan Schatzberg, linux-fsdevel, fuse-devel, lkml, Han-Wen Nienhuys,
    Andrew Morton, Linus Torvalds

On Wed, Apr 24, 2019 at 10:38 AM Kirill Smelkov <kirr@nexedi.com> wrote:
>
> +torvalds
>
> On Tue, Apr 23, 2019 at 04:57:58PM +0200, Miklos Szeredi wrote:
> > On Wed, Mar 27, 2019 at 10:15 AM Kirill Smelkov <kirr@nexedi.com> wrote:
> > >
> > > Functions, like pr_err, are a more modern variant of printing compared to
> > > printk. They could be used to denoise sources by using needed level in
> > > the print function name, and by automatically inserting per-driver /
> > > function / ... print prefix as defined by pr_fmt macro. pr_* are also
> > > said to be used in Documentation/process/coding-style.rst and more
> > > recent code - for example overlayfs - uses them instead of printk.
> > >
> > > Convert CUSE and FUSE to use the new pr_* functions.
> > >
> > > CUSE output stays completely unchanged, while FUSE output is amended a
> > > bit for "trying to steal weird page" warning - the second line now comes
> > > also with "fuse:" prefix. I hope it is ok.
> >
> > Yep. Applied, thanks.
>
> Miklos, thanks for feedback. Could you please clarify where the patch is
> applied? Here is what linux/MAINTAINERS says
>
> FUSE: FILESYSTEM IN USERSPACE
> M: Miklos Szeredi <miklos@szeredi.hu>
> L: linux-fsdevel@vger.kernel.org
> W: http://fuse.sourceforge.net/
> T: git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
> S: Maintained
> F: fs/fuse/
> F: include/uapi/linux/fuse.h
> F: Documentation/filesystems/fuse.txt
>
> but git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git was
> not updated for ~ 2 months. I see other "Applied, thanks" replies from
> you on linux-fsdevel in recent days and it suggests that patches are
> indeed applied, but where they are integrated is the question.

My private patch queue.

> Linux-next also has no post-5.1 fuse patches at all, so I'm really
> puzzled about what is going on.
>
> Is there any reason not to keep for-next fuse branch publicly available?
> Or am I missing something?

I usually push to fuse.git#for-next within a day or two of adding it
to my queue.

> Could you please also have a look at other posted patches? I'm
> struggling for months sending them to you and not getting feedback. It
> is kind of frustrating to work in this mode.

I see. I'll try to give more frequent feedback on patches. The
reason for not replying is not that I intentionally ignore incoming
patches, but because I'm working on something else and context
switching between completely different projects is not easy for me.

Thanks,
Miklos

* Re: FUSE workflow=? (Re: [RESEND1, PATCH 1/2] fuse: convert printk -> pr_*)
From: Kirill Smelkov @ 2019-04-24 9:54 UTC
To: Miklos Szeredi
Cc: Miklos Szeredi, Brian Foster, Maxim Patlasov, Anatol Pomozov,
    Pavel Emelyanov, Andrew Gallagher, Anand V . Avati, Alexey Kuznetsov,
    Andrey Ryabinin, Kirill Tkhai, Constantine Shulyupin, Chad Austin,
    Dan Schatzberg, linux-fsdevel, fuse-devel, lkml, Han-Wen Nienhuys,
    Andrew Morton, Linus Torvalds

On Wed, Apr 24, 2019 at 10:57:35AM +0200, Miklos Szeredi wrote:
> On Wed, Apr 24, 2019 at 10:38 AM Kirill Smelkov <kirr@nexedi.com> wrote:
> >
> > +torvalds
> >
> > On Tue, Apr 23, 2019 at 04:57:58PM +0200, Miklos Szeredi wrote:
> > > On Wed, Mar 27, 2019 at 10:15 AM Kirill Smelkov <kirr@nexedi.com> wrote:
> > > >
> > > > Functions, like pr_err, are a more modern variant of printing compared to
> > > > printk. They could be used to denoise sources by using needed level in
> > > > the print function name, and by automatically inserting per-driver /
> > > > function / ... print prefix as defined by pr_fmt macro. pr_* are also
> > > > said to be used in Documentation/process/coding-style.rst and more
> > > > recent code - for example overlayfs - uses them instead of printk.
> > > >
> > > > Convert CUSE and FUSE to use the new pr_* functions.
> > > >
> > > > CUSE output stays completely unchanged, while FUSE output is amended a
> > > > bit for "trying to steal weird page" warning - the second line now comes
> > > > also with "fuse:" prefix. I hope it is ok.
> > >
> > > Yep. Applied, thanks.
> >
> > Miklos, thanks for feedback. Could you please clarify where the patch is
> > applied? Here is what linux/MAINTAINERS says
> >
> > FUSE: FILESYSTEM IN USERSPACE
> > M: Miklos Szeredi <miklos@szeredi.hu>
> > L: linux-fsdevel@vger.kernel.org
> > W: http://fuse.sourceforge.net/
> > T: git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
> > S: Maintained
> > F: fs/fuse/
> > F: include/uapi/linux/fuse.h
> > F: Documentation/filesystems/fuse.txt
> >
> > but git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git was
> > not updated for ~ 2 months. I see other "Applied, thanks" replies from
> > you on linux-fsdevel in recent days and it suggests that patches are
> > indeed applied, but where they are integrated is the question.
>
> My private patch queue.
>
> > Linux-next also has no post-5.1 fuse patches at all, so I'm really
> > puzzled about what is going on.
> >
> > Is there any reason not to keep for-next fuse branch publicly available?
> > Or am I missing something?
>
> I usually push to fuse.git#for-next within a day or two of adding it
> to my queue.

Miklos, first of all thanks a lot for the feedback. I see about
fuse.git#for-next. I was checking linux-fsdevel and saw one patch that
was said to be applied ~ one week ago. That timeframe was longer than
what I imagined for a private queue for in-house testing, and that's
why I asked.

https://lore.kernel.org/linux-fsdevel/CAJfpegsyuNtS7afL3Wqjx7m2ewPUUih9pznRmxtrsHYOKER2Gw@mail.gmail.com/

> > Could you please also have a look at other posted patches? I'm
> > struggling for months sending them to you and not getting feedback. It
> > is kind of frustrating to work in this mode.
>
> I see. I'll try to give more frequent feedback on patches. The
> reason for not replying is not that I intentionally ignore incoming
> patches, but because I'm working on something else and context
> switching between completely different projects is not easy for me.

I see. I understand the difficulty of context switching, especially
since it is similarly not easy on my side. However, big latency in patch
feedback creates uncertainty, e.g. whether a patch will be considered at
all or dropped. If it is considered, chances are that it will have to be
reworked, and if feedback comes e.g. once per cycle (~2.5 months), the
reworked patch will have to wait for another cycle, which brings the
total close to half a year. With those latencies it comes close to
killing the motivation to do the work at all. (A record example - I once
got the first reply 5 _years_ after posting a patch.)

We all have limited resources and I'm not talking about getting feedback
within days (though that would be appreciated). Getting feedback within
one or two weeks should be reasonable though. With more delay, at least
for me, it starts to be on the edge and the patch is considered to be
lost. It would be very much appreciated if you could indeed try to
provide feedback more frequently.

I apologize if my email touched some pain points. I asked because it was
not clear what to do with the patches, and my filesystem becomes
completely useless without e.g. the precise-cache fix. I was just trying
to find out how my fuse changes should make their way into the kernel.

Thanks a lot, once again, for your feedback.

Kirill

* [RESEND1, PATCH v2 2/2] fuse: allow filesystems to have precise control over data cache
From: Kirill Smelkov @ 2019-03-27 10:14 UTC
To: Miklos Szeredi, Miklos Szeredi
Cc: Brian Foster, Maxim Patlasov, Anatol Pomozov, Pavel Emelyanov,
    Andrew Gallagher, Anand V . Avati, Alexey Kuznetsov, Andrey Ryabinin,
    Kirill Tkhai, Constantine Shulyupin, Chad Austin, Dan Schatzberg,
    linux-fsdevel, fuse-devel, linux-kernel, Han-Wen Nienhuys,
    Andrew Morton, Kirill Smelkov

On networked filesystems file data can be changed externally. FUSE
provides notification messages for a filesystem to inform the kernel that
metadata or a data region of a file needs to be invalidated in the local
page cache. That provides the basis for filesystem implementations to
invalidate the kernel cache precisely, based on observed
filesystem-specific events.

FUSE also has an "automatic" invalidation mode(*) in which the kernel
automatically invalidates the data cache of a file if it sees an mtime
change. It also automatically invalidates the whole data cache of a file
if it sees the file size being changed.

The automatic mode has a corresponding capability - FUSE_AUTO_INVAL_DATA.
However, probably for historical reasons, that capability controls only
whether an mtime change should result in automatic invalidation or not.
A change in file size always results in invalidating the whole data
cache of a file, regardless of whether FUSE_AUTO_INVAL_DATA was
negotiated(+).

The filesystem I write[1] represents data arrays stored in a networked
database as local files suitable for mmap. It is a read-only filesystem -
changes to data are committed externally via database interfaces and the
filesystem only glues data into contiguous file streams suitable for
mmap and traditional array processing. The files are big - hundreds of
gigabytes and more. The files change regularly, frequently by data being
appended to their end. The size of the files thus changes frequently.

If a file was accessed locally and some part of its data got into the
page cache, we want that data to stay cached unless there is memory
pressure, or unless the corresponding part of the file was actually
changed. However, current FUSE behaviour - when it sees a file size
change - is to invalidate the whole file. The data cache of the file is
thus completely lost even on a small size change, even though the
filesystem server is careful to accurately translate database changes
into FUSE invalidation messages to the kernel.

Let's fix it: if a filesystem, through the new FUSE_PRECISE_INVAL_DATA
capability, indicates to the kernel that it is fully responsible for data
cache invalidation, then the kernel won't invalidate the file's data
cache on size change and will only truncate that cache to the new size
in case the size decreased.

(*) see 72d0d248ca "fuse: add FUSE_AUTO_INVAL_DATA init flag",
eed2179efe "fuse: invalidate inode mapping if mtime changes"

(+) in writeback mode the kernel does not invalidate data cache on file
size change, but neither does it allow the filesystem to set the size due
to an external event (see 8373200b12 "fuse: Trust kernel i_size only")

[1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 fs/fuse/fuse_i.h          |  3 +++
 fs/fuse/inode.c           | 12 ++++++++++--
 include/uapi/linux/fuse.h |  7 ++++++-
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e6195bc8f836..154f6cdd94d1 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -694,6 +694,9 @@ struct fuse_conn {
 	/** Use enhanced/automatic page cache invalidation. */
 	unsigned auto_inval_data:1;

+	/** Filesystem is fully responsible for page cache invalidation. */
+	unsigned precise_inval_data:1;
+
 	/** Does the filesystem support readdirplus? */
 	unsigned do_readdirplus:1;

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 1bca5023bcc5..2be0bca7f76c 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -237,7 +237,8 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,

 		if (oldsize != attr->size) {
 			truncate_pagecache(inode, attr->size);
-			inval = true;
+			if (!fc->precise_inval_data)
+				inval = true;
 		} else if (fc->auto_inval_data) {
 			struct timespec64 new_mtime = {
 				.tv_sec = attr->mtime,
@@ -912,6 +913,13 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
 				fc->dont_mask = 1;
 			if (arg->flags & FUSE_AUTO_INVAL_DATA)
 				fc->auto_inval_data = 1;
+			if (arg->flags & FUSE_PRECISE_INVAL_DATA)
+				fc->precise_inval_data = 1;
+			if (fc->auto_inval_data && fc->precise_inval_data) {
+				pr_warn("filesystem requested both auto and "
+					"precise cache control - using auto\n");
+				fc->precise_inval_data = 0;
+			}
 			if (arg->flags & FUSE_DO_READDIRPLUS) {
 				fc->do_readdirplus = 1;
 				if (arg->flags & FUSE_READDIRPLUS_AUTO)
@@ -973,7 +981,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
 		FUSE_WRITEBACK_CACHE | FUSE_NO_OPEN_SUPPORT |
 		FUSE_PARALLEL_DIROPS | FUSE_HANDLE_KILLPRIV | FUSE_POSIX_ACL |
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
-		FUSE_NO_OPENDIR_SUPPORT;
+		FUSE_NO_OPENDIR_SUPPORT | FUSE_PRECISE_INVAL_DATA;
 	req->in.h.opcode = FUSE_INIT;
 	req->in.numargs = 1;
 	req->in.args[0].size = sizeof(*arg);
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 2ac598614a8f..33de8f6391ec 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -125,6 +125,9 @@
  *
  * 7.29
  * - add FUSE_NO_OPENDIR_SUPPORT flag
+ *
+ * 7.30
+ * - add FUSE_PRECISE_INVAL_DATA
  */

 #ifndef _LINUX_FUSE_H
@@ -160,7 +163,7 @@
 #define FUSE_KERNEL_VERSION 7

 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 29
+#define FUSE_KERNEL_MINOR_VERSION 30

 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -263,6 +266,7 @@ struct fuse_file_lock {
  * FUSE_MAX_PAGES: init_out.max_pages contains the max number of req pages
  * FUSE_CACHE_SYMLINKS: cache READLINK responses
  * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
+ * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for data cache invalidation
  */
 #define FUSE_ASYNC_READ (1 << 0)
 #define FUSE_POSIX_LOCKS (1 << 1)
@@ -289,6 +293,7 @@ struct fuse_file_lock {
 #define FUSE_MAX_PAGES (1 << 22)
 #define FUSE_CACHE_SYMLINKS (1 << 23)
 #define FUSE_NO_OPENDIR_SUPPORT (1 << 24)
+#define FUSE_PRECISE_INVAL_DATA (1 << 25)

 /**
  * CUSE INIT request/reply flags
-- 
2.21.0.392.gf8f6787159
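
The decision logic that this patch changes can be modelled in plain userspace C. This is a simplified sketch, not the kernel code: every `demo_` / `DEMO_` name is hypothetical, the writeback and regular-file checks that surround the real code are left out, and the bit value for the auto flag is assumed from the upstream header rather than taken from this patch. It mirrors two things from the diff above: the negotiation in process_init_reply(), where auto wins if both flags are requested, and the branch in fuse_change_attributes() that decides whether the whole data cache gets invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the init flags (PRECISE value as in the patch). */
#define DEMO_AUTO_INVAL_DATA    (1u << 12)	/* assumed upstream value */
#define DEMO_PRECISE_INVAL_DATA (1u << 25)

/* Simplified stand-in for the two relevant struct fuse_conn bits. */
struct demo_conn {
	unsigned auto_inval_data:1;
	unsigned precise_inval_data:1;
};

/* Models process_init_reply(): if both modes are requested, auto wins. */
void demo_negotiate(struct demo_conn *fc, uint32_t flags)
{
	fc->auto_inval_data = !!(flags & DEMO_AUTO_INVAL_DATA);
	fc->precise_inval_data = !!(flags & DEMO_PRECISE_INVAL_DATA);
	if (fc->auto_inval_data && fc->precise_inval_data)
		fc->precise_inval_data = 0;
}

/*
 * Models the patched branch of fuse_change_attributes(): returns true
 * when the whole data cache of the file would be invalidated.
 * Truncating the cache to the new size happens on any size change and
 * is not modelled here.
 */
bool demo_should_invalidate(const struct demo_conn *fc, uint64_t oldsize,
			    uint64_t newsize, bool mtime_changed)
{
	if (oldsize != newsize)
		return !fc->precise_inval_data;
	if (fc->auto_inval_data)
		return mtime_changed;
	return false;
}

/* Small usage example: an append under precise mode keeps the cache. */
void demo_example(void)
{
	struct demo_conn fc = {0};

	demo_negotiate(&fc, DEMO_PRECISE_INVAL_DATA);
	assert(!demo_should_invalidate(&fc, 4096, 8192, false));
}
```

Under these assumptions, a server that negotiates neither flag still loses the whole cache on any size change, which is the pre-patch behaviour the commit message describes.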