linux-fsdevel.vger.kernel.org archive mirror
* [FUSE] notify_store usage: deadlocks with other read / write requests
@ 2021-08-27 17:31 ` Teng Qin
  2021-09-08  9:30   ` Miklos Szeredi
  0 siblings, 1 reply; 3+ messages in thread
From: Teng Qin @ 2021-08-27 17:31 UTC
  To: linux-fsdevel

I am developing a file system whose underlying block size is far larger than the amount of data the VFS requests from the FUSE daemon at a time (2MB / 4MB vs. 32 pages = 128K).
I currently cache the block data in user space, but it would be better to have the kernel manage it in the page cache and save round trips between the VFS and the FUSE daemon. So I looked at using FUSE_NOTIFY_STORE to proactively offer the data to the kernel. However, I found that the notify store often deadlocks with user read requests.
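
A minimal sketch of the push-ahead arithmetic, assuming the daemon knows its backing block size and the range of the FUSE_READ it has just answered. The helper name push_ahead_range() is hypothetical; in a low-level libfuse daemon the resulting range would be handed to fuse_lowlevel_notify_store(), which this sketch does not call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: after answering a FUSE_READ of
 * [read_off, read_off + read_len) from a cached backing block of
 * block_size bytes, compute the rest of that block that could be
 * offered to the kernel with FUSE_NOTIFY_STORE.
 * Returns the number of bytes to push (0 if nothing is left) and sets
 * *store_off to where the push starts. Clamping to file size is left
 * to the caller.
 */
static uint64_t push_ahead_range(uint64_t block_size, uint64_t read_off,
                                 uint64_t read_len, uint64_t *store_off)
{
    assert(block_size > 0);
    uint64_t block_start = read_off - read_off % block_size;
    uint64_t block_end = block_start + block_size;
    uint64_t read_end = read_off + read_len;

    if (read_end >= block_end)
        return 0;              /* the read already reached the block's end */
    *store_off = read_end;     /* start right after the data just returned */
    return block_end - read_end;
}
```

With a 2MB block and a 128K read at offset 0 it yields a 1920K store starting at offset 128K, which is the situation in the sequential-read example that follows.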

For example, say the user process is doing a sequential read from offset 0.
The kernel sends a 128K read to the FUSE daemon, and I fetch the whole 2MB block from the underlying storage. After replying to the read request, I would like to offer the remaining 1920K of data to the kernel, starting at offset 128K. However, by this point the kernel has most likely already started the next read request, also at offset 128K, and has those pages locked:

  wait_on_page_locked_killable
  generic_file_buffered_read
  generic_file_read_iter

Meanwhile, the notify store is waiting to lock those same pages:

  __lock_page
  __find_lock_page
  find_or_create_page
  fuse_notify_store

This usually deadlocks the FUSE daemon, since those pages are only unlocked after the daemon has replied to the pending read.
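
The wait cycle can be reproduced in a small userspace model, assuming one pthread mutex stands in for the locked page cache pages and a condition variable stands in for the pending FUSE_READ reply. None of these names come from the kernel, and pthread_mutex_timedlock() replaces the blocking __lock_page() only so that the model terminates instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Single-shot model: page_lock stands in for the locked pages. */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool pages_locked;   /* reader has locked the pages for its read */
static bool read_replied;   /* daemon has answered the pending FUSE_READ */

/* Reader: locks the pages, then holds them until the daemon replies. */
static void *reader(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&page_lock);
    pthread_mutex_lock(&state_lock);
    pages_locked = true;
    pthread_cond_broadcast(&state_cond);
    while (!read_replied)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);
    pthread_mutex_unlock(&page_lock);   /* unlocked only after the reply */
    return NULL;
}

/*
 * Daemon: attempts the store before answering the read. Returns the
 * result of the timed lock; ETIMEDOUT means a blocking lock would never
 * have returned. Returns -1 if the reader thread cannot be started.
 */
static int simulate_store_before_reply(int timeout_ms)
{
    pthread_t t;
    struct timespec deadline;
    int rc;

    assert(timeout_ms > 0 && timeout_ms < 1000);
    if (pthread_create(&t, NULL, reader, NULL) != 0)
        return -1;

    pthread_mutex_lock(&state_lock);     /* wait until the pages are locked */
    while (!pages_locked)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += timeout_ms * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    rc = pthread_mutex_timedlock(&page_lock, &deadline);   /* the "store" */
    if (rc == 0)
        pthread_mutex_unlock(&page_lock);

    pthread_mutex_lock(&state_lock);     /* only now reply to the read */
    read_replied = true;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_join(t, NULL);
    return rc;
}
```

Calling simulate_store_before_reply(100) once returns ETIMEDOUT: the store cannot take the lock until the daemon answers the read, and the daemon cannot answer while it is stuck in the store.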

The notify store is a fairly old feature, so I'm not sure whether this is really an issue or I'm using it wrong. I would be very grateful if anyone could share some insight into how it is intended to be used. Alternatively, I was thinking we could support asynchronous notify store requests: when the kernel module gets a request and cannot acquire the lock on the relevant pages, it could store the user-provided data in detached page structs, add them to a background request, and retry later. If people are OK with this idea, I would be more than happy to try an implementation.
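
The asynchronous idea above can be modelled in userspace as well. This is a hypothetical sketch of the proposal, not kernel code: model_page, store_or_defer() and retry_pending() are invented names, a plain `locked` flag stands in for the page lock, and only whole-page stores are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_NPAGES 8

/* Hypothetical stand-in for a page cache page. */
struct model_page {
    bool locked;      /* held by an in-flight read */
    bool uptodate;    /* contains valid file data */
    unsigned char data[MODEL_PAGE_SIZE];
};

/* Detached copy of store data, waiting for its page to become free. */
struct pending_store {
    size_t index;
    unsigned char data[MODEL_PAGE_SIZE];
};

static struct model_page cache[MODEL_NPAGES];
static struct pending_store pending[MODEL_NPAGES];
static size_t npending;

/* Store one full page now if it is unlocked, otherwise queue a copy. */
static void store_or_defer(size_t index, const unsigned char *src)
{
    assert(index < MODEL_NPAGES);
    if (!cache[index].locked) {
        memcpy(cache[index].data, src, MODEL_PAGE_SIZE);
        cache[index].uptodate = true;
        return;
    }
    assert(npending < MODEL_NPAGES);
    pending[npending].index = index;
    memcpy(pending[npending].data, src, MODEL_PAGE_SIZE);
    npending++;
}

/*
 * Background retry: resolve queued stores whose pages are now unlocked.
 * Returns how many were resolved (applied, or dropped as redundant).
 */
static size_t retry_pending(void)
{
    size_t done = 0, kept = 0;

    for (size_t i = 0; i < npending; i++) {
        struct model_page *p = &cache[pending[i].index];
        if (p->locked) {
            if (kept != i)
                pending[kept] = pending[i];   /* keep it queued */
            kept++;
            continue;
        }
        if (!p->uptodate) {   /* the read it waited on may have filled it */
            memcpy(p->data, pending[i].data, MODEL_PAGE_SIZE);
            p->uptodate = true;
        }
        done++;
    }
    npending = kept;
    return done;
}
```

One design choice to note: when a deferred page has meanwhile been filled by the read it was waiting on, the queued copy is dropped rather than written over it.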

Thank you very much in advance for your help!



* Re: [FUSE] notify_store usage: deadlocks with other read / write requests
  2021-08-27 17:31 ` [FUSE] notify_store usage: deadlocks with other read / write requests Teng Qin
@ 2021-09-08  9:30   ` Miklos Szeredi
  2021-09-08  9:47     ` Miklos Szeredi
  0 siblings, 1 reply; 3+ messages in thread
From: Miklos Szeredi @ 2021-09-08  9:30 UTC
  To: Teng Qin; +Cc: linux-fsdevel

Hi,

The simplest solution is to just skip locked pages in NOTIFY_STORE.  Can you try the
attached patch (untested)?

Thanks,
Miklos

---
 fs/fuse/dev.c             |   16 +++++++++++++---
 include/uapi/linux/fuse.h |    9 ++++++++-
 2 files changed, 21 insertions(+), 4 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1562,6 +1562,7 @@ static int fuse_notify_store(struct fuse
 	unsigned int num;
 	loff_t file_size;
 	loff_t end;
+	bool nowait;
 
 	err = -EINVAL;
 	if (size < sizeof(outarg))
@@ -1576,6 +1577,7 @@ static int fuse_notify_store(struct fuse
 		goto out_finish;
 
 	nodeid = outarg.nodeid;
+	nowait = outarg.flags & FUSE_NOTIFY_STORE_NOWAIT;
 
 	down_read(&fc->killsb);
 
@@ -1598,12 +1600,19 @@ static int fuse_notify_store(struct fuse
 	while (num) {
 		struct page *page;
 		unsigned int this_num;
+		int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
+
+		if (nowait)
+			fgp_flags |= FGP_NOWAIT;
 
 		err = -ENOMEM;
-		page = find_or_create_page(mapping, index,
-					   mapping_gfp_mask(mapping));
-		if (!page)
+		page = pagecache_get_page(mapping, index, fgp_flags,
+					  mapping_gfp_mask(mapping));
+		if (!page) {
+			if (nowait)
+				goto skip;
 			goto out_iput;
+		}
 
 		this_num = min_t(unsigned, num, PAGE_SIZE - offset);
 		err = fuse_copy_page(cs, &page, offset, this_num, 0);
@@ -1616,6 +1625,7 @@ static int fuse_notify_store(struct fuse
 		if (err)
 			goto out_iput;
 
+skip:
 		num -= this_num;
 		offset = 0;
 		index++;
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -464,6 +464,13 @@ struct fuse_file_lock {
  */
 #define FUSE_SETXATTR_ACL_KILL_SGID	(1 << 0)
 
+
+/*
+ * notify_store flags
+ * FUSE_NOTIFY_STORE_NOWAIT: skip locked pages
+ */
+#define FUSE_NOTIFY_STORE_NOWAIT	(1 << 0)
+
 enum fuse_opcode {
 	FUSE_LOOKUP		= 1,
 	FUSE_FORGET		= 2,  /* no reply */
@@ -899,7 +906,7 @@ struct fuse_notify_store_out {
 	uint64_t	nodeid;
 	uint64_t	offset;
 	uint32_t	size;
-	uint32_t	padding;
+	uint32_t	flags;
 };
 
 struct fuse_notify_retrieve_out {


* Re: [FUSE] notify_store usage: deadlocks with other read / write requests
  2021-09-08  9:30   ` Miklos Szeredi
@ 2021-09-08  9:47     ` Miklos Szeredi
  0 siblings, 0 replies; 3+ messages in thread
From: Miklos Szeredi @ 2021-09-08  9:47 UTC
  To: Teng Qin; +Cc: linux-fsdevel

On Wed, Sep 08, 2021 at 11:30:13AM +0200, Miklos Szeredi wrote:
> Hi,
> 
> The simplest solution is to just skip locked pages in NOTIFY_STORE.  Can you try the
> attached patch (untested)?

And another version (the data belonging to skipped pages needs to be skipped as well).
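
Why the data must be skipped can be shown with a userspace model of the copy loop, assuming the notify payload is a flat buffer consumed front to back. Skipping a locked page without consuming its bytes would shift every later page's data by one page; the patch below appears to avoid this by still calling fuse_copy_page() with a NULL page so that those bytes are consumed. store_nowait(), copy_or_skip() and struct copy_state here are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Cursor over the notify payload, loosely modelled on fuse_copy_state. */
struct copy_state {
    const unsigned char *buf;
    size_t len;
};

/* Copy count bytes into dst, or only consume them when dst is NULL. */
static void copy_or_skip(struct copy_state *cs, unsigned char *dst, size_t count)
{
    assert(count <= cs->len);
    if (dst)
        memcpy(dst, cs->buf, count);
    cs->buf += count;       /* advance even when the page is skipped */
    cs->len -= count;
}

/*
 * Store size bytes at file position pos into pages[], skipping pages whose
 * locked[] entry is set. The caller guarantees that pages[] and locked[]
 * cover every page touched. Returns the number of pages skipped.
 */
static size_t store_nowait(unsigned char (*pages)[MODEL_PAGE_SIZE],
                           const bool *locked, size_t pos,
                           const unsigned char *data, size_t size)
{
    struct copy_state cs = { data, size };
    size_t index = pos / MODEL_PAGE_SIZE;
    size_t offset = pos % MODEL_PAGE_SIZE;
    size_t skipped = 0;

    while (cs.len) {
        size_t this_num = MODEL_PAGE_SIZE - offset;
        if (this_num > cs.len)
            this_num = cs.len;
        if (locked[index]) {
            copy_or_skip(&cs, NULL, this_num);
            skipped++;
        } else {
            copy_or_skip(&cs, pages[index] + offset, this_num);
        }
        offset = 0;
        index++;
    }
    return skipped;
}
```

With page 1 locked, pages 0 and 2 receive the first and third pages of the payload, not the first and second.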

Thanks,
Miklos

---
 fs/fuse/dev.c             |   17 +++++++++++++----
 include/uapi/linux/fuse.h |    9 ++++++++-
 2 files changed, 21 insertions(+), 5 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1562,6 +1562,7 @@ static int fuse_notify_store(struct fuse
 	unsigned int num;
 	loff_t file_size;
 	loff_t end;
+	bool nowait;
 
 	err = -EINVAL;
 	if (size < sizeof(outarg))
@@ -1576,6 +1577,7 @@ static int fuse_notify_store(struct fuse
 		goto out_finish;
 
 	nodeid = outarg.nodeid;
+	nowait = outarg.flags & FUSE_NOTIFY_STORE_NOWAIT;
 
 	down_read(&fc->killsb);
 
@@ -1598,21 +1600,28 @@ static int fuse_notify_store(struct fuse
 	while (num) {
 		struct page *page;
 		unsigned int this_num;
+		int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
+
+		if (nowait)
+			fgp_flags |= FGP_NOWAIT;
 
 		err = -ENOMEM;
-		page = find_or_create_page(mapping, index,
-					   mapping_gfp_mask(mapping));
-		if (!page)
+		page = pagecache_get_page(mapping, index, fgp_flags,
+					  mapping_gfp_mask(mapping));
+		if (!page && !nowait)
 			goto out_iput;
 
 		this_num = min_t(unsigned, num, PAGE_SIZE - offset);
 		err = fuse_copy_page(cs, &page, offset, this_num, 0);
+		if (!page)
+			goto skip;
+
 		if (!err && offset == 0 &&
 		    (this_num == PAGE_SIZE || file_size == end))
 			SetPageUptodate(page);
 		unlock_page(page);
 		put_page(page);
-
+skip:
 		if (err)
 			goto out_iput;
 
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -464,6 +464,13 @@ struct fuse_file_lock {
  */
 #define FUSE_SETXATTR_ACL_KILL_SGID	(1 << 0)
 
+
+/*
+ * notify_store flags
+ * FUSE_NOTIFY_STORE_NOWAIT: skip locked pages
+ */
+#define FUSE_NOTIFY_STORE_NOWAIT	(1 << 0)
+
 enum fuse_opcode {
 	FUSE_LOOKUP		= 1,
 	FUSE_FORGET		= 2,  /* no reply */
@@ -899,7 +906,7 @@ struct fuse_notify_store_out {
 	uint64_t	nodeid;
 	uint64_t	offset;
 	uint32_t	size;
-	uint32_t	padding;
+	uint32_t	flags;
 };
 
 struct fuse_notify_retrieve_out {

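A sketch of the daemon side, assuming the daemon writes notifications to /dev/fuse itself rather than through libfuse. The model_/MODEL_ names are local mirrors of the uapi definitions as patched above (FUSE_NOTIFY_STORE_NOWAIT only exists with this patch applied). A notification is an out header with `unique` set to 0 and the notify code in `error`, followed by the notify_store header and the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Local mirror of struct fuse_out_header. */
struct model_fuse_out_header {
    uint32_t len;      /* total message length */
    int32_t  error;    /* notify code for notifications */
    uint64_t unique;   /* 0 marks an unsolicited notification */
};

/* Local mirror of struct fuse_notify_store_out with this patch applied. */
struct model_fuse_notify_store_out {
    uint64_t nodeid;
    uint64_t offset;
    uint32_t size;
    uint32_t flags;    /* `padding` before the patch */
};

#define MODEL_FUSE_NOTIFY_STORE        4          /* FUSE_NOTIFY_STORE */
#define MODEL_FUSE_NOTIFY_STORE_NOWAIT (1u << 0)  /* flag proposed above */

/*
 * Fill iov[0..2] with a NOTIFY_STORE message for size bytes at offset of
 * inode nodeid. Returns the total length the daemon would writev().
 */
static size_t build_notify_store(struct iovec iov[3],
                                 struct model_fuse_out_header *oh,
                                 struct model_fuse_notify_store_out *st,
                                 uint64_t nodeid, uint64_t offset,
                                 const void *data, uint32_t size, bool nowait)
{
    assert(size <= UINT32_MAX - sizeof(*oh) - sizeof(*st));
    memset(oh, 0, sizeof(*oh));
    memset(st, 0, sizeof(*st));
    st->nodeid = nodeid;
    st->offset = offset;
    st->size = size;
    st->flags = nowait ? MODEL_FUSE_NOTIFY_STORE_NOWAIT : 0;

    oh->len = (uint32_t)(sizeof(*oh) + sizeof(*st) + size);
    oh->error = MODEL_FUSE_NOTIFY_STORE;
    oh->unique = 0;

    iov[0].iov_base = oh;
    iov[0].iov_len = sizeof(*oh);
    iov[1].iov_base = st;
    iov[1].iov_len = sizeof(*st);
    iov[2].iov_base = (void *)data;   /* writev() only reads it */
    iov[2].iov_len = size;
    return oh->len;
}
```

A daemon would pass iov to writev() on its /dev/fuse descriptor; with the flag set, pages found locked would be skipped by the kernel instead of blocking the writer.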