linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 00/14] fuse: An attempt to implement a write-back cache policy
@ 2013-04-01 10:40 Maxim V. Patlasov
  2013-04-01 10:40 ` [PATCH 01/14] fuse: Linking file to inode helper Maxim V. Patlasov
                   ` (14 more replies)
  0 siblings, 15 replies; 30+ messages in thread
From: Maxim V. Patlasov @ 2013-04-01 10:40 UTC
  To: miklos
  Cc: dev, xemul, fuse-devel, linux-kernel, jbottomley, viro,
	linux-fsdevel, devel

Hi,

This is the fourth iteration of Pavel Emelyanov's patch-set implementing
write-back policy for FUSE page cache. Initial patch-set description was
the following:

One of the problems with the existing FUSE implementation is that it uses the
write-through cache policy which results in performance problems on certain
workloads. E.g. when copying a big file into a FUSE file the cp pushes every
128k to the userspace synchronously. This becomes a problem when the userspace
back-end uses networking for storing the data.

A good solution to this is switching the FUSE page cache to a write-back policy.
With this, file data are pushed to the userspace in big chunks (depending on the
dirty memory limits, but much more than 128k), which lets the FUSE daemons
handle the size updates in a more efficient manner.
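
To make the difference concrete, here is a minimal user-space sketch (not kernel
code and not part of the patch-set) that counts how many requests would reach the
daemon under each policy. The function names and the flush threshold are
assumptions chosen for illustration; the real batch size depends on the dirty
memory limits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names): the application writes `total` bytes in
 * pieces of at most `chunk` bytes. */

/* Write-through: every piece is pushed to the daemon synchronously,
 * so the number of requests equals the number of pieces. */
size_t simulate_write_through(size_t total, size_t chunk)
{
    size_t requests = 0;

    assert(chunk > 0);
    while (total > 0) {
        size_t n = total < chunk ? total : chunk;
        total -= n;
        requests++;
    }
    return requests;
}

/* Write-back: pieces only dirty the cache; a request is sent once at least
 * `flush_threshold` dirty bytes have accumulated, plus a final flush for
 * whatever is left. */
size_t simulate_write_back(size_t total, size_t chunk, size_t flush_threshold)
{
    size_t requests = 0;
    size_t dirty = 0;

    assert(chunk > 0 && flush_threshold > 0);
    while (total > 0) {
        size_t n = total < chunk ? total : chunk;
        total -= n;
        dirty += n;
        if (dirty >= flush_threshold) {
            requests++;
            dirty = 0;
        }
    }
    if (dirty > 0)
        requests++;
    return requests;
}
```

For a 1 GiB copy written in 128k pieces, this model gives 8192 synchronous
requests for write-through versus 32 for write-back with an (assumed) 32 MiB
threshold.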

The writeback feature is per-connection and is explicitly configurable at the
init stage (is it worth making it CAP_SOMETHING protected?) When the writeback is
turned ON:

* still copy writeback pages to temporary buffer when sending a writeback request
  and finish the page writeback immediately

* make kernel maintain the inode's i_size to avoid frequent i_size synchronization
  with the user space

* take NR_WRITEBACK_TEMP into account when making the balance_dirty_pages
  decision. This protects us from having too many dirty pages on FUSE
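
The last point can be sketched in user space as follows. This is only a model
under stated assumptions: the struct and function names are hypothetical, and
the real check lives in balance_dirty_pages() with considerably more logic.
The idea it illustrates is that pages copied to temporary buffers finish
writeback immediately, so unless the temporary copies are counted, the
throttling decision does not see them.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's global page counters. */
struct page_counters {
    unsigned long file_dirty;     /* dirty page-cache pages */
    unsigned long writeback;      /* pages under ordinary writeback */
    unsigned long writeback_temp; /* temporary copies in flight (FUSE) */
};

/* Sum of pages that count against the dirty limit.  `count_temp` selects
 * whether the temporary writeback copies are included. */
unsigned long dirty_total(const struct page_counters *c, bool count_temp)
{
    unsigned long n = c->file_dirty + c->writeback;

    if (count_temp)
        n += c->writeback_temp;
    return n;
}

/* A writer should be throttled once the total exceeds the limit. */
bool should_throttle(const struct page_counters *c, unsigned long limit,
                     bool count_temp)
{
    return dirty_total(c, count_temp) > limit;
}
```

With 100 dirty pages, 50 under writeback and 900 temporary copies against a
limit of 500, the writer is throttled only when the temporary copies are
included in the sum.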

The provided patchset survives the fsx test. Performance measurements are not yet
all finished, but the mentioned copying of a huge file becomes noticeably faster
even on machines with little RAM and doesn't make the system hang (the dirty pages
balancer does its work OK). Applies on top of v3.5-rc4.

We are currently exploring this with our own distributed storage implementation,
which is heavily oriented toward storing big blobs of data with extremely rare meta-data
updates (virtual machines' and containers' disk images). With the existing cache
policy a typical usage scenario -- copying a big VM disk into a cloud -- takes way
too much time, much longer than if it were simply scp-ed over the same
network. The write-back policy (as I mentioned) noticeably improves this scenario.
Kirill (in Cc) can share more details about the performance and the storage concepts
if required.

Changed in v2:
 - numerous bugfixes:
   - fuse_write_begin and fuse_writepages_fill and fuse_writepage_locked must wait
     on page writeback because page writeback can extend beyond the lifetime of
     the page-cache page
   - fuse_send_writepages can end_page_writeback on original page only after adding
     request to fi->writepages list; otherwise another writeback may happen inside
     the gap between end_page_writeback and adding to the list
   - fuse_direct_io must wait on page writeback; otherwise data corruption is possible
     due to reordering requests
   - fuse_flush must flush dirty memory and wait for all writeback on given inode
     before sending FUSE_FLUSH to userspace; otherwise FUSE_FLUSH is not reliable
   - fuse_file_fallocate must hold i_mutex around FUSE_FALLOCATE and i_size update;
     otherwise a race with a writer extending i_size is possible
   - fix handling errors in fuse_writepages and fuse_send_writepages
 - handle i_mtime intelligently if writeback cache is on (see patch #7 (update i_mtime
   on buffered writes) for details)
 - put enabling writeback cache under fusermount control (see mount option
   'allow_wbcache' introduced by patch #13 (turn writeback cache on))
 - rebased on v3.7-rc5

Changed in v3:
 - rebased on for-next branch of the fuse tree (fb05f41f5f96f7423c53da4d87913fb44fd0565d)

Changed in v4:
 - rebased on for-next branch of the fuse tree (634734b63ac39e137a1c623ba74f3e062b6577db)
 - fixed fuse_fillattr() for non-writeback_cache case
 - added comments explaining why we cannot trust size from server
 - rewrote patch handling i_mtime; it's titled Trust-kernel-i_mtime-only now
 - simplified patch titled Flush-files-on-wb-close
 - eliminated code duplications from fuse_readpage() and fuse_prepare_write()
 - added comment about "disk full" errors to fuse_write_begin()
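
The "cannot trust size from server" point can be illustrated with a small
user-space model. Everything here is an assumption made for illustration
(the toy_inode type and both helpers are hypothetical, not the patch's code):
with the write-back cache on, data that extended the file may still sit in
the page cache, so a size reported by the server can be stale and the model
keeps the locally maintained i_size instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inode: only what the model needs. */
struct toy_inode {
    uint64_t i_size;
    bool writeback_cache; /* write-back cache enabled for the connection */
};

/* A buffered write that may extend the file updates i_size locally. */
void toy_buffered_write(struct toy_inode *inode, uint64_t pos, uint64_t len)
{
    if (pos + len > inode->i_size)
        inode->i_size = pos + len;
}

/* Apply a size reported by the userspace server.  With the write-back cache
 * on, the report is ignored because cached dirty data may not have reached
 * the server yet; otherwise the server's value is taken as-is. */
void toy_apply_server_size(struct toy_inode *inode, uint64_t server_size)
{
    if (inode->writeback_cache)
        return;
    inode->i_size = server_size;
}
```

In this model, a 4096-byte write followed by a stale server report of size 0
leaves i_size at 4096 when the write-back cache is on.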

Thanks,
Maxim

---

Maxim V. Patlasov (14):
      fuse: Linking file to inode helper
      fuse: Getting file for writeback helper
      fuse: Prepare to handle short reads
      fuse: Prepare to handle multiple pages in writeback
      fuse: Connection bit for enabling writeback
      fuse: Trust kernel i_size only - v3
      fuse: Trust kernel i_mtime only
      fuse: Flush files on wb close
      fuse: Implement writepages and write_begin/write_end callbacks - v3
      fuse: fuse_writepage_locked() should wait on writeback
      fuse: fuse_flush() should wait on writeback
      fuse: Fix O_DIRECT operations vs cached writeback misorder - v2
      fuse: Turn writeback cache on
      mm: Account for WRITEBACK_TEMP in balance_dirty_pages


 fs/fuse/cuse.c            |    5 
 fs/fuse/dir.c             |  127 +++++++++-
 fs/fuse/file.c            |  575 ++++++++++++++++++++++++++++++++++++++++-----
 fs/fuse/fuse_i.h          |   26 ++
 fs/fuse/inode.c           |   37 +++
 include/uapi/linux/fuse.h |    2 
 mm/page-writeback.c       |    3 
 7 files changed, 689 insertions(+), 86 deletions(-)

* [PATCH v3 00/14] fuse: An attempt to implement a write-back cache policy
@ 2013-01-25 18:20 Maxim V. Patlasov
  2013-01-25 18:26 ` [PATCH 11/14] fuse: fuse_flush() should wait on writeback Maxim V. Patlasov
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim V. Patlasov @ 2013-01-25 18:20 UTC
  To: miklos
  Cc: dev, xemul, fuse-devel, linux-kernel, jbottomley, viro,
	linux-fsdevel, devel

Hi,

This is the third iteration of Pavel Emelyanov's patch-set implementing
write-back policy for FUSE page cache. Initial patch-set description was
the following:

One of the problems with the existing FUSE implementation is that it uses the
write-through cache policy which results in performance problems on certain
workloads. E.g. when copying a big file into a FUSE file the cp pushes every
128k to the userspace synchronously. This becomes a problem when the userspace
back-end uses networking for storing the data.

A good solution to this is switching the FUSE page cache to a write-back policy.
With this, file data are pushed to the userspace in big chunks (depending on the
dirty memory limits, but much more than 128k), which lets the FUSE daemons
handle the size updates in a more efficient manner.

The writeback feature is per-connection and is explicitly configurable at the
init stage (is it worth making it CAP_SOMETHING protected?) When the writeback is
turned ON:

* still copy writeback pages to temporary buffer when sending a writeback request
  and finish the page writeback immediately

* make kernel maintain the inode's i_size to avoid frequent i_size synchronization
  with the user space

* take NR_WRITEBACK_TEMP into account when making the balance_dirty_pages
  decision. This protects us from having too many dirty pages on FUSE

The provided patchset survives the fsx test. Performance measurements are not yet
all finished, but the mentioned copying of a huge file becomes noticeably faster
even on machines with little RAM and doesn't make the system hang (the dirty pages
balancer does its work OK). Applies on top of v3.5-rc4.

We are currently exploring this with our own distributed storage implementation,
which is heavily oriented toward storing big blobs of data with extremely rare meta-data
updates (virtual machines' and containers' disk images). With the existing cache
policy a typical usage scenario -- copying a big VM disk into a cloud -- takes way
too much time, much longer than if it were simply scp-ed over the same
network. The write-back policy (as I mentioned) noticeably improves this scenario.
Kirill (in Cc) can share more details about the performance and the storage concepts
if required.

Changed in v2:
 - numerous bugfixes:
   - fuse_write_begin and fuse_writepages_fill and fuse_writepage_locked must wait
     on page writeback because page writeback can extend beyond the lifetime of
     the page-cache page
   - fuse_send_writepages can end_page_writeback on original page only after adding
     request to fi->writepages list; otherwise another writeback may happen inside
     the gap between end_page_writeback and adding to the list
   - fuse_direct_io must wait on page writeback; otherwise data corruption is possible
     due to reordering requests
   - fuse_flush must flush dirty memory and wait for all writeback on given inode
     before sending FUSE_FLUSH to userspace; otherwise FUSE_FLUSH is not reliable
   - fuse_file_fallocate must hold i_mutex around FUSE_FALLOCATE and i_size update;
     otherwise a race with a writer extending i_size is possible
   - fix handling errors in fuse_writepages and fuse_send_writepages
 - handle i_mtime intelligently if writeback cache is on (see patch #7 (update i_mtime
   on buffered writes) for details)
 - put enabling writeback cache under fusermount control (see mount option
   'allow_wbcache' introduced by patch #13 (turn writeback cache on))
 - rebased on v3.7-rc5

Changed in v3:
 - rebased on for-next branch of the fuse tree

Thanks,
Maxim

---

Maxim V. Patlasov (14):
      fuse: Linking file to inode helper
      fuse: Getting file for writeback helper
      fuse: Prepare to handle short reads
      fuse: Prepare to handle multiple pages in writeback
      fuse: Connection bit for enabling writeback
      fuse: Trust kernel i_size only - v2
      fuse: Update i_mtime on buffered writes
      fuse: Flush files on wb close
      fuse: Implement writepages and write_begin/write_end callbacks - v2
      fuse: fuse_writepage_locked() should wait on writeback
      fuse: fuse_flush() should wait on writeback
      fuse: Fix O_DIRECT operations vs cached writeback misorder - v2
      fuse: Turn writeback cache on
      mm: Account for WRITEBACK_TEMP in balance_dirty_pages


 fs/fuse/cuse.c            |    5 
 fs/fuse/dir.c             |   51 +++-
 fs/fuse/file.c            |  567 +++++++++++++++++++++++++++++++++++++++++----
 fs/fuse/fuse_i.h          |   33 ++-
 fs/fuse/inode.c           |   98 ++++++++
 include/uapi/linux/fuse.h |    2 
 mm/page-writeback.c       |    3 
 7 files changed, 696 insertions(+), 63 deletions(-)

* [PATCH v2 00/14] fuse: An attempt to implement a write-back cache policy
@ 2012-11-16 17:04 Maxim Patlasov
  2012-11-16 17:10 ` [PATCH 11/14] fuse: fuse_flush() should wait on writeback Maxim Patlasov
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Patlasov @ 2012-11-16 17:04 UTC
  To: miklos
  Cc: dev, fuse-devel, linux-kernel, jbottomley, viro, linux-fsdevel, xemul

Hi,

This is the second iteration of Pavel Emelyanov's patch-set implementing
write-back policy for FUSE page cache. Initial patch-set description was
the following:

One of the problems with the existing FUSE implementation is that it uses the
write-through cache policy which results in performance problems on certain
workloads. E.g. when copying a big file into a FUSE file the cp pushes every
128k to the userspace synchronously. This becomes a problem when the userspace
back-end uses networking for storing the data.

A good solution to this is switching the FUSE page cache to a write-back policy.
With this, file data are pushed to the userspace in big chunks (depending on the
dirty memory limits, but much more than 128k), which lets the FUSE daemons
handle the size updates in a more efficient manner.

The writeback feature is per-connection and is explicitly configurable at the
init stage (is it worth making it CAP_SOMETHING protected?) When the writeback is
turned ON:

* still copy writeback pages to temporary buffer when sending a writeback request
  and finish the page writeback immediately

* make kernel maintain the inode's i_size to avoid frequent i_size synchronization
  with the user space

* take NR_WRITEBACK_TEMP into account when making the balance_dirty_pages
  decision. This protects us from having too many dirty pages on FUSE

The provided patchset survives the fsx test. Performance measurements are not yet
all finished, but the mentioned copying of a huge file becomes noticeably faster
even on machines with little RAM and doesn't make the system hang (the dirty pages
balancer does its work OK). Applies on top of v3.5-rc4.

We are currently exploring this with our own distributed storage implementation,
which is heavily oriented toward storing big blobs of data with extremely rare meta-data
updates (virtual machines' and containers' disk images). With the existing cache
policy a typical usage scenario -- copying a big VM disk into a cloud -- takes way
too much time, much longer than if it were simply scp-ed over the same
network. The write-back policy (as I mentioned) noticeably improves this scenario.
Kirill (in Cc) can share more details about the performance and the storage concepts
if required.

Changed in v2:
 - numerous bugfixes:
   - fuse_write_begin and fuse_writepages_fill and fuse_writepage_locked must wait
     on page writeback because page writeback can extend beyond the lifetime of
     the page-cache page
   - fuse_send_writepages can end_page_writeback on original page only after adding
     request to fi->writepages list; otherwise another writeback may happen inside
     the gap between end_page_writeback and adding to the list
   - fuse_direct_io must wait on page writeback; otherwise data corruption is possible
     due to reordering requests
   - fuse_flush must flush dirty memory and wait for all writeback on given inode
     before sending FUSE_FLUSH to userspace; otherwise FUSE_FLUSH is not reliable
   - fuse_file_fallocate must hold i_mutex around FUSE_FALLOCATE and i_size update;
     otherwise a race with a writer extending i_size is possible
   - fix handling errors in fuse_writepages and fuse_send_writepages
 - handle i_mtime intelligently if writeback cache is on (see patch #7 (update i_mtime
   on buffered writes) for details)
 - put enabling writeback cache under fusermount control (see mount option
   'allow_wbcache' introduced by patch #13 (turn writeback cache on))
 - rebased on v3.7-rc5

Thanks,
Maxim

---

Maxim Patlasov (14):
      fuse: Linking file to inode helper
      fuse: Getting file for writeback helper
      fuse: Prepare to handle short reads
      fuse: Prepare to handle multiple pages in writeback
      fuse: Connection bit for enabling writeback
      fuse: Trust kernel i_size only
      fuse: Update i_mtime on buffered writes
      fuse: Flush files on wb close
      fuse: Implement writepages and write_begin/write_end callbacks
      fuse: fuse_writepage_locked() should wait on writeback
      fuse: fuse_flush() should wait on writeback
      fuse: Fix O_DIRECT operations vs cached writeback misorder
      fuse: Turn writeback cache on
      mm: Account for WRITEBACK_TEMP in balance_dirty_pages


 fs/fuse/dir.c             |   51 ++++
 fs/fuse/file.c            |  523 +++++++++++++++++++++++++++++++++++++++++----
 fs/fuse/fuse_i.h          |   20 ++
 fs/fuse/inode.c           |   98 ++++++++
 include/uapi/linux/fuse.h |    1 
 mm/page-writeback.c       |    3 
 6 files changed, 638 insertions(+), 58 deletions(-)


end of thread, other threads:[~2013-06-14 14:03 UTC | newest]

Thread overview: 30+ messages
2013-04-01 10:40 [PATCH v4 00/14] fuse: An attempt to implement a write-back cache policy Maxim V. Patlasov
2013-04-01 10:40 ` [PATCH 01/14] fuse: Linking file to inode helper Maxim V. Patlasov
2013-04-01 10:40 ` [PATCH 02/14] fuse: Getting file for writeback helper Maxim V. Patlasov
2013-04-01 10:41 ` [PATCH 03/14] fuse: Prepare to handle short reads Maxim V. Patlasov
2013-04-01 10:41 ` [PATCH 04/14] fuse: Prepare to handle multiple pages in writeback Maxim V. Patlasov
2013-04-25 10:22   ` Miklos Szeredi
2013-04-01 10:41 ` [PATCH 05/14] fuse: Connection bit for enabling writeback Maxim V. Patlasov
2013-04-01 10:41 ` [PATCH 06/14] fuse: Trust kernel i_size only - v3 Maxim V. Patlasov
2013-04-01 10:41 ` [PATCH 07/14] fuse: Trust kernel i_mtime only Maxim V. Patlasov
2013-04-01 10:41 ` [PATCH 08/14] fuse: Flush files on wb close Maxim V. Patlasov
2013-04-01 10:42 ` [PATCH 09/14] fuse: Implement writepages and write_begin/write_end callbacks - v3 Maxim V. Patlasov
2013-04-25 10:35   ` Miklos Szeredi
2013-06-14 14:03     ` Maxim Patlasov
2013-04-01 10:42 ` [PATCH 10/14] fuse: fuse_writepage_locked() should wait on writeback Maxim V. Patlasov
2013-04-01 10:42 ` [PATCH 11/14] fuse: fuse_flush() " Maxim V. Patlasov
2013-04-01 10:42 ` [PATCH 12/14] fuse: Fix O_DIRECT operations vs cached writeback misorder - v2 Maxim V. Patlasov
2013-04-01 10:42 ` [PATCH 13/14] fuse: Turn writeback cache on Maxim V. Patlasov
2013-04-01 10:42 ` [PATCH 14/14] mm: Account for WRITEBACK_TEMP in balance_dirty_pages Maxim V. Patlasov
2013-04-25 14:29   ` [fuse-devel] " Maxim V. Patlasov
2013-04-25 15:49     ` Miklos Szeredi
2013-04-25 16:16       ` Maxim V. Patlasov
2013-04-25 20:43         ` Miklos Szeredi
2013-04-26  8:32           ` Maxim V. Patlasov
2013-04-26 14:02             ` Miklos Szeredi
2013-04-26 17:44               ` Maxim V. Patlasov
2013-05-07 11:39                 ` Miklos Szeredi
2013-04-11 11:18 ` [fuse-devel] [PATCH v4 00/14] fuse: An attempt to implement a write-back cache policy Maxim V. Patlasov
2013-04-11 14:36   ` Miklos Szeredi
  -- strict thread matches above, loose matches on Subject: below --
2013-01-25 18:20 [PATCH v3 " Maxim V. Patlasov
2013-01-25 18:26 ` [PATCH 11/14] fuse: fuse_flush() should wait on writeback Maxim V. Patlasov
2012-11-16 17:04 [PATCH v2 00/14] fuse: An attempt to implement a write-back cache policy Maxim Patlasov
2012-11-16 17:10 ` [PATCH 11/14] fuse: fuse_flush() should wait on writeback Maxim Patlasov
