All of lore.kernel.org
 help / color / mirror / Atom feed
From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: linux-fsdevel@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org, Goldwyn Rodrigues <rgoldwyn@suse.com>
Subject: [RFC PATCH 0/5] Shared memory for shared extents
Date: Fri, 22 Oct 2021 15:15:00 -0500	[thread overview]
Message-ID: <cover.1634933121.git.rgoldwyn@suse.com> (raw)

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

This is an attempt to reduce the memory footprint by using a shared
page(s) for shared extent(s) in the filesystem. I am hoping to start a
discussion to iron out the details for implementation.

Abstract
If mutiple files share an extent, reads from each of the files would
read individual page copies in the inode pagecache mapping, leading to
waste of memory. Instead add the read pages of the filesystem to
underlying device's bd_inode as opposed to file's inode mapping. The
cost is that file offset to device offset conversion happens early in
the read cycle.

Motivation:
 - Reduce memory usage for shared extents
 - Ease DAX implementation for shared extents
 - Reduce Container memory utilization

Implementation
In the read path, pages are read and added to the block_device's
inode's mapping as opposed to the inode's mapping. This is limited
to reads, while write's (and read before writes) still go through
inode's i_mapping. The path does check the inode's i_mapping before
falling back to block device's i_mapping to read pages which may be
dirty. The cost of the operation is file_to_device_offset() translation
on reads. The translation should return a valid value only in case
the file is CoW.

This also means that page->mapping may not point to file's mapping.

Questions:
1. Are there security implications for performing this for read-only
pages? An alternate idea is to add a "struct fspage", which would be
pointed by file's mapping and would point to the block device's page.
Multiple files with shared extents have their own independent fspage
pointing to the same page mapped to block device's mapping.
Any security parameters, if required, would be in this structure. The
advantage of this approach is it would be more flexible with respect to
CoW when the page is dirtied after reads. With the current approach, a
read for write is an independent operation so we can end up with two
copies of the same page. This implementation got complicated too quickly.

2. Should pages be dropped after writebacks (and clone_range) to avoid
duplicate copies?

Limitations:
1. The filesystem have exactly one underlying device.
2. Page size should be equal to filesystem block size

Goldwyn Rodrigues (5):
  mm: Use file parameter to determine bdi
  mm: Switch mapping to device mapping
  btrfs: Add sharedext mount option
  btrfs: Set s_bdev for btrfs super block
  btrfs: function to convert file offset to device offset

 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/file.c    | 42 ++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/super.c   |  7 +++++++
 include/linux/fs.h |  7 ++++++-
 mm/filemap.c       | 34 ++++++++++++++++++++++++++--------
 mm/readahead.c     |  3 +++
 6 files changed, 83 insertions(+), 11 deletions(-)

-- 
2.33.1


             reply	other threads:[~2021-10-22 20:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-22 20:15 Goldwyn Rodrigues [this message]
2021-10-22 20:15 ` [RFC PATCH 1/5] mm: Use file parameter to determine bdi Goldwyn Rodrigues
2021-10-22 20:15 ` [RFC PATCH 2/5] mm: Switch mapping to device mapping Goldwyn Rodrigues
2021-10-23  1:36   ` Matthew Wilcox
2021-10-22 20:15 ` [RFC PATCH 3/5] btrfs: Add sharedext mount option Goldwyn Rodrigues
2021-10-22 20:15 ` [RFC PATCH 4/5] btrfs: Set s_bdev for btrfs super block Goldwyn Rodrigues
2021-10-22 20:15 ` [RFC PATCH 5/5] btrfs: function to convert file offset to device offset Goldwyn Rodrigues
2021-10-23  1:43 ` [RFC PATCH 0/5] Shared memory for shared extents Matthew Wilcox
2021-10-25 14:53   ` Goldwyn Rodrigues
2021-10-25 15:43     ` Matthew Wilcox
2021-10-25 16:43       ` Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1634933121.git.rgoldwyn@suse.com \
    --to=rgoldwyn@suse.de \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=rgoldwyn@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.