From: ruansy.fnst <ruansy.fnst@fujitsu.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
Christoph Hellwig <hch@lst.de>,
linux-xfs <linux-xfs@vger.kernel.org>,
david <david@fromorbit.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
Goldwyn Rodrigues <rgoldwyn@suse.de>,
Al Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v7 2/8] fsdax: Introduce dax_iomap_cow_copy()
Date: Fri, 20 Aug 2021 13:59:05 +0800 [thread overview]
Message-ID: <052a38ba-9c47-fc0a-61bf-6e6c8765ca8b@fujitsu.com> (raw)
In-Reply-To: <CAPcyv4h0dukvcxN4Bc5a-jHk2FQ-j7ay9P1AB0wq9pNNSBU8-A@mail.gmail.com>
On 2021/8/20 上午6:35, Dan Williams wrote:
> On Sun, Aug 15, 2021 at 11:04 PM Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
>>
>> In the case where the iomap is a write operation and iomap is not equal
>> to srcmap after iomap_begin, we consider it is a CoW operation.
>>
>> The destance extent which iomap indicated is new allocated extent.
>
> s/destance/destination/
>
> That sentence is still hard to grok though, did it mean to say:
>
> "In this case, the destination (iomap->addr) points to a newly
> allocated extent."
>
>> So, it is needed to copy the data from srcmap to new allocated extent.
>> In theory, it is better to copy the head and tail ranges which is
>> outside of the non-aligned area instead of copying the whole aligned
>> range. But in dax page fault, it will always be an aligned range. So,
>> we have to copy the whole range in this case.
>
> s/we have to copy/copy/
>
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> ---
>> fs/dax.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 84 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index 9fb6218f42be..697a7b7bb96f 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -1050,6 +1050,61 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
>> return rc;
>> }
>>
>> +/**
>> + * dax_iomap_cow_copy(): Copy the data from source to destination before write.
>
> s/():/() -/
>
> ...to be kernel-doc compliant
>
>> + * @pos: address to do copy from.
>> + * @length: size of copy operation.
>> + * @align_size: aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
>> + * @srcmap: iomap srcmap
>> + * @daddr: destination address to copy to.
>> + *
>> + * This can be called from two places. Either during DAX write fault, to copy
>> + * the length size data to daddr. Or, while doing normal DAX write operation,
>> + * dax_iomap_actor() might call this to do the copy of either start or end
>> + * unaligned address. In this case the rest of the copy of aligned ranges is
>
> Which "this", the latter, or the former? Looks like the latter.
>
> "In the latter case the rest of the copy..."
>
>> + * taken care by dax_iomap_actor() itself.
>> + * Also, note DAX fault will always result in aligned pos and pos + length.
>
> Perhaps drop this sentence and just say:
>
> "Either during DAX write fault (page aligned), to copy..."
>
> ...in that earlier sentence so this comment flows better.
>
>> + */
>> +static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
>> + const struct iomap *srcmap, void *daddr)
>> +{
>> + loff_t head_off = pos & (align_size - 1);
>> + size_t size = ALIGN(head_off + length, align_size);
>> + loff_t end = pos + length;
>> + loff_t pg_end = round_up(end, align_size);
>> + bool copy_all = head_off == 0 && end == pg_end;
>> + void *saddr = 0;
>> + int ret = 0;
>> +
>> + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
>> + if (ret)
>> + return ret;
>> +
>> + if (copy_all) {
>> + ret = copy_mc_to_kernel(daddr, saddr, length);
>> + return ret ? -EIO : 0;
>> + }
>> +
>> + /* Copy the head part of the range. Note: we pass offset as length. */
>
> I've re-read this a few times and this comment is not helping, why is
> the offset used as the copy length?
I forgot to update the comment. It should be 'head_off', which is
passed as the length for copy_mc_to_kernel(). It is the head part we
need to COPY before write.
--
Thanks,
Ruan.
>
>> + if (head_off) {
>> + ret = copy_mc_to_kernel(daddr, saddr, head_off);
>> + if (ret)
>> + return -EIO;
>> + }
>> +
>> + /* Copy the tail part of the range */
>> + if (end < pg_end) {
>> + loff_t tail_off = head_off + length;
>> + loff_t tail_len = pg_end - end;
>> +
>> + ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
>> + tail_len);
>> + if (ret)
>> + return -EIO;
>> + }
>> + return 0;
>> +}
>> +
>> /*
>> * The user has performed a load from a hole in the file. Allocating a new
>> * page in the file would cause excessive storage usage for workloads with
>> @@ -1175,16 +1230,18 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>> struct iov_iter *iter)
>> {
>> const struct iomap *iomap = &iomi->iomap;
>> + const struct iomap *srcmap = &iomi->srcmap;
>> loff_t length = iomap_length(iomi);
>> loff_t pos = iomi->pos;
>> struct block_device *bdev = iomap->bdev;
>> struct dax_device *dax_dev = iomap->dax_dev;
>> loff_t end = pos + length, done = 0;
>> + bool write = iov_iter_rw(iter) == WRITE;
>> ssize_t ret = 0;
>> size_t xfer;
>> int id;
>>
>> - if (iov_iter_rw(iter) == READ) {
>> + if (!write) {
>> end = min(end, i_size_read(iomi->inode));
>> if (pos >= end)
>> return 0;
>> @@ -1193,7 +1250,12 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>> return iov_iter_zero(min(length, end - pos), iter);
>> }
>>
>> - if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
>> + /*
>> + * In DAX mode, we allow either pure overwrites of written extents, or
>
> s/we allow/enforce/
>
>> + * writes to unwritten extents as part of a copy-on-write operation.
>> + */
>> + if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
>> + !(iomap->flags & IOMAP_F_SHARED)))
>> return -EIO;
>>
>> /*
>> @@ -1232,6 +1294,14 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>> break;
>> }
>>
>> + if (write &&
>> + srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
>> + ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
>> + kaddr);
>> + if (ret)
>> + break;
>> + }
>> +
>> map_len = PFN_PHYS(map_len);
>> kaddr += offset;
>> map_len -= offset;
>> @@ -1243,7 +1313,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>> * validated via access_ok() in either vfs_read() or
>> * vfs_write(), depending on which operation we are doing.
>> */
>> - if (iov_iter_rw(iter) == WRITE)
>> + if (write)
>> xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
>> map_len, iter);
>> else
>> @@ -1385,6 +1455,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
>> {
>> struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>> const struct iomap *iomap = &iter->iomap;
>> + const struct iomap *srcmap = &iter->srcmap;
>> size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
>> loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
>> bool write = vmf->flags & FAULT_FLAG_WRITE;
>> @@ -1392,6 +1463,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
>> unsigned long entry_flags = pmd ? DAX_PMD : 0;
>> int err = 0;
>> pfn_t pfn;
>> + void *kaddr;
>>
>> if (!pmd && vmf->cow_page)
>> return dax_fault_cow_page(vmf, iter);
>> @@ -1404,18 +1476,25 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
>> return dax_pmd_load_hole(xas, vmf, iomap, entry);
>> }
>>
>> - if (iomap->type != IOMAP_MAPPED) {
>> + if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) {
>> WARN_ON_ONCE(1);
>> return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
>> }
>>
>> - err = dax_iomap_direct_access(&iter->iomap, pos, size, NULL, &pfn);
>> + err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
>> if (err)
>> return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
>>
>> *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
>> write && !sync);
>>
>> + if (write &&
>> + srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
>> + err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
>> + if (err)
>> + return dax_fault_return(err);
>> + }
>> +
>> if (sync)
>> return dax_fault_synchronous_pfnp(pfnp, pfn);
>>
>> --
>> 2.32.0
>>
>>
>>
next prev parent reply other threads:[~2021-08-20 5:59 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-16 6:03 [PATCH v7 0/8] fsdax,xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
2021-08-16 6:03 ` [PATCH v7 1/8] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
2021-08-18 21:01 ` Dan Williams
2021-08-16 6:03 ` [PATCH v7 2/8] fsdax: Introduce dax_iomap_cow_copy() Shiyang Ruan
2021-08-19 22:35 ` Dan Williams
2021-08-20 5:59 ` ruansy.fnst [this message]
2021-08-16 6:03 ` [PATCH v7 3/8] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
2021-08-19 22:54 ` Dan Williams
2021-08-23 12:57 ` Christoph Hellwig
2021-08-27 3:22 ` Shiyang Ruan
2021-08-27 5:00 ` Dan Williams
2021-08-27 5:26 ` Shiyang Ruan
2021-08-16 6:03 ` [PATCH v7 4/8] fsdax: Add dax_iomap_cow_copy() for dax_iomap_zero Shiyang Ruan
2021-08-20 2:39 ` Dan Williams
2021-08-27 3:23 ` Shiyang Ruan
2021-08-16 6:03 ` [PATCH v7 5/8] iomap: Introduce iomap_iter2 for two files Shiyang Ruan
2021-08-23 12:58 ` Christoph Hellwig
2021-08-16 6:03 ` [PATCH v7 6/8] fsdax: Dedup file range to use a compare function Shiyang Ruan
2021-08-23 13:16 ` Christoph Hellwig
2021-08-16 6:03 ` [PATCH v7 7/8] fsdax: Introduce dax_iomap_ops for end of reflink Shiyang Ruan
2021-08-20 3:01 ` Dan Williams
2021-08-20 6:13 ` ruansy.fnst
2021-08-20 15:18 ` Dan Williams
2021-08-23 13:02 ` Christoph Hellwig
2021-08-27 3:29 ` Shiyang Ruan
2021-08-27 5:04 ` Dan Williams
2021-08-27 5:27 ` Shiyang Ruan
2021-08-16 6:03 ` [PATCH v7 8/8] fs/xfs: Add dax dedupe support Shiyang Ruan
2021-08-20 3:08 ` Dan Williams
2021-08-27 3:36 ` Shiyang Ruan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=052a38ba-9c47-fc0a-61bf-6e6c8765ca8b@fujitsu.com \
--to=ruansy.fnst@fujitsu.com \
--cc=dan.j.williams@intel.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=nvdimm@lists.linux.dev \
--cc=rgoldwyn@suse.de \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).