nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: ruansy.fnst <ruansy.fnst@fujitsu.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	david <david@fromorbit.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	Goldwyn Rodrigues <rgoldwyn@suse.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v7 2/8] fsdax: Introduce dax_iomap_cow_copy()
Date: Fri, 20 Aug 2021 13:59:05 +0800	[thread overview]
Message-ID: <052a38ba-9c47-fc0a-61bf-6e6c8765ca8b@fujitsu.com> (raw)
In-Reply-To: <CAPcyv4h0dukvcxN4Bc5a-jHk2FQ-j7ay9P1AB0wq9pNNSBU8-A@mail.gmail.com>



On 2021/8/20 上午6:35, Dan Williams wrote:
> On Sun, Aug 15, 2021 at 11:04 PM Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
>>
>> In the case where the iomap is a write operation and iomap is not equal
>> to srcmap after iomap_begin, we consider it is a CoW operation.
>>
>> The destance extent which iomap indicated is new allocated extent.
> 
> s/destance/destination/
> 
> That sentence is still hard to grok though, did it mean to say:
> 
> "In this case, the destination (iomap->addr) points to a newly
> allocated extent."
> 
>> So, it is needed to copy the data from srcmap to new allocated extent.
>> In theory, it is better to copy the head and tail ranges which is
>> outside of the non-aligned area instead of copying the whole aligned
>> range. But in dax page fault, it will always be an aligned range.  So,
>> we have to copy the whole range in this case.
> 
> s/we have to copy/copy/
> 
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> ---
>>   fs/dax.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 84 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index 9fb6218f42be..697a7b7bb96f 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -1050,6 +1050,61 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
>>          return rc;
>>   }
>>
>> +/**
>> + * dax_iomap_cow_copy(): Copy the data from source to destination before write.
> 
> s/():/() -/
> 
> ...to be kernel-doc compliant
> 
>> + * @pos:       address to do copy from.
>> + * @length:    size of copy operation.
>> + * @align_size:        aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
>> + * @srcmap:    iomap srcmap
>> + * @daddr:     destination address to copy to.
>> + *
>> + * This can be called from two places. Either during DAX write fault, to copy
>> + * the length size data to daddr. Or, while doing normal DAX write operation,
>> + * dax_iomap_actor() might call this to do the copy of either start or end
>> + * unaligned address. In this case the rest of the copy of aligned ranges is
> 
> Which "this", the latter, or the former? Looks like the latter.
> 
> "In the latter case the rest of the copy..."
> 
>> + * taken care by dax_iomap_actor() itself.
>> + * Also, note DAX fault will always result in aligned pos and pos + length.
> 
> Perhaps drop this sentence and just say:
> 
> "Either during DAX write fault (page aligned), to copy..."
> 
> ...in that earlier sentence so this comment flows better.
> 
>> + */
>> +static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
>> +               const struct iomap *srcmap, void *daddr)
>> +{
>> +       loff_t head_off = pos & (align_size - 1);
>> +       size_t size = ALIGN(head_off + length, align_size);
>> +       loff_t end = pos + length;
>> +       loff_t pg_end = round_up(end, align_size);
>> +       bool copy_all = head_off == 0 && end == pg_end;
>> +       void *saddr = 0;
>> +       int ret = 0;
>> +
>> +       ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
>> +       if (ret)
>> +               return ret;
>> +
>> +       if (copy_all) {
>> +               ret = copy_mc_to_kernel(daddr, saddr, length);
>> +               return ret ? -EIO : 0;
>> +       }
>> +
>> +       /* Copy the head part of the range.  Note: we pass offset as length. */
> 
> I've re-read this a few times and this comment is not helping, why is
> the offset used as the copy length?

I forgot to update the comment.  It should be 'head_off', which is 
passed as the length for copy_mc_to_kernel().  It is the head part we 
need to COPY before write.

--
Thanks,
Ruan.

> 
>> +       if (head_off) {
>> +               ret = copy_mc_to_kernel(daddr, saddr, head_off);
>> +               if (ret)
>> +                       return -EIO;
>> +       }
>> +
>> +       /* Copy the tail part of the range */
>> +       if (end < pg_end) {
>> +               loff_t tail_off = head_off + length;
>> +               loff_t tail_len = pg_end - end;
>> +
>> +               ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
>> +                                       tail_len);
>> +               if (ret)
>> +                       return -EIO;
>> +       }
>> +       return 0;
>> +}
>> +
>>   /*
>>    * The user has performed a load from a hole in the file.  Allocating a new
>>    * page in the file would cause excessive storage usage for workloads with
>> @@ -1175,16 +1230,18 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>>                  struct iov_iter *iter)
>>   {
>>          const struct iomap *iomap = &iomi->iomap;
>> +       const struct iomap *srcmap = &iomi->srcmap;
>>          loff_t length = iomap_length(iomi);
>>          loff_t pos = iomi->pos;
>>          struct block_device *bdev = iomap->bdev;
>>          struct dax_device *dax_dev = iomap->dax_dev;
>>          loff_t end = pos + length, done = 0;
>> +       bool write = iov_iter_rw(iter) == WRITE;
>>          ssize_t ret = 0;
>>          size_t xfer;
>>          int id;
>>
>> -       if (iov_iter_rw(iter) == READ) {
>> +       if (!write) {
>>                  end = min(end, i_size_read(iomi->inode));
>>                  if (pos >= end)
>>                          return 0;
>> @@ -1193,7 +1250,12 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>>                          return iov_iter_zero(min(length, end - pos), iter);
>>          }
>>
>> -       if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
>> +       /*
>> +        * In DAX mode, we allow either pure overwrites of written extents, or
> 
> s/we allow/enforce/
> 
>> +        * writes to unwritten extents as part of a copy-on-write operation.
>> +        */
>> +       if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
>> +                       !(iomap->flags & IOMAP_F_SHARED)))
>>                  return -EIO;
>>
>>          /*
>> @@ -1232,6 +1294,14 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>>                          break;
>>                  }
>>
>> +               if (write &&
>> +                   srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
>> +                       ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
>> +                                                kaddr);
>> +                       if (ret)
>> +                               break;
>> +               }
>> +
>>                  map_len = PFN_PHYS(map_len);
>>                  kaddr += offset;
>>                  map_len -= offset;
>> @@ -1243,7 +1313,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>>                   * validated via access_ok() in either vfs_read() or
>>                   * vfs_write(), depending on which operation we are doing.
>>                   */
>> -               if (iov_iter_rw(iter) == WRITE)
>> +               if (write)
>>                          xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
>>                                          map_len, iter);
>>                  else
>> @@ -1385,6 +1455,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
>>   {
>>          struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>>          const struct iomap *iomap = &iter->iomap;
>> +       const struct iomap *srcmap = &iter->srcmap;
>>          size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
>>          loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
>>          bool write = vmf->flags & FAULT_FLAG_WRITE;
>> @@ -1392,6 +1463,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
>>          unsigned long entry_flags = pmd ? DAX_PMD : 0;
>>          int err = 0;
>>          pfn_t pfn;
>> +       void *kaddr;
>>
>>          if (!pmd && vmf->cow_page)
>>                  return dax_fault_cow_page(vmf, iter);
>> @@ -1404,18 +1476,25 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
>>                  return dax_pmd_load_hole(xas, vmf, iomap, entry);
>>          }
>>
>> -       if (iomap->type != IOMAP_MAPPED) {
>> +       if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) {
>>                  WARN_ON_ONCE(1);
>>                  return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
>>          }
>>
>> -       err = dax_iomap_direct_access(&iter->iomap, pos, size, NULL, &pfn);
>> +       err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
>>          if (err)
>>                  return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
>>
>>          *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
>>                                    write && !sync);
>>
>> +       if (write &&
>> +           srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
>> +               err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
>> +               if (err)
>> +                       return dax_fault_return(err);
>> +       }
>> +
>>          if (sync)
>>                  return dax_fault_synchronous_pfnp(pfnp, pfn);
>>
>> --
>> 2.32.0
>>
>>
>>



  reply	other threads:[~2021-08-20  5:59 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16  6:03 [PATCH v7 0/8] fsdax,xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
2021-08-16  6:03 ` [PATCH v7 1/8] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
2021-08-18 21:01   ` Dan Williams
2021-08-16  6:03 ` [PATCH v7 2/8] fsdax: Introduce dax_iomap_cow_copy() Shiyang Ruan
2021-08-19 22:35   ` Dan Williams
2021-08-20  5:59     ` ruansy.fnst [this message]
2021-08-16  6:03 ` [PATCH v7 3/8] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
2021-08-19 22:54   ` Dan Williams
2021-08-23 12:57     ` Christoph Hellwig
2021-08-27  3:22       ` Shiyang Ruan
2021-08-27  5:00         ` Dan Williams
2021-08-27  5:26           ` Shiyang Ruan
2021-08-16  6:03 ` [PATCH v7 4/8] fsdax: Add dax_iomap_cow_copy() for dax_iomap_zero Shiyang Ruan
2021-08-20  2:39   ` Dan Williams
2021-08-27  3:23     ` Shiyang Ruan
2021-08-16  6:03 ` [PATCH v7 5/8] iomap: Introduce iomap_iter2 for two files Shiyang Ruan
2021-08-23 12:58   ` Christoph Hellwig
2021-08-16  6:03 ` [PATCH v7 6/8] fsdax: Dedup file range to use a compare function Shiyang Ruan
2021-08-23 13:16   ` Christoph Hellwig
2021-08-16  6:03 ` [PATCH v7 7/8] fsdax: Introduce dax_iomap_ops for end of reflink Shiyang Ruan
2021-08-20  3:01   ` Dan Williams
2021-08-20  6:13     ` ruansy.fnst
2021-08-20 15:18       ` Dan Williams
2021-08-23 13:02         ` Christoph Hellwig
2021-08-27  3:29         ` Shiyang Ruan
2021-08-27  5:04           ` Dan Williams
2021-08-27  5:27             ` Shiyang Ruan
2021-08-16  6:03 ` [PATCH v7 8/8] fs/xfs: Add dax dedupe support Shiyang Ruan
2021-08-20  3:08   ` Dan Williams
2021-08-27  3:36     ` Shiyang Ruan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=052a38ba-9c47-fc0a-61bf-6e6c8765ca8b@fujitsu.com \
    --to=ruansy.fnst@fujitsu.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=rgoldwyn@suse.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).