linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@gmail.com>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	sfrench@samba.org, Anna Schumaker <anna.schumaker@netapp.com>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	fengxiaoli0714@gmail.com, fstests@vger.kernel.org,
	Murphy Zhou <xzhou@redhat.com>, CIFS <linux-cifs@vger.kernel.org>,
	linux-nfs <linux-nfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517: notrun on NFS due to unaligned dedupe in test
Date: Fri, 31 May 2019 11:35:43 -0400	[thread overview]
Message-ID: <CAABAsM6OtgsWS4VVTouDfPLePWE_JAyozwzUn3O-HSMF64J7wg@mail.gmail.com> (raw)
In-Reply-To: <CAN-5tyFoNJrQTn1ZkNsmaBd=yn2kWpZObGe8ka6CDFxcHaXP6w@mail.gmail.com>

On Fri, 31 May 2019 at 11:25, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Thu, May 30, 2019 at 12:02 PM Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> >
> > Hi everyone,
> >
> > Murphy Zhou sent a patch to generic/517 in fstests to fix a dedupe
> > failure he was seeing on NFS:
> >
> > On Thu, May 30, 2019 at 05:41:47PM +0800, Murphy Zhou wrote:
> > > NFSv4.2 could pass _require_scratch_dedupe, since the test offset and
> > > size are aligned, while generic/517 is performing unaligned dedupe.
> > > NFS does not support unaligned dedupe now, returns EINVAL.
> > >
> > > Signed-off-by: Murphy Zhou <xzhou@redhat.com>
> > > ---
> > >  tests/generic/517 | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/tests/generic/517 b/tests/generic/517
> > > index 601bb24e..23665782 100755
> > > --- a/tests/generic/517
> > > +++ b/tests/generic/517
> > > @@ -30,6 +30,7 @@ _cleanup()
> > >  _supported_fs generic
> > >  _supported_os Linux
> > >  _require_scratch_dedupe
> > > +$FSTYP == "nfs"  && _notrun "NFS can't handle unaligned deduplication"
> >
> > I was surprised to see a dedupe fix for NFS since (at least to my
> > knowledge) neither of these two network filesystems actually support
> > server-side deduplication commands, and therefore the
> > _require_scratch_dedupe should have _notrun the test.
> >
> > Then I looked at fs/nfs/nfs4file.c:
> >
> > static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off,
> >                 struct file *dst_file, loff_t dst_off, loff_t count,
> >                 unsigned int remap_flags)
> > {
> >         <local variable declarations>
> >
> >         if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
> >                 return -EINVAL;
> >
> >         <check alignment, lock inodes, flush pending writes>
> >
> >         ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
> >
> > The NFS client code will accept REMAP_FILE_DEDUP through remap_flags,
> > which is how dedupe requests are sent to filesystems nowadays.  The nfs
> > client code does not itself compare the file contents, but it does issue
> > a CLONE command to the NFS server.  The end result, AFAICT, is that a
> > user program can write 'A's to file1, 'B's to file2, issue a dedup
> > ioctl to the kernel, and have a block of 'B's mapped into file1.  That's
> > broken behavior, according to the dedup ioctl manpage.
> >
> > Notice how remap_flags is checked but is not included in the
> > nfs42_proc_clone call?  That's how I conclude that the NFS client cannot
> > possibly be sending the dedup request to the server.
> >
> > The same goes for fs/cifs/cifsfs.c:
> >
> > static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
> >                 struct file *dst_file, loff_t destoff, loff_t len,
> >                 unsigned int remap_flags)
> > {
> >         <local variable declarations>
> >
> >         if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
> >                 return -EINVAL;
> >
> >         <check files, lock inodes, flush pages>
> >
> >         if (target_tcon->ses->server->ops->duplicate_extents)
> >                 rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> >                         smb_file_src, smb_file_target, off, len, destoff);
> >         else
> >                 rc = -EOPNOTSUPP;
> >
> > Again, remap_flags is checked here but it has no influence over the
> > ->duplicate_extents call.
> >
> > Next I got to thinking that when I reworked the clone/dedupe code last
> > year, I didn't include REMAP_FILE_DEDUP support for cifs or nfs, because
> > as far as I knew, neither protocol supports a verb for deduplication.
> > The remap_flags checks were modified to allow REMAP_FILE_DEDUP in
> > commits ce96e888fe48e (NFS) and b073a08016a10 (CIFS) with this
> > justification (the cifs commit has a similar message):
> >
> > "Subject: Fix nfs4.2 return -EINVAL when do dedupe operation
> >
> > "dedupe_file_range operations is combiled into remap_file_range.
> > "    But in nfs42_remap_file_range, it's skiped for dedupe operations.
> > "    Before this patch:
> > "      # dd if=/dev/zero of=nfs/file bs=1M count=1
> > "      # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
> > "      XFS_IOC_FILE_EXTENT_SAME: Invalid argument
> > "    After this patch:
> > "      # dd if=/dev/zero of=nfs/file bs=1M count=1
> > "      # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file
> > "      deduped 4096/4096 bytes at offset 65536
> > "      4 KiB, 1 ops; 0.0046 sec (865.988 KiB/sec and 216.4971 ops/sec)"
> >
> > This sort of looks like monkeypatching to make an error message go away.
> > One could argue that this ought to return EOPNOSUPP instead of EINVAL,
> > and maybe that's what should've happened.
> >
> > So, uh, do NFS and CIFS both support server-side dedupe now, or are
> > these patches just plain wrong?
> >
> > No, they're just wrong, because I can corrupt files like so on NFS:
> >
> > $ rm -rf urk moo
> > $ xfs_io -f -c "pwrite -S 0x58 0 31048" urk
> > wrote 31048/31048 bytes at offset 0
> > 30 KiB, 8 ops; 0.0000 sec (569.417 MiB/sec and 153846.1538 ops/sec)
> > $ xfs_io -f -c "pwrite -S 0x59 0 31048" moo
> > wrote 31048/31048 bytes at offset 0
> > 30 KiB, 8 ops; 0.0001 sec (177.303 MiB/sec and 47904.1916 ops/sec)
> > $ md5sum urk moo
> > 37d3713e5f9c4fe0f8a1f813b27cb284  urk
> > a5b6f953f27aa17e42450ff4674fa2df  moo
> > $ xfs_io -c "dedupe urk 0 0 4096" moo
> > deduped 4096/4096 bytes at offset 0
> > 4 KiB, 1 ops; 0.0012 sec (3.054 MiB/sec and 781.8608 ops/sec)
> > $ md5sum urk moo
> > 37d3713e5f9c4fe0f8a1f813b27cb284  urk
> > 2c992d70131c489da954f1d96d8c456e  moo
> >
> > (Not sure about cifs, since I don't have a Windows Server handy)
> >
> > I'm not an expert in CIFS or NFS, so I'm asking: do either support
> > dedupe or is this a kernel bug?
>
> NFS does not support dedupe and only supports cloning (whole) files.

That is not quite true. It does support range based cloning, and can
even support cloning parts of a file onto itself (as long as the
source and target ranges do not overlap). However it does not support
the kind of conditional cloning that I understand from Darrick is
needed for dedup.

Cheers
  Trond

      reply	other threads:[~2019-05-31 15:35 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190530094147.14512-1-xzhou@redhat.com>
     [not found] ` <20190530152606.GA5383@magnolia>
2019-05-30 15:58   ` NFS & CIFS support dedupe now?? Was: Re: [PATCH] generic/517: notrun on NFS due to unaligned dedupe in test Darrick J. Wong
2019-05-31 10:48     ` Aurélien Aptel
2019-05-31 13:28       ` Tom Talpey
2019-05-31 15:24     ` Olga Kornievskaia
2019-05-31 15:35       ` Trond Myklebust [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAABAsM6OtgsWS4VVTouDfPLePWE_JAyozwzUn3O-HSMF64J7wg@mail.gmail.com \
    --to=trondmy@gmail.com \
    --cc=aglo@umich.edu \
    --cc=anna.schumaker@netapp.com \
    --cc=darrick.wong@oracle.com \
    --cc=fengxiaoli0714@gmail.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=sfrench@samba.org \
    --cc=trond.myklebust@hammerspace.com \
    --cc=xzhou@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).