linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'Anna Schumaker'" <schumaker.anna@gmail.com>,
	"'Kinglong Mee'" <kinglongmee@gmail.com>,
	"'Trond Myklebust'" <trondmy@hammerspace.com>,
	<devel@lists.nfs-ganesha.org>, <linux-nfs@vger.kernel.org>
Subject: RE: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK
Date: Tue, 11 Sep 2018 16:20:47 -0700	[thread overview]
Message-ID: <00e301d44a26$12cd63d0$38682b70$@mindspring.com> (raw)
In-Reply-To: <6f906850076c15aedff11d721fe91d3279935224.camel@gmail.com>

> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
> > On 2018/9/11 20:57, Trond Myklebust wrote:
> > > On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
> > > > The latest ganesha/gluster supports seek according to,
> > > >
> > > >
> > >
> > > =
https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#sectio
> > > n-15.11
> > > >
> > > >    From the given sa_offset, find the next data_content4 of type
> > > > sa_what
> > > >    in the file.  If the server can not find a corresponding =
sa_what,
> > > >    then the status will still be NFS4_OK, but sr_eof would be
> > > > TRUE.  If
> > > >    the server can find the sa_what, then the sr_offset is the
> > > > start of
> > > >    that content.  If the sa_offset is beyond the end of the =
file,
> > > > then
> > > >    SEEK MUST return NFS4ERR_NXIO.
> > > >
> > > > For a file's filemap as,
> > > >
> > > > Part    1: HOLE 0x0000000000000000 ---> 0x0000000000600000
> > > > Part    2: DATA 0x0000000000600000 ---> 0x0000000000700000
> > > > Part    3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
> > > > SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
> > > > sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, =
SEEK_HOLE)
> > > > gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
> > > >
> > > > If an application depends the lseek result for data searching, =
it
> > > > may enter infinite loop.
> > > >
> > > >         while (1) {
> > > >                 next_pos =3D lseek(fd, cur_pos, seek_type);
> > > >                 if (seek_type =3D=3D SEEK_DATA) {
> > > >                         seek_type =3D SEEK_HOLE;
> > > >                 } else {
> > > >                         seek_type =3D SEEK_DATA;
> > > >                 }
> > > >
> > > >                 if (next_pos =3D=3D -1) {
> > > >                         return ;
> > > >
> > > >                 cur_pos =3D next_pos;
> > > >         }
> > > >
> > > > The lseek syscall always gets 0x70000 from nfs client for those
> > > > two cases, but, if underlying filesystem is ext4/f2fs, or the =
nfs
> > > > server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
> > > >
> > > > I wanna to know,
> > > > should I fix the ganesha/gluster as knfsd return ENXIO for the
> > > > first case?
> > > > or should I fix the nfs client to return ENXIO for the first =
case?
> > > >
> > >
> > > It that test correct? The fallback implementation of SEEK_DATA
> > > assumes that the entire file is data, so lseek(SEEK_DATA) on any
> > > offset that is <=3D eof will be a no-op. The fallback =
implementation
> > > of SEEK_HOLE assumes that the first hole is at eof.
> >
> > I think that's non-nfsv4.2's logical.
> >
> > >
> > > IOW: unless the initial value for cur_pos is > eof, it looks to me
> > > as if the above test will loop infinitely given any filesystem =
that
> > > doesn't implement native support for SEEK_DATA/SEEK_HOLE.
> > >
> >
> > No, if underlying filesystem is ext4/f2fs, or the nfs server is =
knfsd,
> > the last lseek syscall always return ENXIO no matter the cur_pos is =
>
> > eof or not.
> >
> > A file ends with a hole as,
> > Part   22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
> > Part   23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
> >
> > # stat testfile
> >   File: testfile
> >   Size: 209715200       Blocks: 22640      IO Block: 4096   regular =
file
> > Device: 807h/2055d      Inode: 843122      Links: 2
> > Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    =
root)
> > Access: 2018-09-11 20:01:41.881227061 +0800
> > Modify: 2018-09-11 20:01:41.976478311 +0800
> > Change: 2018-09-11 20:01:41.976478311 +0800
> >  Birth: -
> >
> > # strace filemap testfile
> > ... ...
> > lseek(3, 111149056, SEEK_HOLE)          =3D 112197632
> > lseek(3, 112197632, SEEK_DATA)          =3D -1 ENXIO (No such device =
or address)
> >
> > Right now, when knfsd gets the ENXIO error, it returns the error to
> > client directly, and return to syscall.
> > But, ganesha set the sr_eof to true and return NFS4_OK to client as
> > RFC description, nfs client skips the sr_eof and return a bad offset
> > to syscall.
>=20
> Would it make more sense to change Knfsd instead of the client?  I =
think I was
> trying to keep things simple when I wrote the code, so I just passed =
the result of
> the lseek system call back to the client.

Looking at the lseek(2) man page, it's not clear to me what should be =
returned if as in this example, there is a HOLE at the end of the file =
(i.e. filesize is larger than the range of the last DATA in the file). =
It sounds like ext4 returns ENXIO if a SEEK_DATA is done past the last =
data in the file.

       SEEK_DATA
              Adjust the file offset to the next location in the file =
greater than or equal to offset containing data.  If offset points to =
data, then the file offset  is  set
              to offset.

       SEEK_HOLE
              Adjust  the file offset to the next hole in the file =
greater than or equal to offset.  If offset points into the middle of a =
hole, then the file offset is set to
              offset.  If there is no hole past offset, then the file =
offset is adjusted to the end of the file (i.e., there is an implicit =
hole at the end of any file).

       In both of the above cases, lseek() fails if offset points past =
the end of the file.

       These operations allow applications to map holes in a sparsely =
allocated file.  This can be useful for applications such as file backup =
tools, which can save space when
       creating backups and preserve holes, if they have a mechanism for =
discovering holes.

       For the purposes of these operations, a hole is a sequence of =
zeros that (normally) has not been allocated in the underlying file =
storage.  However, a filesystem is not
       obliged to report holes, so these operations are not a guaranteed =
mechanism for mapping the storage space actually allocated to a file.   =
(Furthermore,  a  sequence  of
       zeros  that actually has been written to the underlying storage =
may not be reported as a hole.)  In the simplest implementation, a =
filesystem can support the operations
       by making SEEK_HOLE always return the offset of the end of the =
file, and making SEEK_DATA always return offset (i.e., even if the =
location referred to by  offset  is  a
       hole, it can be considered to consist of data that is a sequence =
of zeros).

The RFC text is pretty clear:

   SEEK is an operation that allows a client to determine the location
   of the next data_content4 in a file.  It allows an implementation of
   the emerging extension to lseek(2) to allow clients to determine the
   next hole whilst in data or the next data whilst in a hole.

   From the given sa_offset, find the next data_content4 of type sa_what
   in the file.  If the server can not find a corresponding sa_what,
   then the status will still be NFS4_OK, but sr_eof would be TRUE.  If
   the server can find the sa_what, then the sr_offset is the start of
   that content.  If the sa_offset is beyond the end of the file, then
   SEEK MUST return NFS4ERR_NXIO.

   All files MUST have a virtual hole at the end of the file.  I.e., if
   a filesystem does not support sparse files, then a compound with
   {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
   where 'X' was the size of the file.

Sa_offset is not past the end of the file, but there is no more DATA, so =
a seek DATA from 0x70000 (original file) should return sr_eof TRUE.

In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000.

It certainly makes sense that you should be able to have a hole at the =
end of a file (pre-allocated disk blocks but no data written yet), and =
is in fact what fallocate(2) will do.

An NFS server could check the filesize and if sa_offset is < filesize =
and a SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and =
set sr_eof to TRUE.

The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any =
ENXIO result to NFS4_OK with sr_eof TRUE. It would be better for it to =
do the simple thing knfsd does of always passing along the ENXIO (this =
may be best if it is not possible to safely verify sa_offset really is < =
filesize).

So Anna, I think your knfsd implementation is ok.

Frank

  reply	other threads:[~2018-09-12 21:35 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-11 12:29 lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK Kinglong Mee
2018-09-11 12:57 ` Trond Myklebust
2018-09-11 14:47   ` Kinglong Mee
2018-09-11 15:43     ` Anna Schumaker
2018-09-11 23:20       ` Frank Filz [this message]
2018-09-12  1:31         ` [NFS-Ganesha-Devel] " Kinglong Mee
2018-09-12 11:58           ` Frank Filz
2018-09-13  0:03             ` Kinglong Mee
2020-09-14 15:02               ` Frank Filz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='00e301d44a26$12cd63d0$38682b70$@mindspring.com' \
    --to=ffilzlnx@mindspring.com \
    --cc=devel@lists.nfs-ganesha.org \
    --cc=kinglongmee@gmail.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=schumaker.anna@gmail.com \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).