From: "crispyduck@outlook.at" <crispyduck@outlook.at>
To: "J. Bruce Fields" <bfields@fieldses.org>,
	Rick Macklem <rmacklem@uoguelph.ca>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: AW: Problems with NFS4.1 on ESXi
Date: Fri, 22 Apr 2022 18:43:23 +0000
Message-ID: <AM9P191MB1665D6D3AF49AF0DFCAD31698EF79@AM9P191MB1665.EURP191.PROD.OUTLOOK.COM>
In-Reply-To: <20220422151534.GA29913@fieldses.org>

Output of exportfs -v:
sync,wdelay,hide,crossmnt,no_subtree_check,fsid=74345722,mountpoint,sec=sys,rw,secure,no_root_squash,no_all_squash
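
(For reference, a minimal /etc/exports entry that produces roughly these
options could look like the line below; the path and the client network
are placeholders, not the actual values from this server:)

  /tank/datastore  192.168.1.0/24(rw,sync,crossmnt,no_subtree_check,no_root_squash,fsid=74345722,sec=sys)

After editing /etc/exports, "exportfs -ra" re-exports everything and
"exportfs -v" prints the effective options as shown above.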

ESXi only supports NFSv3 and NFSv4.1; NFSv4.0 is not supported, no idea why. I think they only implemented 4.1 for session trunking and Kerberos.

Br,
Andreas




From: J. Bruce Fields <bfields@fieldses.org>
Sent: Friday, 22 April 2022 17:15
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: crispyduck@outlook.at <crispyduck@outlook.at>; Chuck Lever III <chuck.lever@oracle.com>; Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: Problems with NFS4.1 on ESXi
 
On Thu, Apr 21, 2022 at 11:52:32PM +0000, Rick Macklem wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> [stuff snipped]
> > On Thu, Apr 21, 2022 at 12:40:49PM -0400, bfields wrote:
> > >
> > >
> > > Stale filehandles aren't normal, and suggest some bug or
> > > misconfiguration on the server side, either in NFS or the exported
> > > filesystem.
> > 
> > Actually, I should take that back: if one client removes files while a
> > second client is using them, it'd be normal for applications on that
> > second client to see ESTALE.
> I took a look at crispyduck's packet trace and here's what I saw:
> Packet#
> 48 Lookup of test-ovf.vmx
> 49 NFS_OK FH is 0x7c9ce14b (the hash)
> ...
> 51 Open Claim_FH for 0x7c9ce14b
> 52 NFS_OK Open Stateid 0x35be
> ...
> 138 Rename test-ovf.vmx~ to test-ovf.vmx
> 139 NFS_OK
> ...
> 141 Close with PutFH 0x7c9ce14b
> 142 NFS4ERR_STALE for the PutFH
> 
> So, it seems that the Rename deletes the file (it renames another file to
> the same name "test-ovf.vmx").  Then the subsequent Close's PutFH fails,
> because the file that FH refers to has been deleted.
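
(In POSIX terms, the sequence in the trace corresponds roughly to the
sketch below; this is purely illustrative, with a placeholder mount
path, not actual ESXi client code:)

    import os

    # Packets 48-52: LOOKUP and OPEN of the existing file.
    fd = os.open("/mnt/datastore/test-ovf/test-ovf.vmx", os.O_RDWR)

    # Packets 138-139: the temporary copy is renamed over the open file,
    # so the object the old filehandle referred to is unlinked.
    os.rename("/mnt/datastore/test-ovf/test-ovf.vmx~",
              "/mnt/datastore/test-ovf/test-ovf.vmx")

    # Packets 141-142: CLOSE, whose PUTFH of the old filehandle is what
    # the trace shows failing with NFS4ERR_STALE.
    os.close(fd)

(Whether that final close actually fails depends on the server still
holding a reference to the open, renamed-over file.)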

Actually (sorry I'm slow to understand this)--why would our 4.1 server
ever be returning STALE on a close?  We normally hold a reference to the
file.

Oh, wait, is subtree_check set on the export?  You don't want to do
that.  (The FreeBSD server probably doesn't even offer that as an
option?)
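
(Background: with subtree_check the server encodes parent-directory
information into each filehandle so that it can verify the file still
lies within the exported subtree; exports(5) warns that renaming a file
while a client holds it open can then invalidate an otherwise valid
handle.  Illustrative /etc/exports lines, with a placeholder path and
client:)

  /srv/datastore  *(rw,sync,subtree_check)     # can yield stale handles after renames
  /srv/datastore  *(rw,sync,no_subtree_check)  # recommended; the default in recent nfs-utils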

--b.

> 
> Looks like yet another ESXi client bug to me?
> (I've seen assorted other ones, but not this one. I have no idea how this
>  might work on a FreeBSD server. I can only assume the RPC sequence
>  ends up different for FreeBSD for some reason? Maybe the Close gets
>  processed before the Rename? I didn't look at the Sequence args for
>  these RPCs to see if they use different slots.)
> 
> 
> > So it might be interesting to know what actually happens when VM
> > templates are imported.
> If you look at the packet trace, it is somewhat weird, like most things
> for this client. It does a Lookup of the same file name over and over
> again, for example.
> 
> > I suppose you could also try NFSv4.0 or try varying kernel versions to
> > try to narrow down the problem.
> I think it only does NFSv4.1.
> I've tried to contact the VMware engineers, but never had any luck.
> I wish they'd show up at a bakeathon, but...
> 
> > No easy ideas off the top of my head, sorry.
> I once posted a list of problems I had found with ESXi 6.5 to a FreeBSD
> mailing list and someone who worked for VMware cut/pasted it into their
> problem database.  They responded to him with "might be fixed in a future
> release" and, indeed, they were fixed in ESXi 6.7, so if you can get this to
> them, they might fix it?
> 
> rick
> 
> --b.
> 
> > Figuring out more than that would require more
> > investigation.
> >
> > --b.
> >
> > >
> > > Br,
> > > Andi
> > >
> > >
> > >
> > >
> > >
> > >
> > > From: Chuck Lever III <chuck.lever@oracle.com>
> > > Sent: Thursday, 21 April 2022 16:58
> > > To: Andreas Nagy <crispyduck@outlook.at>
> > > Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
> > > Subject: Re: Problems with NFS4.1 on ESXi
> > >
> > > Hi Andreas-
> > >
> > > > On Apr 21, 2022, at 12:55 AM, Andreas Nagy <crispyduck@outlook.at> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I hope this mailing list is the right place to discuss some problems with nfs4.1.
> > >
> > > Well, yes and no. This is an upstream developer mailing list,
> > > not really for user support.
> > >
> > > You seem to be asking about products that are currently supported,
> > > and I'm not sure if the Debian kernel is stock upstream 5.13 or
> > > something else. ZFS is not an upstream Linux filesystem and the
> > > ESXi NFS client is something we have little to no experience with.
> > >
> > > I recommend contacting the support desk for your products. If
> > > they find a specific problem with the Linux NFS server's
> > > implementation of the NFSv4.1 protocol, then come back here.
> > >
> > >
> > > > After switching from a FreeBSD host as NFS server to a Proxmox environment that also serves NFS, I see some strange issues in combination with VMware ESXi.
> > > >
> > > > At first it seemed to work fine, but then I realized that there are problems with ESXi datastores on NFS4.1 when trying to import VMs (OVF).
> > > >
> > > > Importing ESXi OVF VM templates fails nearly every time with an ESXi error message "postNFCData failed: Not Found". With NFS3 it works fine.
> > > >
> > > > NFS server is running on a Proxmox host:
> > > >
> > > >  root@sepp-sto-01:~# hostnamectl
> > > >  Static hostname: sepp-sto-01
> > > >  Icon name: computer-server
> > > >  Chassis: server
> > > >  Machine ID: 028da2386e514db19a3793d876fadf12
> > > >  Boot ID: c5130c8524c64bc38994f6cdd170d9fd
> > > >  Operating System: Debian GNU/Linux 11 (bullseye)
> > > >  Kernel: Linux 5.13.19-4-pve
> > > >  Architecture: x86-64
> > > >
> > > >
> > > > The file system is ZFS, but I also tried others and the behaviour is the same.
> > > >
> > > >
> > > > ESXi version 7.2U3
> > > >
> > > > ESXi vmkernel.log:
> > > > 2022-04-19T17:46:38.933Z cpu0:262261)cswitch: L2Sec_EnforcePortCompliance:209: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]client vmk1 requested promiscuous mode on port 0x4000010, disallowed by vswitch policy
> > > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)World: 12075: VC opID esxui-d6ab-f678 maps to vmkernel opID 936118c3
> > > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)WARNING: NFS41: NFS41FileDoCloseFile:3128: file handle close on obj 0x4303fce02850 failed: Stale file handle
> > > > 2022-04-19T17:46:40.897Z cpu10:266351 opID=936118c3)WARNING: NFS41: NFS41FileOpCloseFile:3718: NFS41FileCloseFile failed: Stale file handle
> > > > 2022-04-19T17:46:41.164Z cpu4:266351 opID=936118c3)WARNING: NFS41: NFS41FileDoCloseFile:3128: file handle close on obj 0x4303fcdaa000 failed: Stale file handle
> > > > 2022-04-19T17:46:41.164Z cpu4:266351 opID=936118c3)WARNING: NFS41: NFS41FileOpCloseFile:3718: NFS41FileCloseFile failed: Stale file handle
> > > > 2022-04-19T17:47:25.166Z cpu18:262376)ScsiVmas: 1074: Inquiry for VPD page 00 to device mpx.vmhba32:C0:T0:L0 failed with error Not supported
> > > > 2022-04-19T17:47:25.167Z cpu18:262375)StorageDevice: 7059: End path evaluation for device mpx.vmhba32:C0:T0:L0
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)World: 12075: VC opID esxui-6787-f694 maps to vmkernel opID 9529ace7
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2465: Evicting VM with path:/vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: 209: Creating crypto hash
> > > > 2022-04-19T17:47:30.645Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2479: Could not find MemXferFS region for /vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2465: Evicting VM with path:/vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: 209: Creating crypto hash
> > > > 2022-04-19T17:47:30.693Z cpu4:264565 opID=9529ace7)VmMemXfer: vm 264565: 2479: Could not find MemXferFS region for /vmfs/volumes/9f10677f-697882ed-0000-000000000000/test-ovf/test-ovf.vmx
> > > >
> > > > A tcpdump capture taken on the ESXi host, filtered on the NFS server IP, is attached here:
> > > > https://easyupload.io/xvtpt1
> > > >
> > > > I tried to analyze it, but I have no idea what exactly the problem is. Maybe it is some issue with the VMware implementation?
> > > > It would be nice if someone with better NFS knowledge could have a look at the traces.
> > > >
> > > > Best regards,
> > > > cd
> > >
> > > --
> > > Chuck Lever
> > >

Thread overview: 25+ messages
     [not found] <AM9P191MB1665484E1EFD2088D22C2E2F8EF59@AM9P191MB1665.EURP191.PROD.OUTLOOK.COM>
2022-04-21  4:55 ` Problems with NFS4.1 on ESXi Andreas Nagy
2022-04-21 14:58   ` Chuck Lever III
2022-04-21 15:30     ` AW: " crispyduck
2022-04-21 16:40       ` J. Bruce Fields
     [not found]         ` <AM9P191MB16654F5B7541CD1E489D75608EF49@AM9P191MB1665.EURP191.PROD.OUTLOOK.COM>
2022-04-21 18:41           ` crispyduck
2022-04-21 18:54         ` J. Bruce Fields
2022-04-21 23:52           ` Rick Macklem
2022-04-21 23:58             ` Rick Macklem
2022-04-22 14:29             ` Chuck Lever III
2022-04-22 14:59               ` AW: " crispyduck
2022-04-22 15:02                 ` Chuck Lever III
2022-04-22 22:58               ` Rick Macklem
2022-04-22 15:15             ` J. Bruce Fields
2022-04-22 18:43               ` crispyduck [this message]
2022-04-22 23:03               ` Rick Macklem
2022-04-24 15:07                 ` J. Bruce Fields
2022-04-24 20:36                   ` Rick Macklem
2022-04-24 20:39                     ` Rick Macklem
2022-04-25  9:00                       ` AW: " crispyduck
2022-04-27  6:08                         ` crispyduck
2022-05-05  5:31                           ` andreas-nagy
2022-05-05 14:19                             ` Rick Macklem
2022-05-05 16:38                             ` Chuck Lever III
2022-05-07  1:53                               ` Chuck Lever III
2022-04-22 14:23           ` Olga Kornievskaia
