linux-nfs.vger.kernel.org archive mirror
From: hedrick@rutgers.edu
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Benjamin Coddington <bcodding@redhat.com>,
	Patrick Goetz <pgoetz@math.utexas.edu>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: safe versions of NFS
Date: Wed, 14 Apr 2021 10:24:59 -0400
Message-ID: <1EC571D9-9B20-4D82-803E-7865AD9CFC86@rutgers.edu>
In-Reply-To: <D511E83E-1C9D-49C7-BEE4-A3E96009B3B6@oracle.com>

Sure. We’re hoping to move to a new enough kernel that it’s OK. I wasn’t expecting this list to fix problems, but I thought there might be information in the community about the status in various kernel releases, to guide us in what we need to do. If not, we’ll do our own testing.

After looking at the features and our usage pattern, I’m not interested in turning off delegation. If 4.0 doesn’t have problems (and so far it seems that it doesn’t), I think 4.0 with delegations is going to work better for us than 4.1 without delegations. 
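
As a sanity check on the "4.0 in theory" part, something like the following sketch could report which protocol version each NFS mount actually negotiated. It assumes the usual /proc/mounts layout (device, mount point, type, options) and that the version shows up as a vers= or nfsvers= option. The function name nfs_versions and the sample hostnames and paths are made up for illustration.

```python
import re

# Hypothetical sample in /proc/mounts format; hostnames and paths are made up.
SAMPLE_MOUNTS = """\
fileserver:/home /home nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576 0 0
/dev/sda1 / ext4 rw,relatime 0 0
fileserver:/data /data nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576 0 0
"""


def nfs_versions(mounts_text):
    """Map each NFS mount point to the protocol version named in its options."""
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        match = re.search(r"(?:^|,)(?:nfs)?vers=([0-9.]+)", fields[3])
        versions[fields[1]] = match.group(1) if match else "unknown"
    return versions


if __name__ == "__main__":
    for mount_point, version in sorted(nfs_versions(SAMPLE_MOUNTS).items()):
        print(mount_point, version)
```

On a client, pass it the contents of /proc/mounts instead of the sample string. I would not expect lazily unmounted file systems to appear there, so this only covers mounts still attached to the namespace.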

From a practical point of view, getting a problem fixed is really difficult when it is intermittent, and the lead time for fixes to show up is measured in years. I think you’ll find that most sysadmins would rather find a workaround than fix the problem. I often don’t have vendor support, in part because when I do, difficult problems turn into an infinite discussion that almost never terminates with a fix.

> On Apr 14, 2021, at 10:15 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> Good morning Charles,
> 
>> On Apr 13, 2021, at 3:40 PM, hedrick@rutgers.edu wrote:
>> 
>> This is from CentOS 7.9, with all file systems mounted via NFS 4.0 (in theory; there may be 4.2 mounts that were unmounted with --lazy):
> 
> There isn't much linux-nfs@ can do about problems in distributor
> kernels, and it's very possible the issue is already fixed in a
> more recent kernel. You did suggest that v5.8 seemed more solid,
> for instance, and my 30-second Googling suggests that Ubuntu 18
> is based on 4.15, which is ancient history for us upstream code
> monkeys.
> 
> I recommend working with your Linux distributor to start root
> cause analysis and to let them document the issue properly. If
> they find that the problem is not already addressed in a newer
> kernel, then bring it back here to linux-nfs@.
> 
> 
> Sidebar: I find it interesting that the "just disable delegation"
> advice is still floating around the interwebs. And unfortunately
> it is probably still effective in some cases. The community
> should make an effort to nail those down and squash them, IMO.
> 
> 
>> 463 106.786384244 172.17.141.150 -> 172.17.11.218 TCP 66 nlogin > nfs [ACK] Seq=39641 Ack=46777 Win=24576 Len=0 TSval=520277320 TSecr=1478982393
>> 464 108.000270192 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0xe696b554, [Check: RD LU MD XT DL]
>> 465 108.000361904 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0xd61aa475, [Check: RD LU MD XT DL]
>> 466 108.000476711 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 464) ACCESS, [Allowed: RD LU MD XT DL]
>> 467 108.000495290 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=9189 Ack=8761 Win=16605 Len=0 TSval=520278534 TSecr=1478983647
>> 468 108.000591598 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 465) ACCESS, [Allowed: RD LU MD XT DL]
>> 469 108.000608160 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=9189 Ack=8917 Win=16605 Len=0 TSval=520278534 TSecr=1478983647
>> 470 118.952127064 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0xef7d152e, [Check: RD LU MD XT DL]
>> 471 118.952356881 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 470) ACCESS, [Allowed: RD LU MD XT DL]
>> 472 118.952372768 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=9361 Ack=9073 Win=16605 Len=0 TSval=520289486 TSecr=1478994599
>> 473 119.999835420 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0x94a968d5, [Check: RD LU MD XT DL]
>> 474 120.000067817 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 473) ACCESS, [Allowed: RD LU MD XT DL]
>> 475 120.000082882 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=9533 Ack=9229 Win=16605 Len=0 TSval=520290533 TSecr=1478995646
>> 476 140.000587688 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0xe696b554, [Check: RD LU MD XT DL]
>> 477 140.000688677 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0xd61aa475, [Check: RD LU MD XT DL]
>> 478 140.000746915 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 476) ACCESS, [Allowed: RD LU MD XT DL]
>> 479 140.000759241 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=9877 Ack=9385 Win=16605 Len=0 TSval=520310534 TSecr=1479015647
>> 480 140.000830146 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 477) ACCESS, [Allowed: RD LU MD XT DL]
>> 481 140.000836443 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=9877 Ack=9541 Win=16605 Len=0 TSval=520310534 TSecr=1479015647
>> 482 148.442466129 172.17.141.150 -> 172.17.11.218 NFS 226 V4 Call RENEW CID: 0x04da
>> 483 148.442650203 172.17.11.218 -> 172.17.141.150 NFS 182 V4 Reply (Call In 482) RENEW
>> 484 148.442664846 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=10037 Ack=9657 Win=16605 Len=0 TSval=520318976 TSecr=1479024089
>> 485 149.953317362 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0xef7d152e, [Check: RD LU MD XT DL]
>> 486 149.953550872 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 485) ACCESS, [Allowed: RD LU MD XT DL]
>> 487 149.953565993 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=10209 Ack=9813 Win=16605 Len=0 TSval=520320487 TSecr=1479025600
>> 488 162.000571296 172.17.141.150 -> 172.17.11.218 NFS 226 V4 Call ACCESS FH: 0xcd1903be, [Check: RD LU MD XT DL]
>> 489 162.000794395 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 488) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
>> 490 162.000825598 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=10369 Ack=9969 Win=16605 Len=0 TSval=520332534 TSecr=1479037647
>> 491 162.000998283 172.17.141.150 -> 172.17.11.218 NFS 238 V4 Call ACCESS FH: 0xeabd4697, [Check: RD LU MD XT DL]
>> 492 162.001218772 172.17.11.218 -> 172.17.141.150 NFS 222 V4 Reply (Call In 491) ACCESS, [Allowed: RD LU MD XT DL]
>> 493 162.040415201 172.17.141.150 -> 172.17.11.218 TCP 66 rndc > nfs [ACK] Seq=10541 Ack=10125 Win=16605 Len=0 TSval=520332574 TSecr=1479037647
>> 494 166.874398617 172.17.141.150 -> 172.17.11.218 TCP 66 [TCP Keep-Alive] nlogin > nfs [ACK] Seq=39640 Ack=46777 Win=24576 Len=0 TSval=520337408 TSecr=1478982393
>> 495 166.874438892 172.17.141.150 -> 172.17.11.218 NFS 250 V4 Call SEQUENCE
>> 496 166.874506845 172.17.11.218 -> 172.17.141.150 TCP 66 [TCP Dup ACK 462#1] nfs > nlogin [ACK] Seq=46777 Ack=39641 Win=24559 Len=0 TSval=1479042521 TSecr=520277320
>> 497 166.874720218 172.17.11.218 -> 172.17.141.150 NFS 218 V4 Reply (Call In 495) SEQUENCE
>> 498 166.874730215 172.17.141.150 -> 172.17.11.218 TCP 66 nlogin > nfs [ACK] Seq=39825 Ack=46929 Win=24575 Len=0 TSval=520337408 TSecr=1479042521
>> 499 166.874987010 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 500 166.875172744 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 499) TEST_STATEID
>> 501 166.875309487 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x534735df/
>> 502 166.875655661 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 501) OPEN StateID: 0xd7ae
>> 503 166.875801366 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 504 166.876042044 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 503) TEST_STATEID
>> 505 166.876210946 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xdabfc399/
>> 506 166.876485761 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 505) OPEN StateID: 0x9578
>> 507 166.876607463 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xdabfc399/
>> 508 166.876820365 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 507) OPEN StateID: 0xfa83
>> 509 166.876941430 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 510 166.877123487 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 509) TEST_STATEID
>> 511 166.877205876 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x968ca393/
>> 512 166.877464268 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 511) OPEN StateID: 0x25d5
>> 513 166.877600104 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 514 166.877841822 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 513) TEST_STATEID
>> 515 166.877997847 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xde4187cf/
>> 516 166.878265626 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 515) OPEN StateID: 0xd5ce
>> 517 166.878393548 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 518 166.878603997 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 517) TEST_STATEID
>> 519 166.878692334 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x9f6703e9/
>> 520 166.878920958 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 519) OPEN StateID: 0x69a1
>> 521 166.878964818 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 522 166.879156141 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 521) TEST_STATEID
>> 523 166.879195140 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xe5c84183/
>> 524 166.879435831 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 523) OPEN StateID: 0x6069
>> 525 166.879518910 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 526 166.879709592 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 525) TEST_STATEID
>> 527 166.879796731 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x927973ae/
>> 528 166.880024682 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 527) OPEN StateID: 0xf420
>> 529 166.880070944 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x927973ae/
>> 530 166.880265884 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 529) OPEN StateID: 0xec63
>> 531 166.880301034 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 532 166.880511051 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 531) TEST_STATEID
>> 533 166.880575938 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x05761d23/
>> 534 166.880798417 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 533) OPEN StateID: 0xb199
>> 535 166.880840801 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 536 166.881008021 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 535) TEST_STATEID
>> 537 166.881043797 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xb6b205f7/
>> 538 166.881270127 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 537) OPEN StateID: 0x49df
>> 539 166.881304710 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xb6b205f7/
>> 540 166.881498628 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 539) OPEN StateID: 0xf0d5
>> 541 166.881545126 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 542 166.881732646 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 541) TEST_STATEID
>> 543 166.881775578 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x0183cd1e/
>> 544 166.881978864 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 543) OPEN StateID: 0x546d
>> 545 166.882021595 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 546 166.882209030 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 545) TEST_STATEID
>> 547 166.882252306 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x673ef4c3/
>> 548 166.882484514 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 547) OPEN StateID: 0xa46d
>> 549 166.882523043 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x673ef4c3/
>> 550 166.882710061 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 549) OPEN StateID: 0xaa9a
>> 551 166.882750420 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 552 166.882933338 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 551) TEST_STATEID
>> 553 166.882961488 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x3699764a/
>> 554 166.883192776 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 553) OPEN StateID: 0x3c37
>> 555 166.883223581 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 556 166.883407176 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 555) TEST_STATEID
>> 557 166.883468198 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x94fd1187/
>> 558 166.883679012 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 557) OPEN StateID: 0xbedf
>> 559 166.883719911 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 560 166.883910224 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 559) TEST_STATEID
>> 561 166.883937791 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xcd96c73b/
>> 562 166.884165115 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 561) OPEN StateID: 0xbf6c
>> 563 166.884194351 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 564 166.884378426 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 563) TEST_STATEID
>> 565 166.884433848 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xd24bb22d/
>> 566 166.884661584 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 565) OPEN StateID: 0xaaf5
>> 567 166.884719445 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 568 166.884904098 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 567) TEST_STATEID
>> 569 166.884952255 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x5e594598/
>> 570 166.885154240 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 569) OPEN StateID: 0x11cb
>> 571 166.885206342 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 572 166.885389478 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 571) TEST_STATEID
>> 573 166.885454506 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xa4fd80c1/
>> 574 166.885686638 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 573) OPEN StateID: 0x9363
>> 575 166.885745762 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 576 166.885933232 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 575) TEST_STATEID
>> 577 166.885980692 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x80e4743a/
>> 578 166.886212637 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 577) OPEN StateID: 0x63af
>> 579 166.886272585 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 580 166.886457823 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 579) TEST_STATEID
>> 581 166.886514332 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xb3e2d284/
>> 582 166.886742488 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 581) OPEN StateID: 0x52d9
>> 583 166.886803826 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 584 166.886989516 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 583) TEST_STATEID
>> 585 166.887100351 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x95137efa/
>> 586 166.887304377 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 585) OPEN StateID: 0xb26e
>> 587 166.887395696 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 588 166.887587544 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 587) TEST_STATEID
>> 589 166.887703373 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x1d70394e/
>> 590 166.887919160 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 589) OPEN StateID: 0x753c
>> 591 166.887999278 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 592 166.888190709 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 591) TEST_STATEID
>> 593 166.888298867 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x680da7ff/
>> 594 166.888530666 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 593) OPEN StateID: 0x097c
>> 595 166.888605330 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 596 166.888795331 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 595) TEST_STATEID
>> 597 166.888902566 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x0a52a987/
>> 598 166.889113308 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 597) OPEN StateID: 0x7781
>> 599 166.889162100 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 600 166.889353078 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 599) TEST_STATEID
>> 601 166.889461157 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x56462642/
>> 602 166.889699242 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 601) OPEN StateID: 0xb209
>> 603 166.889772552 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 604 166.889963172 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 603) TEST_STATEID
>> 605 166.890070241 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x850c0567/
>> 606 166.890272350 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 605) OPEN StateID: 0x8134
>> 607 166.890335412 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 608 166.890542793 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 607) TEST_STATEID
>> 609 166.890650208 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x31aa2390/
>> 610 166.890886053 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 609) OPEN StateID: 0x1f30
>> 611 166.890959992 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 612 166.891158969 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 611) TEST_STATEID
>> 613 166.891264349 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x2ae7f451/
>> 614 166.891516937 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 613) OPEN StateID: 0x3fce
>> 615 166.891591778 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 616 166.891788024 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 615) TEST_STATEID
>> 617 166.891902329 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xa5cb13d3/
>> 618 166.892117369 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 617) OPEN StateID: 0x03d6
>> 619 166.892175933 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 620 166.892398086 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 619) TEST_STATEID
>> 621 166.892447518 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x3f5cfcdb/
>> 622 166.892671544 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 621) OPEN StateID: 0x8bff
>> 623 166.892716533 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 624 166.892901971 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 623) TEST_STATEID
>> 625 166.892940753 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x40b4d194/
>> 626 166.893167930 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 625) OPEN StateID: 0xe3e8
>> 627 166.893215973 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 628 166.893398652 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 627) TEST_STATEID
>> 629 166.893445197 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x4643d6fc/
>> 630 166.893682547 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 629) OPEN StateID: 0xe194
>> 631 166.893731829 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 632 166.893915920 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 631) TEST_STATEID
>> 633 166.893960986 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xd246cbd3/
>> 634 166.894191435 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 633) OPEN StateID: 0x22f5
>> 635 166.894240889 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 636 166.894425035 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 635) TEST_STATEID
>> 637 166.894470784 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xde40a6e1/
>> 638 166.894695618 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 637) OPEN StateID: 0x9ce2
>> 639 166.894744788 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 640 166.894929372 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 639) TEST_STATEID
>> 641 166.894967975 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x0f26a4dc/
>> 642 166.895197152 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 641) OPEN StateID: 0x879c
>> 643 166.895245038 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 644 166.895429467 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 643) TEST_STATEID
>> 645 166.895471234 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x82162062/
>> 646 166.895694775 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 645) OPEN StateID: 0xab68
>> 647 166.895744208 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 648 166.895929221 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 647) TEST_STATEID
>> 649 166.895973162 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0xb8b3b57f/
>> 650 166.896197535 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 649) OPEN StateID: 0xfb0b
>> 651 166.896245960 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 652 166.896430271 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 651) TEST_STATEID
>> 653 166.896475752 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x2e6c2b31/
>> 654 166.896705501 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 653) OPEN StateID: 0x8e00
>> 655 166.896754419 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 656 166.896939911 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 655) TEST_STATEID
>> 657 166.896983440 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x547cf5ea/
>> 658 166.897209526 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 657) OPEN StateID: 0x0532
>> 659 166.897258079 172.17.141.150 -> 172.17.11.218 NFS 274 V4 Call TEST_STATEID
>> 660 166.897443527 172.17.11.218 -> 172.17.141.150 NFS 234 V4 Reply (Call In 659) TEST_STATEID
>> 661 166.897484081 172.17.141.150 -> 172.17.11.218 NFS 334 V4 Call OPEN DH: 0x33c176ce/
>> 662 166.897693578 172.17.11.218 -> 172.17.141.150 NFS 454 V4 Reply (Call In 661) OPEN StateID: 0x3648
>> 663 166.937386876 172.17.141.150 -> 172.17.11.218 TCP 66 nlogin > nfs [ACK] Seq=59461 Ack=70165 Win=24576 Len=0 TSval=520337471 TSecr=1479042544
>> ^C663 packets captured
>> [hedrick@camaro ~]$ 
>> 
>> Here’s the corresponding section of /var/log/messages
>> 
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: nfs4_reclaim_open_state: 1 callbacks suppressed
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> Apr 13 15:36:05 camaro.lcsr.rutgers.edu kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>> 
>> 
>> 
>> 
>>> On Apr 13, 2021, at 1:24 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Apr 13, 2021, at 12:23 PM, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>> 
>>>> (resending this as it bounced off the list - I accidentally embedded HTML)
>>>> 
>>>> Yes, if you're pretty sure your hostnames are all different, the client_ids
>>>> should be different.  For v4.0 you can turn on debugging (rpcdebug -m nfs -s
>>>> proc) and see the client_id in the kernel log in lines that look like: "NFS
>>>> call setclientid auth=%s, '%s'\n", which will happen at mount time, but it
>>>> doesn't look like we have any debugging for v4.1 and v4.2 for EXCHANGE_ID.
>>>> 
>>>> You can extract it via the crash utility, or via systemtap, or by doing a
>>>> wire capture, but nothing that's easily translated to running across a large
>>>> number of machines.  There are probably other ways; perhaps we should tack
>>>> that string onto the tracepoints for exchange_id and setclientid.
>>>> 
>>>> If you're interested in troubleshooting, wire capture's usually the most
>>>> informative.  If the lockup events all happen at the same time, there
>>>> might be some network event that is triggering the issue.
>>>> 
>>>> You should expect NFSv4.1 to be rock-solid.  It's rare we have reports
>>>> that it isn't, and I'd love to know why you're having these problems.
>>> 
>>> I echo that: NFSv4.1 protocol and implementation are mature, so if
>>> there are operational problems, it should be root-caused.
>>> 
>>> NFSv4.1 uses a uniform client ID. That should be the "good" one,
>>> not the NFSv4.0 one that has a non-zero probability of collision.
>>> 
>>> Charles, please let us know if there are particular workloads that
>>> trigger the lock reclaim failure. A narrow reproducer would help
>>> get to the root issue quickly.
>>> 
>>> 
>>>> Ben
>>>> 
>>>> On 13 Apr 2021, at 11:38, hedrick@rutgers.edu wrote:
>>>> 
>>>>> The server is Ubuntu 20, with a ZFS file system.
>>>>> 
>>>>> I don’t set the unique ID. Documentation claims that it is set from the hostname. They will surely be unique, or the whole world would blow up. How can I check the actual unique ID being used? The kernel reports a blank one, but I think that just means to use the hostname. We could obviously set a unique one if that would be useful.
>>>>> 
>>>>>> On Apr 13, 2021, at 11:35 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>> 
>>>>>> It would be interesting to know why your clients are failing to reclaim their locks.  Something is misconfigured.  What server are you using, and is there anything fancy on the server-side (like HA)?  Is it possible that you have clients with the same nfs4_unique_id?
>>>>>> 
>>>>>> Ben
>>>>>> 
>>>>>> On 13 Apr 2021, at 11:17, hedrick@rutgers.edu wrote:
>>>>>> 
>>>>>>> many, though not all, of the problems are “lock reclaim failed”.
>>>>>>> 
>>>>>>>> On Apr 13, 2021, at 10:52 AM, Patrick Goetz <pgoetz@math.utexas.edu> wrote:
>>>>>>>> 
>>>>>>>> I use NFS 4.2 with Ubuntu 18/20 workstations and Ubuntu 18/20 servers and haven't had any problems.
>>>>>>>> 
>>>>>>>> Check your configuration files; the last time I experienced something like this, it was because I had inadvertently used the same fsid on two different exports. I also recommend exporting top-level directories only.  Bind mount everything you want to export into /srv/nfs and only export those directories. According to Bruce F. this doesn't buy you any security (I still don't understand why), but it makes for a cleaner system configuration.
>>>>>>>> 
>>>>>>>> On 4/13/21 9:33 AM, hedrick@rutgers.edu wrote:
>>>>>>>>> I am in charge of a large computer science dept computing infrastructure. We have a variety of student and development users. If there are problems we’ll see them.
>>>>>>>>> We use an Ubuntu 20 server, with NVMe storage.
>>>>>>>>> I’ve just had to move Centos 7 and Ubuntu 18 to use NFS 4.0. We had hangs with NFS 4.1 and 4.2. Files would appear to be locked, although eventually the lock would time out. It’s too soon to be sure that moving back to NFS 4.0 will fix it. Next is either NFS 3 or disabling delegations on the server.
>>>>>>>>> Are there known versions of NFS that are safe to use in production for various kernel versions? The one we’re most interested in is Ubuntu 20, which can be anything from 5.4 to 5.8.
>>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> Chuck Lever
>>> 
>>> 
>>> 
>> 
> 
> --
> Chuck Lever
> 
> 
> 


