linux-nfs.vger.kernel.org archive mirror
* general protection fault, probably for non-canonical address in nfsd
@ 2020-06-07 15:32 Hans-Peter Jansen
  2020-06-07 16:01 ` Anthony Joseph Messina
  2020-06-08 19:27 ` Chuck Lever
  0 siblings, 2 replies; 7+ messages in thread
From: Hans-Peter Jansen @ 2020-06-07 15:32 UTC (permalink / raw)
  To: linux-nfs

Hi,

after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from regular 
crashes of nfsd here:

2020-06-07T01:32:43.600306+02:00 server rpc.mountd[2664]: authenticated mount request from 192.168.3.16:303 for /work (/work)
2020-06-07T01:32:43.602594+02:00 server rpc.mountd[2664]: authenticated mount request from 192.168.3.16:304 for /work/vmware (/work)
2020-06-07T01:32:43.602971+02:00 server rpc.mountd[2664]: authenticated mount request from 192.168.3.16:305 for /work/vSphere (/work)
2020-06-07T01:32:43.606276+02:00 server kernel: [51901.089211] general protection fault, probably for non-canonical address 0xb9159d506ba40000: 0000 [#1] SMP PTI
2020-06-07T01:32:43.606284+02:00 server kernel: [51901.089226] CPU: 1 PID: 3190 Comm: nfsd Tainted: G           O      5.6.14-lp151.2-default #1 openSUSE Tumbleweed (unreleased)
2020-06-07T01:32:43.606286+02:00 server kernel: [51901.089234] Hardware name: System manufacturer System Product Name/P7F-E, BIOS 0906    09/20/2010
2020-06-07T01:32:43.606287+02:00 server kernel: [51901.089247] RIP: 0010:cgroup_sk_free+0x26/0x80
2020-06-07T01:32:43.606288+02:00 server kernel: [51901.089257] Code: 00 00 00 00 66 66 66 66 90 53 48 8b 07 48 c7 c3 30 72 07 b6 a8 01 75 07 48 85 c0 48 0f 45 d8 48 8b 83 18 09 00 00 a8 03 75 1a <65> 48 ff 08 f6 43 7c 01 74 02 5b c3 48 8b 43 18 a8 03 75 26 65 48
2020-06-07T01:32:43.606290+02:00 server kernel: [51901.089276] RSP: 0018:ffffb248c21e7e10 EFLAGS: 00010246
2020-06-07T01:32:43.606291+02:00 server kernel: [51901.089280] RAX: b91603a504000000 RBX: ffff99ab141a0000 RCX: 0000000000000021
2020-06-07T01:32:43.606292+02:00 server kernel: [51901.089284] RDX: ffffffffb6135ec4 RSI: 0000000000010080 RDI: ffff99a7159c1490
2020-06-07T01:32:43.606293+02:00 server kernel: [51901.089287] RBP: ffff99a7159c1200 R08: ffff99ab67a60c60 R09: 000000000002eb00
2020-06-07T01:32:43.606294+02:00 server kernel: [51901.089291] R10: ffffb248c0087dc0 R11: 00000000000000c6 R12: 0000000000000000
2020-06-07T01:32:43.606295+02:00 server kernel: [51901.089294] R13: 0000000000000103 R14: ffff99aae4934238 R15: ffff99ab31902000
2020-06-07T01:32:43.606296+02:00 server kernel: [51901.089299] FS:  0000000000000000(0000) GS:ffff99ab67a40000(0000) knlGS:0000000000000000
2020-06-07T01:32:43.606297+02:00 server kernel: [51901.089303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2020-06-07T01:32:43.606303+02:00 server kernel: [51901.089305] CR2: 00000000008e0000 CR3: 00000004df60a000 CR4: 00000000000026e0
2020-06-07T01:32:43.606304+02:00 server kernel: [51901.089307] Call Trace:
2020-06-07T01:32:43.606305+02:00 server kernel: [51901.089315]  __sk_destruct+0x10d/0x1d0
2020-06-07T01:32:43.606306+02:00 server kernel: [51901.089319]  inet_release+0x34/0x60
2020-06-07T01:32:43.606307+02:00 server kernel: [51901.089325]  __sock_release+0x81/0xb0
2020-06-07T01:32:43.606308+02:00 server kernel: [51901.089358]  svc_sock_free+0x38/0x60 [sunrpc]
2020-06-07T01:32:43.606308+02:00 server kernel: [51901.089374]  svc_xprt_put+0x99/0xe0 [sunrpc]
2020-06-07T01:32:43.606310+02:00 server kernel: [51901.089389]  svc_recv+0x9c0/0xa40 [sunrpc]
2020-06-07T01:32:43.606310+02:00 server kernel: [51901.089410]  ? nfsd_destroy+0x60/0x60 [nfsd]
2020-06-07T01:32:43.606311+02:00 server kernel: [51901.089417]  nfsd+0xd1/0x150 [nfsd]
2020-06-07T01:32:43.606312+02:00 server kernel: [51901.089420]  kthread+0x10d/0x130
2020-06-07T01:32:43.606313+02:00 server kernel: [51901.089423]  ? kthread_park+0x90/0x90
2020-06-07T01:32:43.606314+02:00 server kernel: [51901.089426]  ret_from_fork+0x35/0x40

A vSphere 5.5 host accesses this Linux server over NFS v3 for backup
purposes (a Veeam backup server wants to store new backups here).

The kernel is tainted due to vboxdrv. The OS is openSUSE Leap 15.1,
with the kernel and VirtualBox replaced by up-to-date versions from
proper rpm packages (built on that very vSphere host in an OBS server
VM).

I used to be subscribed to this ML, but that subscription was lost
on 04/09, so I cannot reply properly to the general protection fault
thread started on 05/12 by syzbot, which Bruce is looking into.

It seems somewhat related.

Interestingly, we're using a couple of NFS v4 mounts for subsets of
home here, and mount /work and other shares from various Tumbleweed
systems with NFS v4 without any undesired effects.

Since the kernel upgrade, the crash happens every time this Veeam
thing triggers these v3 mounts. I've disabled this backup target until
the problem is resolved, because the crash effectively prevents any
further NFS access to this server and blocks our desktops until the
server is rebooted.

A cursory look at the 5.6.{15,16} changelogs seems to imply that
this issue is still unresolved.

Let me know if I can provide any further info.

Thanks,
Pete




* Re: general protection fault, probably for non-canonical address in nfsd
  2020-06-07 15:32 general protection fault, probably for non-canonical address in nfsd Hans-Peter Jansen
@ 2020-06-07 16:01 ` Anthony Joseph Messina
  2020-06-07 17:44   ` Hans-Peter Jansen
  2020-06-08 19:27 ` Chuck Lever
  1 sibling, 1 reply; 7+ messages in thread
From: Anthony Joseph Messina @ 2020-06-07 16:01 UTC (permalink / raw)
  To: linux-nfs, Hans-Peter Jansen

[-- Attachment #1: Type: text/plain, Size: 5290 bytes --]

On Sunday, June 7, 2020 10:32:44 AM CDT Hans-Peter Jansen wrote:
> Hi,
> 
> after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from regular
> crashes of nfsd here:
> 
> [...]

I see similar issues in Fedora kernels 5.6.14 through 5.6.16
https://bugzilla.redhat.com/show_bug.cgi?id=1839287

On the client I mount /home with sec=krb5p, and /mnt/koji with sec=krb5
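
For concreteness, the client mount options look roughly like this
(hypothetical fstab lines reconstructed from the options above, not
copied verbatim from my machine):

  server.example.com:/home  /home      nfs4  sec=krb5p  0 0
  server.example.com:/koji  /mnt/koji  nfs4  sec=krb5   0 0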

-- 
Anthony - https://messinet.com
F9B6 560E 68EA 037D 8C3D  D1C9 FF31 3BDB D9D8 99B6

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: general protection fault, probably for non-canonical address in nfsd
  2020-06-07 16:01 ` Anthony Joseph Messina
@ 2020-06-07 17:44   ` Hans-Peter Jansen
  2020-06-08 15:28     ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Hans-Peter Jansen @ 2020-06-07 17:44 UTC (permalink / raw)
  To: linux-nfs, Anthony Joseph Messina

On Sunday, 7 June 2020, 18:01:55 CEST, Anthony Joseph Messina wrote:
> On Sunday, June 7, 2020 10:32:44 AM CDT Hans-Peter Jansen wrote:
> > Hi,
> > 
> > after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from regular
> > crashes of nfsd here:
> > 
> > [...]
> 
> I see similar issues in Fedora kernels 5.6.14 through 5.6.16
> https://bugzilla.redhat.com/show_bug.cgi?id=1839287
> 
> On the client I mount /home with sec=krb5p, and /mnt/koji with sec=krb5

Thanks for the confirmation.

Apart from the hassle of server reboots, this issue has some DoS
potential, I'm afraid.

Cheers,
Pete




* Re: general protection fault, probably for non-canonical address in nfsd
  2020-06-07 17:44   ` Hans-Peter Jansen
@ 2020-06-08 15:28     ` Chuck Lever
  2020-06-08 17:53       ` Hans-Peter Jansen
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2020-06-08 15:28 UTC (permalink / raw)
  To: Hans-Peter Jansen, Anthony Joseph Messina; +Cc: Linux NFS Mailing List



> On Jun 7, 2020, at 1:44 PM, Hans-Peter Jansen <hpj@urpla.net> wrote:
> 
> On Sunday, 7 June 2020, 18:01:55 CEST, Anthony Joseph Messina wrote:
>> On Sunday, June 7, 2020 10:32:44 AM CDT Hans-Peter Jansen wrote:
>>> Hi,
>>> 
>>> after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from regular
>>> crashes of nfsd here:
>>> 
> >>> [...]
>> 
>> I see similar issues in Fedora kernels 5.6.14 through 5.6.16
>> https://bugzilla.redhat.com/show_bug.cgi?id=1839287
>> 
>> On the client I mount /home with sec=krb5p, and /mnt/koji with sec=krb5
> 
> Thanks for the confirmation.
> 
> Apart from the hassle of server reboots, this issue has some DoS
> potential, I'm afraid.

If you have a reproducer (even a partial one), then bisecting between a
known-good kernel and v5.6.14 (or v5.6.16) would be helpful.
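
In case it helps, the usual sequence (a sketch, assuming a git checkout
of the stable tree; adjust the tags as needed):

  git bisect start
  git bisect bad v5.6.14
  git bisect good v5.6.11
  # build, boot, run the reproducer, then mark each step with
  # 'git bisect good' or 'git bisect bad' until it converges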


--
Chuck Lever





* Re: general protection fault, probably for non-canonical address in nfsd
  2020-06-08 15:28     ` Chuck Lever
@ 2020-06-08 17:53       ` Hans-Peter Jansen
  2020-06-08 18:31         ` Anthony Joseph Messina
  0 siblings, 1 reply; 7+ messages in thread
From: Hans-Peter Jansen @ 2020-06-08 17:53 UTC (permalink / raw)
  To: Anthony Joseph Messina, Chuck Lever; +Cc: Linux NFS Mailing List

On Monday, 8 June 2020, 17:28:53 CEST, Chuck Lever wrote:
> > On Jun 7, 2020, at 1:44 PM, Hans-Peter Jansen <hpj@urpla.net> wrote:
> > 
> > On Sunday, 7 June 2020, 18:01:55 CEST, Anthony Joseph Messina wrote:
> >> On Sunday, June 7, 2020 10:32:44 AM CDT Hans-Peter Jansen wrote:
> >>> Hi,
> >>> 
> >>> after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from regular
> >>> crashes of nfsd here:
> >>> 
> >>> [...]
> >> 
> >> I see similar issues in Fedora kernels 5.6.14 through 5.6.16
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1839287
> >> 
> >> On the client I mount /home with sec=krb5p, and /mnt/koji with sec=krb5
> > 
> > Thanks for the confirmation.
> > 
> > Apart from the hassle of server reboots, this issue has some DoS
> > potential, I'm afraid.
> 
> If you have a reproducer (even a partial one), then bisecting between a
> known-good kernel and v5.6.14 (or v5.6.16) would be helpful.

I would love to bisect, but this is my primary production machine, which
needs to be up as much as possible. Apart from that, I'm about to leave
the site for a week and will be severely time-constrained for the next
couple of weeks.

Sorry.

Anthony?
--
Pete




* Re: general protection fault, probably for non-canonical address in nfsd
  2020-06-08 17:53       ` Hans-Peter Jansen
@ 2020-06-08 18:31         ` Anthony Joseph Messina
  0 siblings, 0 replies; 7+ messages in thread
From: Anthony Joseph Messina @ 2020-06-08 18:31 UTC (permalink / raw)
  To: Chuck Lever, Hans-Peter Jansen; +Cc: Linux NFS Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 2741 bytes --]

On Monday, June 8, 2020 12:53:26 PM CDT Hans-Peter Jansen wrote:
> On Monday, 8 June 2020, 17:28:53 CEST, Chuck Lever wrote:
> > > On Jun 7, 2020, at 1:44 PM, Hans-Peter Jansen <hpj@urpla.net> wrote:
> > > 
> > > On Sunday, 7 June 2020, 18:01:55 CEST, Anthony Joseph Messina wrote:
> > >> On Sunday, June 7, 2020 10:32:44 AM CDT Hans-Peter Jansen wrote:
> > >>> Hi,
> > >>> 
> > >>> after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from
> > >>> regular crashes of nfsd here:
> > >>> 
> > >>> [...]
> > >> 
> > >> I see similar issues in Fedora kernels 5.6.14 through 5.6.16
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=1839287
> > >> 
> > >> On the client I mount /home with sec=krb5p, and /mnt/koji with sec=krb5
> > > 
> > > Thanks for the confirmation.
> > > 
> > > Apart from the hassle of server reboots, this issue has some DoS
> > > potential, I'm afraid.
> > 
> > If you have a reproducer (even a partial one), then bisecting between a
> > known-good kernel and v5.6.14 (or v5.6.16) would be helpful.
> 
> I would love to bisect, but this is my primary production machine, which
> needs to be up as much as possible. Apart from that, I'm about to leave
> the site for a week and will be severely time-constrained for the next
> couple of weeks.
> 
> Sorry.
> 
> Anthony?

Unfortunately, this is also my main workstation, and I have no experience 
building custom kernels.  The diff in net/sunrpc between v5.6.13 and 
v5.6.14 is relatively small, though it may not point to the root issue.  
I'm typically only able to "follow along" with code like this to spot 
issues, being a nurse, not a kernel programmer.
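
For anyone skimming the attachment: the central change in that range is
that gss_unwrap() and the krb5 unwrap paths gain an explicit length
parameter instead of temporarily rewriting buf->len.  Condensed from the
gss_mech_switch.c hunk in the attached diff:

  /* v5.6.13 */
  u32 gss_unwrap(struct gss_ctx *ctx_id, int offset, struct xdr_buf *buf);

  /* v5.6.14 */
  u32 gss_unwrap(struct gss_ctx *ctx_id, int offset, int len,
                 struct xdr_buf *buf);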

Thank you for your help.  -A

-- 
Anthony - https://messinet.com
F9B6 560E 68EA 037D 8C3D  D1C9 FF31 3BDB D9D8 99B6

[-- Attachment #1.2: net-sunrpc-v5.6.13_v5.6.14.txt --]
[-- Type: text/plain, Size: 17109 bytes --]

 net/sunrpc/auth_gss/auth_gss.c        | 12 ++++------
 net/sunrpc/auth_gss/gss_krb5_crypto.c |  8 +++----
 net/sunrpc/auth_gss/gss_krb5_wrap.c   | 44 +++++++++++++++++++++++------------
 net/sunrpc/auth_gss/gss_mech_switch.c |  3 ++-
 net/sunrpc/auth_gss/svcauth_gss.c     | 10 +++-----
 net/sunrpc/clnt.c                     |  5 ++++
 net/sunrpc/xdr.c                      | 41 ++++++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/backchannel.c     |  2 +-
 net/sunrpc/xprtrdma/frwr_ops.c        | 14 +++++++----
 net/sunrpc/xprtrdma/transport.c       |  2 +-
 net/sunrpc/xprtrdma/verbs.c           | 15 +++++-------
 net/sunrpc/xprtrdma/xprt_rdma.h       |  5 ++--
 12 files changed, 108 insertions(+), 53 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 2dc740acb3bf..a7ad150fd4ee 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -2030,7 +2030,6 @@ gss_unwrap_resp_priv(struct rpc_task *task, struct rpc_cred *cred,
 	struct xdr_buf *rcv_buf = &rqstp->rq_rcv_buf;
 	struct kvec *head = rqstp->rq_rcv_buf.head;
 	struct rpc_auth *auth = cred->cr_auth;
-	unsigned int savedlen = rcv_buf->len;
 	u32 offset, opaque_len, maj_stat;
 	__be32 *p;
 
@@ -2041,9 +2040,9 @@ gss_unwrap_resp_priv(struct rpc_task *task, struct rpc_cred *cred,
 	offset = (u8 *)(p) - (u8 *)head->iov_base;
 	if (offset + opaque_len > rcv_buf->len)
 		goto unwrap_failed;
-	rcv_buf->len = offset + opaque_len;
 
-	maj_stat = gss_unwrap(ctx->gc_gss_ctx, offset, rcv_buf);
+	maj_stat = gss_unwrap(ctx->gc_gss_ctx, offset,
+			      offset + opaque_len, rcv_buf);
 	if (maj_stat == GSS_S_CONTEXT_EXPIRED)
 		clear_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags);
 	if (maj_stat != GSS_S_COMPLETE)
@@ -2057,10 +2056,9 @@ gss_unwrap_resp_priv(struct rpc_task *task, struct rpc_cred *cred,
 	 */
 	xdr_init_decode(xdr, rcv_buf, p, rqstp);
 
-	auth->au_rslack = auth->au_verfsize + 2 +
-			  XDR_QUADLEN(savedlen - rcv_buf->len);
-	auth->au_ralign = auth->au_verfsize + 2 +
-			  XDR_QUADLEN(savedlen - rcv_buf->len);
+	auth->au_rslack = auth->au_verfsize + 2 + ctx->gc_gss_ctx->slack;
+	auth->au_ralign = auth->au_verfsize + 2 + ctx->gc_gss_ctx->align;
+
 	return 0;
 unwrap_failed:
 	trace_rpcgss_unwrap_failed(task);
diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c b/net/sunrpc/auth_gss/gss_krb5_crypto.c
index 6f2d30d7b766..e7180da1fc6a 100644
--- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
+++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
@@ -851,8 +851,8 @@ gss_krb5_aes_encrypt(struct krb5_ctx *kctx, u32 offset,
 }
 
 u32
-gss_krb5_aes_decrypt(struct krb5_ctx *kctx, u32 offset, struct xdr_buf *buf,
-		     u32 *headskip, u32 *tailskip)
+gss_krb5_aes_decrypt(struct krb5_ctx *kctx, u32 offset, u32 len,
+		     struct xdr_buf *buf, u32 *headskip, u32 *tailskip)
 {
 	struct xdr_buf subbuf;
 	u32 ret = 0;
@@ -881,7 +881,7 @@ gss_krb5_aes_decrypt(struct krb5_ctx *kctx, u32 offset, struct xdr_buf *buf,
 
 	/* create a segment skipping the header and leaving out the checksum */
 	xdr_buf_subsegment(buf, &subbuf, offset + GSS_KRB5_TOK_HDR_LEN,
-				    (buf->len - offset - GSS_KRB5_TOK_HDR_LEN -
+				    (len - offset - GSS_KRB5_TOK_HDR_LEN -
 				     kctx->gk5e->cksumlength));
 
 	nblocks = (subbuf.len + blocksize - 1) / blocksize;
@@ -926,7 +926,7 @@ gss_krb5_aes_decrypt(struct krb5_ctx *kctx, u32 offset, struct xdr_buf *buf,
 		goto out_err;
 
 	/* Get the packet's hmac value */
-	ret = read_bytes_from_xdr_buf(buf, buf->len - kctx->gk5e->cksumlength,
+	ret = read_bytes_from_xdr_buf(buf, len - kctx->gk5e->cksumlength,
 				      pkt_hmac, kctx->gk5e->cksumlength);
 	if (ret)
 		goto out_err;
diff --git a/net/sunrpc/auth_gss/gss_krb5_wrap.c b/net/sunrpc/auth_gss/gss_krb5_wrap.c
index 6c1920eed771..cf0fd170ac18 100644
--- a/net/sunrpc/auth_gss/gss_krb5_wrap.c
+++ b/net/sunrpc/auth_gss/gss_krb5_wrap.c
@@ -261,7 +261,9 @@ gss_wrap_kerberos_v1(struct krb5_ctx *kctx, int offset,
 }
 
 static u32
-gss_unwrap_kerberos_v1(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
+gss_unwrap_kerberos_v1(struct krb5_ctx *kctx, int offset, int len,
+		       struct xdr_buf *buf, unsigned int *slack,
+		       unsigned int *align)
 {
 	int			signalg;
 	int			sealalg;
@@ -279,12 +281,13 @@ gss_unwrap_kerberos_v1(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
 	u32			conflen = kctx->gk5e->conflen;
 	int			crypt_offset;
 	u8			*cksumkey;
+	unsigned int		saved_len = buf->len;
 
 	dprintk("RPC:       gss_unwrap_kerberos\n");
 
 	ptr = (u8 *)buf->head[0].iov_base + offset;
 	if (g_verify_token_header(&kctx->mech_used, &bodysize, &ptr,
-					buf->len - offset))
+					len - offset))
 		return GSS_S_DEFECTIVE_TOKEN;
 
 	if ((ptr[0] != ((KG_TOK_WRAP_MSG >> 8) & 0xff)) ||
@@ -324,6 +327,7 @@ gss_unwrap_kerberos_v1(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
 	    (!kctx->initiate && direction != 0))
 		return GSS_S_BAD_SIG;
 
+	buf->len = len;
 	if (kctx->enctype == ENCTYPE_ARCFOUR_HMAC) {
 		struct crypto_sync_skcipher *cipher;
 		int err;
@@ -376,11 +380,15 @@ gss_unwrap_kerberos_v1(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
 	data_len = (buf->head[0].iov_base + buf->head[0].iov_len) - data_start;
 	memmove(orig_start, data_start, data_len);
 	buf->head[0].iov_len -= (data_start - orig_start);
-	buf->len -= (data_start - orig_start);
+	buf->len = len - (data_start - orig_start);
 
 	if (gss_krb5_remove_padding(buf, blocksize))
 		return GSS_S_DEFECTIVE_TOKEN;
 
+	/* slack must include room for krb5 padding */
+	*slack = XDR_QUADLEN(saved_len - buf->len);
+	/* The GSS blob always precedes the RPC message payload */
+	*align = *slack;
 	return GSS_S_COMPLETE;
 }
 
@@ -486,7 +494,9 @@ gss_wrap_kerberos_v2(struct krb5_ctx *kctx, u32 offset,
 }
 
 static u32
-gss_unwrap_kerberos_v2(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
+gss_unwrap_kerberos_v2(struct krb5_ctx *kctx, int offset, int len,
+		       struct xdr_buf *buf, unsigned int *slack,
+		       unsigned int *align)
 {
 	time64_t	now;
 	u8		*ptr;
@@ -532,7 +542,7 @@ gss_unwrap_kerberos_v2(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
 	if (rrc != 0)
 		rotate_left(offset + 16, buf, rrc);
 
-	err = (*kctx->gk5e->decrypt_v2)(kctx, offset, buf,
+	err = (*kctx->gk5e->decrypt_v2)(kctx, offset, len, buf,
 					&headskip, &tailskip);
 	if (err)
 		return GSS_S_FAILURE;
@@ -542,7 +552,7 @@ gss_unwrap_kerberos_v2(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
 	 * it against the original
 	 */
 	err = read_bytes_from_xdr_buf(buf,
-				buf->len - GSS_KRB5_TOK_HDR_LEN - tailskip,
+				len - GSS_KRB5_TOK_HDR_LEN - tailskip,
 				decrypted_hdr, GSS_KRB5_TOK_HDR_LEN);
 	if (err) {
 		dprintk("%s: error %u getting decrypted_hdr\n", __func__, err);
@@ -568,18 +578,19 @@ gss_unwrap_kerberos_v2(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
 	 * Note that buf->head[0].iov_len may indicate the available
 	 * head buffer space rather than that actually occupied.
 	 */
-	movelen = min_t(unsigned int, buf->head[0].iov_len, buf->len);
+	movelen = min_t(unsigned int, buf->head[0].iov_len, len);
 	movelen -= offset + GSS_KRB5_TOK_HDR_LEN + headskip;
-	if (offset + GSS_KRB5_TOK_HDR_LEN + headskip + movelen >
-	    buf->head[0].iov_len)
-		return GSS_S_FAILURE;
+	BUG_ON(offset + GSS_KRB5_TOK_HDR_LEN + headskip + movelen >
+							buf->head[0].iov_len);
 	memmove(ptr, ptr + GSS_KRB5_TOK_HDR_LEN + headskip, movelen);
 	buf->head[0].iov_len -= GSS_KRB5_TOK_HDR_LEN + headskip;
-	buf->len -= GSS_KRB5_TOK_HDR_LEN + headskip;
+	buf->len = len - GSS_KRB5_TOK_HDR_LEN + headskip;
 
 	/* Trim off the trailing "extra count" and checksum blob */
-	buf->len -= ec + GSS_KRB5_TOK_HDR_LEN + tailskip;
+	xdr_buf_trim(buf, ec + GSS_KRB5_TOK_HDR_LEN + tailskip);
 
+	*align = XDR_QUADLEN(GSS_KRB5_TOK_HDR_LEN + headskip);
+	*slack = *align + XDR_QUADLEN(ec + GSS_KRB5_TOK_HDR_LEN + tailskip);
 	return GSS_S_COMPLETE;
 }
 
@@ -603,7 +614,8 @@ gss_wrap_kerberos(struct gss_ctx *gctx, int offset,
 }
 
 u32
-gss_unwrap_kerberos(struct gss_ctx *gctx, int offset, struct xdr_buf *buf)
+gss_unwrap_kerberos(struct gss_ctx *gctx, int offset,
+		    int len, struct xdr_buf *buf)
 {
 	struct krb5_ctx	*kctx = gctx->internal_ctx_id;
 
@@ -613,9 +625,11 @@ gss_unwrap_kerberos(struct gss_ctx *gctx, int offset, struct xdr_buf *buf)
 	case ENCTYPE_DES_CBC_RAW:
 	case ENCTYPE_DES3_CBC_RAW:
 	case ENCTYPE_ARCFOUR_HMAC:
-		return gss_unwrap_kerberos_v1(kctx, offset, buf);
+		return gss_unwrap_kerberos_v1(kctx, offset, len, buf,
+					      &gctx->slack, &gctx->align);
 	case ENCTYPE_AES128_CTS_HMAC_SHA1_96:
 	case ENCTYPE_AES256_CTS_HMAC_SHA1_96:
-		return gss_unwrap_kerberos_v2(kctx, offset, buf);
+		return gss_unwrap_kerberos_v2(kctx, offset, len, buf,
+					      &gctx->slack, &gctx->align);
 	}
 }
diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c b/net/sunrpc/auth_gss/gss_mech_switch.c
index db550bfc2642..69316ab1b9fa 100644
--- a/net/sunrpc/auth_gss/gss_mech_switch.c
+++ b/net/sunrpc/auth_gss/gss_mech_switch.c
@@ -411,10 +411,11 @@ gss_wrap(struct gss_ctx	*ctx_id,
 u32
 gss_unwrap(struct gss_ctx	*ctx_id,
 	   int			offset,
+	   int			len,
 	   struct xdr_buf	*buf)
 {
 	return ctx_id->mech_type->gm_ops
-		->gss_unwrap(ctx_id, offset, buf);
+		->gss_unwrap(ctx_id, offset, len, buf);
 }
 
 
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 65b67b257302..322fd48887f9 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -900,7 +900,7 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct g
 	if (svc_getnl(&buf->head[0]) != seq)
 		goto out;
 	/* trim off the mic and padding at the end before returning */
-	buf->len -= 4 + round_up_to_quad(mic.len);
+	xdr_buf_trim(buf, round_up_to_quad(mic.len) + 4);
 	stat = 0;
 out:
 	kfree(mic.data);
@@ -928,7 +928,7 @@ static int
 unwrap_priv_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct gss_ctx *ctx)
 {
 	u32 priv_len, maj_stat;
-	int pad, saved_len, remaining_len, offset;
+	int pad, remaining_len, offset;
 
 	clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
 
@@ -948,12 +948,8 @@ unwrap_priv_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct gs
 	buf->len -= pad;
 	fix_priv_head(buf, pad);
 
-	/* Maybe it would be better to give gss_unwrap a length parameter: */
-	saved_len = buf->len;
-	buf->len = priv_len;
-	maj_stat = gss_unwrap(ctx, 0, buf);
+	maj_stat = gss_unwrap(ctx, 0, priv_len, buf);
 	pad = priv_len - buf->len;
-	buf->len = saved_len;
 	buf->len -= pad;
 	/* The upper layers assume the buffer is aligned on 4-byte boundaries.
 	 * In the krb5p case, at least, the data ends up offset, so we need to
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 7324b21f923e..3ceaefb2f0bc 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2416,6 +2416,11 @@ rpc_check_timeout(struct rpc_task *task)
 {
 	struct rpc_clnt	*clnt = task->tk_client;
 
+	if (RPC_SIGNALLED(task)) {
+		rpc_call_rpcerror(task, -ERESTARTSYS);
+		return;
+	}
+
 	if (xprt_adjust_timeout(task->tk_rqstp) == 0)
 		return;
 
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index e5497dc2475b..f6da616267ce 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1150,6 +1150,47 @@ xdr_buf_subsegment(struct xdr_buf *buf, struct xdr_buf *subbuf,
 }
 EXPORT_SYMBOL_GPL(xdr_buf_subsegment);
 
+/**
+ * xdr_buf_trim - lop at most "len" bytes off the end of "buf"
+ * @buf: buf to be trimmed
+ * @len: number of bytes to reduce "buf" by
+ *
+ * Trim an xdr_buf by the given number of bytes by fixing up the lengths. Note
+ * that it's possible that we'll trim less than that amount if the xdr_buf is
+ * too small, or if (for instance) it's all in the head and the parser has
+ * already read too far into it.
+ */
+void xdr_buf_trim(struct xdr_buf *buf, unsigned int len)
+{
+	size_t cur;
+	unsigned int trim = len;
+
+	if (buf->tail[0].iov_len) {
+		cur = min_t(size_t, buf->tail[0].iov_len, trim);
+		buf->tail[0].iov_len -= cur;
+		trim -= cur;
+		if (!trim)
+			goto fix_len;
+	}
+
+	if (buf->page_len) {
+		cur = min_t(unsigned int, buf->page_len, trim);
+		buf->page_len -= cur;
+		trim -= cur;
+		if (!trim)
+			goto fix_len;
+	}
+
+	if (buf->head[0].iov_len) {
+		cur = min_t(size_t, buf->head[0].iov_len, trim);
+		buf->head[0].iov_len -= cur;
+		trim -= cur;
+	}
+fix_len:
+	buf->len -= (len - trim);
+}
+EXPORT_SYMBOL_GPL(xdr_buf_trim);
+
 static void __read_bytes_from_xdr_buf(struct xdr_buf *subbuf, void *obj, unsigned int len)
 {
 	unsigned int this_len;
diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 1a0ae0c61353..4b43910a6ed2 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -115,7 +115,7 @@ int xprt_rdma_bc_send_reply(struct rpc_rqst *rqst)
 	if (rc < 0)
 		goto failed_marshal;
 
-	if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req))
+	if (rpcrdma_post_sends(r_xprt, req))
 		goto drop_connection;
 	return 0;
 
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 125297c9aa3e..79059d48f52b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -372,18 +372,22 @@ static void frwr_wc_fastreg(struct ib_cq *cq, struct ib_wc *wc)
 }
 
 /**
- * frwr_send - post Send WR containing the RPC Call message
- * @ia: interface adapter
- * @req: Prepared RPC Call
+ * frwr_send - post Send WRs containing the RPC Call message
+ * @r_xprt: controlling transport instance
+ * @req: prepared RPC Call
  *
  * For FRWR, chain any FastReg WRs to the Send WR. Only a
  * single ib_post_send call is needed to register memory
  * and then post the Send WR.
  *
- * Returns the result of ib_post_send.
+ * Returns the return code from ib_post_send.
+ *
+ * Caller must hold the transport send lock to ensure that the
+ * pointers to the transport's rdma_cm_id and QP are stable.
  */
-int frwr_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
+int frwr_send(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 {
+	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct ib_send_wr *post_wr;
 	struct rpcrdma_mr *mr;
 
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 3cfeba68ee9a..46e7949788e1 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -694,7 +694,7 @@ xprt_rdma_send_request(struct rpc_rqst *rqst)
 		goto drop_connection;
 	rqst->rq_xtime = ktime_get();
 
-	if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req))
+	if (rpcrdma_post_sends(r_xprt, req))
 		goto drop_connection;
 
 	rqst->rq_xmit_bytes_sent += rqst->rq_snd_buf.len;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 353f61ac8d51..a48b99f3682c 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1502,20 +1502,17 @@ static void rpcrdma_regbuf_free(struct rpcrdma_regbuf *rb)
 }
 
 /**
- * rpcrdma_ep_post - Post WRs to a transport's Send Queue
- * @ia: transport's device information
- * @ep: transport's RDMA endpoint information
+ * rpcrdma_post_sends - Post WRs to a transport's Send Queue
+ * @r_xprt: controlling transport instance
  * @req: rpcrdma_req containing the Send WR to post
  *
  * Returns 0 if the post was successful, otherwise -ENOTCONN
  * is returned.
  */
-int
-rpcrdma_ep_post(struct rpcrdma_ia *ia,
-		struct rpcrdma_ep *ep,
-		struct rpcrdma_req *req)
+int rpcrdma_post_sends(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 {
 	struct ib_send_wr *send_wr = &req->rl_wr;
+	struct rpcrdma_ep *ep = &r_xprt->rx_ep;
 	int rc;
 
 	if (!ep->rep_send_count || kref_read(&req->rl_kref) > 1) {
@@ -1526,8 +1523,8 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
 		--ep->rep_send_count;
 	}
 
-	rc = frwr_send(ia, req);
-	trace_xprtrdma_post_send(req, rc);
+	trace_xprtrdma_post_send(req);
+	rc = frwr_send(r_xprt, req);
 	if (rc)
 		return -ENOTCONN;
 	return 0;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 37d5080c250b..600574a0d838 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -469,8 +469,7 @@ void rpcrdma_ep_destroy(struct rpcrdma_xprt *r_xprt);
 int rpcrdma_ep_connect(struct rpcrdma_ep *, struct rpcrdma_ia *);
 void rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);
 
-int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
-				struct rpcrdma_req *);
+int rpcrdma_post_sends(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req);
 void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp);
 
 /*
@@ -544,7 +543,7 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
 				struct rpcrdma_mr_seg *seg,
 				int nsegs, bool writing, __be32 xid,
 				struct rpcrdma_mr *mr);
-int frwr_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req);
+int frwr_send(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req);
 void frwr_reminv(struct rpcrdma_rep *rep, struct list_head *mrs);
 void frwr_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req);
 void frwr_unmap_async(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req);

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: general protection fault, probably for non-canonical address in nfsd
  2020-06-07 15:32 general protection fault, probably for non-canonical address in nfsd Hans-Peter Jansen
  2020-06-07 16:01 ` Anthony Joseph Messina
@ 2020-06-08 19:27 ` Chuck Lever
  1 sibling, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2020-06-08 19:27 UTC (permalink / raw)
  To: Hans-Peter Jansen; +Cc: Linux NFS Mailing List



> On Jun 7, 2020, at 11:32 AM, Hans-Peter Jansen <hpj@urpla.net> wrote:
> 
> Hi,
> 
> after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from regular 
> crashes of nfsd here:
> 
> [...]
> 
> A vSphere 5.5 host accesses this Linux server over NFS v3 for backup
> purposes (a Veeam backup server wants to store new backups here).
> 
> The kernel is tainted due to vboxdrv. The OS is openSUSE Leap 15.1,
> with the kernel and VirtualBox replaced by up-to-date versions from
> proper rpm packages (built on that very vSphere host in an OBS server
> VM).
> 
> I used to be subscribed to this ML, but that subscription was lost
> on 04/09, so I cannot reply properly to the general protection fault
> thread started on 05/12 by syzbot, which Bruce is looking into.
> 
> It seems somewhat related.

Your backtrace doesn't look anything like the syzbot crashes Bruce
is looking at, and there are no fs/nfsd/ changes between v5.6.11 and
v5.6.14. His crashes appear to be related entirely to the order of
destruction of net namespaces and NFS server data structures --
nothing at the socket layer.

The net/sunrpc/ changes in that commit range have nothing to do with
socket allocation. However, this:

   [51901.089247] RIP: 0010:cgroup_sk_free+0x26/0x80

suggests something else. There is a cgroup/sk related change in that
commit range:

e2d928d5ee43 ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups")
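
For reference, here is roughly where that RIP lands in v5.6 (a
simplified sketch condensed from net/core/sock.c and
kernel/cgroup/cgroup.c; consult the actual tree for the authoritative
code):

  /* net/core/sock.c -- the tail of the backtrace above;
   * sk_prot_free() is likely inlined into __sk_destruct() */
  static void sk_prot_free(struct proto *prot, struct sock *sk)
  {
          cgroup_sk_free(&sk->sk_cgrp_data);
          mem_cgroup_sk_free(sk);
          security_sk_free(sk);
          /* ... free the struct sock itself ... */
  }

  /* kernel/cgroup/cgroup.c */
  void cgroup_sk_free(struct sock_cgroup_data *skcd)
  {
          struct cgroup *cgrp = sock_cgroup_ptr(skcd);

          cgroup_bpf_put(cgrp);
          cgroup_put(cgrp);  /* faults if cgrp isn't a valid pointer */
  }

struct sock_cgroup_data is a union that can hold either a cgroup pointer
or inline netprio/classid data, so if this reading is right, a change in
how that union is maintained could leave a non-pointer value to be
dereferenced here -- which would match the garbage, non-canonical value
in RAX.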

I'm not sure how to help you further, since you are not available to
test this theory for a few weeks. The best I can suggest for others is
to stick with v5.6.11-based kernels until someone with a reproducer
can bisect between .11 and .14 to confirm the theory.


> Interestingly, we're using a couple of NFS v4 mounts for subsets of
> home here, and mount /work and other shares from various Tumbleweed
> systems with NFS v4 without any undesired effects.
> 
> Since the kernel upgrade, the crash happens every time this Veeam
> thing triggers these v3 mounts. I've disabled this backup target until
> the problem is resolved, because the crash effectively prevents any
> further NFS access to this server and blocks our desktops until the
> server is rebooted.
> 
> A cursory look at the 5.6.{15,16} changelogs seems to imply that
> this issue is still unresolved.
> 
> Let me know if I can provide any further info.

--
Chuck Lever




