kernel-tls-handshake.lists.linux.dev archive mirror
* problems getting rpc over tls to work
@ 2023-03-28 12:27 Jeff Layton
  2023-03-28 12:55 ` Jeff Layton
  2023-03-28 13:29 ` Chuck Lever III
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 12:27 UTC
  To: Chuck Lever; +Cc: kernel-tls-handshake

Hi Chuck!

I have started the packaging work for Fedora for ktls-utils:

    https://bugzilla.redhat.com/show_bug.cgi?id=2182151

I also built packages for this in copr:

    https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/

...and built some interim nfs-utils packages with the requisite exportfs
patches:

    https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/

I built a kernel from your topic-rpc-with-tls-upcall branch and
installed the kernel on a client and server, along with ktls-utils and
the updated nfs-utils on the server. I set up tlshd to run at boot on
both hosts. The server exports with a bog-standard set of options:

    /export		*(rw,insecure,no_root_squash)

I then tried to mount it with tls:

    $ sudo mount knfsd:/export /mnt/knfsd -o xprtsec=tls

I see the initial NULL requests go out, and then I see the client send
an encrypted frame to the server, and the server just shuts down the
socket at that point (FIN, ACK).

I assume that I must have something configured wrong. What am I missing?

Eventually, after a couple of failed mount attempts, I also hit this on
the client:

[  375.561304] BUG: kernel NULL pointer dereference, address: 0000000000000030
[  375.564637] #PF: supervisor read access in kernel mode
[  375.566439] #PF: error_code(0x0000) - not-present page
[  375.567930] PGD 0 P4D 0
[  375.568733] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  375.569993] CPU: 2 PID: 9 Comm: kworker/u16:0 Tainted: G            E 6.3.0-rc2+ #151
[  375.572214] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
[  375.574538] Workqueue: xprtiod xs_tls_connect [sunrpc]
[  375.576087] RIP: 0010:handshake_req_cancel+0x12/0x1c0
[  375.578255] Code: d0 5b ff eb 92 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 55 53 48 8b 6f 18 48 89 ef <4c> 8b 6d 30 e8 35 fe ff ff 48 85 c0 0f 84 3e 01 00 00 4c 89 ef 48
[  375.583180] RSP: 0018:ffffb87540053d28 EFLAGS: 00010246
[  375.585416] RAX: 0000000000000000 RBX: ffff970289cb4800 RCX: 0000000000000000
[  375.588226] RDX: 0000000000000001 RSI: ffffb87540053d00 RDI: 0000000000000000
[  375.590389] RBP: 0000000000000000 R08: ffff970280e544a8 R09: 0000000000000001
[  375.592601] R10: 0000000000000002 R11: 0000000000000001 R12: ffff970288985480
[  375.594885] R13: 0000000000000000 R14: 0000000004208160 R15: ffff970289cb4800
[  375.597051] FS:  0000000000000000(0000) GS:ffff9703f7c80000(0000) knlGS:0000000000000000
[  375.599810] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  375.601625] CR2: 0000000000000030 CR3: 000000010aa04000 CR4: 00000000003506e0
[  375.603785] Call Trace:
[  375.604754]  <TASK>
[  375.605651]  xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
[  375.608998]  ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
[  375.610900]  xs_tls_connect+0x14a/0x5f0 [sunrpc]
[  375.612530]  process_one_work+0x1c8/0x3c0
[  375.613907]  worker_thread+0x4d/0x380
[  375.615189]  ? __pfx_worker_thread+0x10/0x10
[  375.616631]  kthread+0xe9/0x110
[  375.617788]  ? __pfx_kthread+0x10/0x10
[  375.619091]  ret_from_fork+0x2c/0x50
[  375.620347]  </TASK>
[  375.621242] Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hwdep(E) snd_hda_core(E) snd_pcm(E) kvm_amd(E) snd_timer(E) kvm(E) psmouse(E) snd(E) evdev(E) irqbypass(E) virtio_balloon(E) soundcore(E) pcspkr(E) button(E) loop(E) drm(E) configfs(E) zram(E) zsmalloc(E) xfs(E) libcrc32c(E) crc32c_generic(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha512_generic(E) virtio_net(E) virtio_blk(E) net_failover(E) failover(E) virtio_console(E) aesni_intel(E) serio_raw(E) crypto_simd(E) cryptd(E) virtio_pci(E) virtio(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) virtio_ring(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) dm_multipath(E) dm_mod(E) scsi_mod(E) scsi_common(E) autofs4(E)
[  375.646698] CR2: 0000000000000030
[  375.647894] ---[ end trace 0000000000000000 ]---
[  375.649403] RIP: 0010:handshake_req_cancel+0x12/0x1c0
[  375.651062] Code: d0 5b ff eb 92 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 55 53 48 8b 6f 18 48 89 ef <4c> 8b 6d 30 e8 35 fe ff ff 48 85 c0 0f 84 3e 01 00 00 4c 89 ef 48
[  375.654664] RSP: 0018:ffffb87540053d28 EFLAGS: 00010246
[  375.655447] RAX: 0000000000000000 RBX: ffff970289cb4800 RCX: 0000000000000000
[  375.656436] RDX: 0000000000000001 RSI: ffffb87540053d00 RDI: 0000000000000000
[  375.657425] RBP: 0000000000000000 R08: ffff970280e544a8 R09: 0000000000000001
[  375.658392] R10: 0000000000000002 R11: 0000000000000001 R12: ffff970288985480
[  375.659360] R13: 0000000000000000 R14: 0000000004208160 R15: ffff970289cb4800
[  375.660324] FS:  0000000000000000(0000) GS:ffff9703f7c80000(0000) knlGS:0000000000000000
[  375.661479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  375.662285] CR2: 0000000000000030 CR3: 000000010aa04000 CR4: 00000000003506e0
[  375.663278] note: kworker/u16:0[9] exited with irqs disabled


...faddr2line says:

[jlayton@tleilax linux]$ ./scripts/faddr2line --list vmlinux handshake_req_cancel+0x12/0x1c0
handshake_req_cancel+0x12/0x1c0:

read_pnet at include/net/net_namespace.h:383
 378 	}
 379 	
 380 	static inline struct net *read_pnet(const possible_net_t *pnet)
 381 	{
 382 	#ifdef CONFIG_NET_NS
>383<		return pnet->net;
 384 	#else
 385 		return &init_net;
 386 	#endif
 387 	}
 388 	

(inlined by) sock_net at include/net/sock.h:649
 644 		__rcu_assign_sk_user_data_with_flags(sk, ptr, 0)
 645 	
 646 	static inline
 647 	struct net *sock_net(const struct sock *sk)
 648 	{
>649<		return read_pnet(&sk->sk_net);
 650 	}
 651 	
 652 	static inline
 653 	void sock_net_set(struct sock *sk, struct net *net)
 654 	{

(inlined by) handshake_req_cancel at net/handshake/request.c:281
 276 		struct handshake_net *hn;
 277 		struct sock *sk;
 278 		struct net *net;
 279 	
 280 		sk = sock->sk;
>281<		net = sock_net(sk);
 282 		req = handshake_req_hash_lookup(sk);
 283 		if (!req) {
 284 			trace_handshake_cancel_none(net, req, sk);
 285 			return true;
 286 		}


I'm guessing sk was NULL in handshake_req_cancel?
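
The register dump is consistent with that guess: RBP is 0, the faulting instruction (`<4c> 8b 6d 30`) is a load from 0x30(%rbp), and CR2 is 0x30. A minimal C sketch of why a NULL base pointer faults at the member's offset follows. `fake_sock`, `fake_net`, and the padding are hypothetical stand-ins chosen to place the member at 0x30; they are not the real `struct sock` layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the layout matters for this illustration. */
struct fake_net { int unused; };

struct fake_sock {
	unsigned char pad[0x30];   /* assumed padding so sk_net lands at 0x30 */
	struct fake_net *sk_net;   /* the member the faulting load reads */
};

/*
 * Address that a load of base->sk_net would touch, computed
 * arithmetically so nothing is actually dereferenced.
 */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct fake_sock, sk_net);
}
```

With a base of 0 the computed address equals the member offset, which is how a small non-zero CR2 usually points at a member read through a NULL struct pointer.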
-- 
Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 12:27 problems getting rpc over tls to work Jeff Layton
@ 2023-03-28 12:55 ` Jeff Layton
  2023-03-28 14:04   ` Chuck Lever III
  2023-03-28 13:29 ` Chuck Lever III
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 12:55 UTC
  To: Chuck Lever; +Cc: kernel-tls-handshake

On Tue, 2023-03-28 at 08:27 -0400, Jeff Layton wrote:
> Hi Chuck!
> 
> I have started the packaging work for Fedora for ktls-utils:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> 
> I also built packages for this in copr:
> 
>     https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> 
> ...and built some interim nfs-utils packages with the requisite exportfs
> patches:
> 
>     https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> 
> I built a kernel from your topic-rpc-with-tls-upcall branch and
> installed the kernel on a client and server, along with ktls-utils and
> the updated nfs-utils on the server. I set up tlshd to run at boot on
> both hosts. The server exports with a bog-standard set of options:
> 
>     /export		*(rw,insecure,no_root_squash)
> 
> I then tried to mount it with tls:
> 
>     $ sudo mount knfsd:/export /mnt/knfsd -o xprtsec=tls
> 
> I see the initial NULL requests go out, and then I see the client send
> an encrypted frame to the server, and the server just shuts down the
> socket at that point (FIN, ACK).
> 
> I assume that I must have something configured wrong. What am I missing?
> 

Ahh, I think this must be the issue:

Mar 28 08:37:46 knfsd tlshd[1324]: Default certificate not found: Key file does not have key “x509.certificate” in group “authenticate.server”
Mar 28 08:37:46 knfsd tlshd[1324]: Handshake with 'nfsclnt.poochiereds.net' (192.168.1.136) failed

I'll go over the docs more carefully and try again. I wonder...should we have the ktls-utils package install a self-signed cert by default?
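
For reference, the group and key named in that tlshd message suggest a stanza along these lines in /etc/tlshd.conf. This is only a sketch: the `x509.private_key` key name is an assumption based on the tlshd.conf template, and both paths are hypothetical placeholders.

```
[authenticate.server]
# Hypothetical paths; point these at the server's own certificate and key.
x509.certificate = /etc/pki/tls/certs/knfsd.pem
x509.private_key = /etc/pki/tls/private/knfsd.key
```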

-- 
Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 12:27 problems getting rpc over tls to work Jeff Layton
  2023-03-28 12:55 ` Jeff Layton
@ 2023-03-28 13:29 ` Chuck Lever III
  2023-03-28 13:51   ` Jeff Layton
  2023-03-28 13:55   ` Chuck Lever III
  1 sibling, 2 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 13:29 UTC
  To: Jeff Layton; +Cc: kernel-tls-handshake



> On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> Hi Chuck!
> 
> I have started the packaging work for Fedora for ktls-utils:
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> 
> I also built packages for this in copr:
> 
>    https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> 
> ...and built some interim nfs-utils packages with the requisite exportfs
> patches:
> 
>    https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/

Note that the nfs-utils changes aren't necessary to support
the kernel server in "opportunistic" mode -- the server will
use RPC-with-TLS if a client requests it, but otherwise does
not restrict access.

Client side also has no nfs-utils requirements at this time,
since the new mount options are handled by the kernel.


> I built a kernel from your topic-rpc-with-tls-upcall branch and
> installed the kernel on a client and server, along with ktls-utils and
> the updated nfs-utils on the server. I set up tlshd to run at boot on
> both hosts. The server exports with a bog-standard set of options:
> 
>    /export		*(rw,insecure,no_root_squash)
> 
> I then tried to mount it with tls:
> 
>    $ sudo mount knfsd:/export /mnt/knfsd -o xprtsec=tls
> 
> I see the initial NULL requests go out, and then I see the client send
> an encrypted frame to the server, and the server just shuts down the
> socket at that point (FIN, ACK).
> 
> I assume that I must have something configured wrong. What am I missing?

The starting move is to crank up the debug settings in /etc/tlshd.conf...


> I'm guessing sk was NULL in handshake_req_cancel?

Jakub asked me to remove the NULL check there. But I think
req_cancel needs to handle this case, which might happen
due to a race.


--
Chuck Lever




* Re: problems getting rpc over tls to work
  2023-03-28 13:29 ` Chuck Lever III
@ 2023-03-28 13:51   ` Jeff Layton
  2023-03-28 13:55   ` Chuck Lever III
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 13:51 UTC
  To: Chuck Lever III; +Cc: kernel-tls-handshake

On Tue, 2023-03-28 at 13:29 +0000, Chuck Lever III wrote:
> 
> > On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > Hi Chuck!
> > 
> > I have started the packaging work for Fedora for ktls-utils:
> > 
> >    https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> > 
> > I also built packages for this in copr:
> > 
> >    https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> > 
> > ...and built some interim nfs-utils packages with the requisite exportfs
> > patches:
> > 
> >    https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> 
> Note that the nfs-utils changes aren't necessary to support
> the kernel server in "opportunistic" mode -- the server will
> use RPC-with-TLS if a client requests it, but otherwise does
> not restrict access.
> 

Opportunistic mode doesn't seem to work for me. I created a self-signed
cert and tried to use it, but the client rejects it with this:

    Mar 28 09:01:20 nfsclnt tlshd[1092]: Certificate signer not found.

Is there a way to make it not try to validate the cert chain? Otherwise,
I guess I'll need to set up a CA and such.

> Client side also has no nfs-utils requirements at this time,
> since the new mount options are handled by the kernel.
> 
> 
> > I built a kernel from your topic-rpc-with-tls-upcall branch and
> > installed the kernel on a client and server, along with ktls-utils and
> > the updated nfs-utils on the server. I set up tlshd to run at boot on
> > both hosts. The server exports with a bog-standard set of options:
> > 
> >    /export		*(rw,insecure,no_root_squash)
> > 
> > I then tried to mount it with tls:
> > 
> >    $ sudo mount knfsd:/export /mnt/knfsd -o xprtsec=tls
> > 
> > I see the initial NULL requests go out, and then I see the client send
> > an encrypted frame to the server, and the server just shuts down the
> > socket at that point (FIN, ACK).
> > 
> > I assume that I must have something configured wrong. What am I missing?
> 
> The starting move is to crank up the debug settings in /etc/tlshd.conf...
> 
> 

Thanks. I'll try that next time.

> > I'm guessing sk was NULL in handshake_req_cancel?
> 
> Jakub asked me to remove the NULL check there. But I think
> req_cancel needs to handle this case, which might happen
> due to a race.
> 

If there's a race there, is that sufficient? Could it go NULL after you
check but before the call to sock_net? Maybe I need to better understand
the race.
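
One way to frame the question, as a user-space sketch rather than the kernel fix: load the pointer once, test the local copy, and use only that copy afterwards, so the check and the use cannot disagree. The types and `fake_cancel_lookup()` are hypothetical. The pattern says nothing about object lifetime; real code would still need a reference or a lock to keep `sk` from being freed after the load.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock and struct socket. */
struct fake_sk { int netns_id; };

struct fake_socket {
	_Atomic(struct fake_sk *) sk;   /* may be cleared by another thread */
};

/*
 * Returns true and stores the namespace id if a sock was attached at
 * the moment of the single load; returns false if the pointer was NULL.
 */
static bool fake_cancel_lookup(struct fake_socket *sock, int *netns_id)
{
	/* One load: later code sees the same value the NULL check saw. */
	struct fake_sk *sk = atomic_load_explicit(&sock->sk,
						  memory_order_acquire);

	if (!sk)
		return false;

	*netns_id = sk->netns_id;   /* uses the snapshot, never re-reads sock->sk */
	return true;
}
```

Checking `sock->sk` and then reading `sock->sk` again for the call would reopen exactly the window asked about above.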

-- 
Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 13:29 ` Chuck Lever III
  2023-03-28 13:51   ` Jeff Layton
@ 2023-03-28 13:55   ` Chuck Lever III
  2023-03-28 14:13     ` Jeff Layton
  1 sibling, 1 reply; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 13:55 UTC
  To: Jeff Layton; +Cc: kernel-tls-handshake



> On Mar 28, 2023, at 9:29 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
>> 
>> Hi Chuck!
>> 
>> I have started the packaging work for Fedora for ktls-utils:
>> 
>>   https://bugzilla.redhat.com/show_bug.cgi?id=2182151
>> 
>> I also built packages for this in copr:
>> 
>>   https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
>> 
>> ...and built some interim nfs-utils packages with the requisite exportfs
>> patches:
>> 
>>   https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> 
> Note that the nfs-utils changes aren't necessary to support
> the kernel server in "opportunistic" mode -- the server will
> use RPC-with-TLS if a client requests it, but otherwise does
> not restrict access.
> 
> Client side also has no nfs-utils requirements at this time,
> since the new mount options are handled by the kernel.

In case I wasn't clear:

This was meant as a suggestion. If you want to simplify your
test set-up a bit, the nfs-utils piece isn't needed at this
point. But feel free to include it if you like!


--
Chuck Lever




* Re: problems getting rpc over tls to work
  2023-03-28 12:55 ` Jeff Layton
@ 2023-03-28 14:04   ` Chuck Lever III
  2023-03-28 14:23     ` Benjamin Coddington
  2023-03-28 14:29     ` Jeff Layton
  0 siblings, 2 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 14:04 UTC (permalink / raw)
  To: Jeff Layton; +Cc: kernel-tls-handshake



> On Mar 28, 2023, at 8:55 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> I wonder...should we have the ktls-utils package install a self-signed cert by default?

This idea is intriguing; I had some similar thoughts.

I'm not sure what the security implications of all this are.
We'd first need to look at other certificate-based packages
in Fedora to see if they offer a similar quick-setup. The
cert would have to be created at install time.


> I created a self-signed
> cert and tried to use it, but the client rejects it with this:
> 
>    Mar 28 09:01:20 nfsclnt tlshd[1092]: Certificate signer not found.
> 
> Is there a way to make it not try to validate the cert chain?

Olga also found that self-signed server certs are not
working as we'd like. tlshd had a mechanism to force the
clients not to check the signer, but it was removed
because it was deemed insecure.

I'd like to find a way to make self-signed work seamlessly.


--
Chuck Lever




* Re: problems getting rpc over tls to work
  2023-03-28 13:55   ` Chuck Lever III
@ 2023-03-28 14:13     ` Jeff Layton
  2023-03-28 14:25       ` Olga Kornievskaia
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 14:13 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: kernel-tls-handshake

On Tue, 2023-03-28 at 13:55 +0000, Chuck Lever III wrote:
> 
> > On Mar 28, 2023, at 9:29 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> > 
> > 
> > 
> > > On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > Hi Chuck!
> > > 
> > > I have started the packaging work for Fedora for ktls-utils:
> > > 
> > >   https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> > > 
> > > I also built packages for this in copr:
> > > 
> > >   https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> > > 
> > > ...and built some interim nfs-utils packages with the requisite exportfs
> > > patches:
> > > 
> > >   https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> > 
> > Note that the nfs-utils changes aren't necessary to support
> > the kernel server in "opportunistic" mode -- the server will
> > use RPC-with-TLS if a client requests it, but otherwise does
> > not restrict access.
> > 
> > Client side also has no nfs-utils requirements at this time,
> > since the new mount options are handled by the kernel.
> 
> In case I wasn't clear:
> 
> This was meant as a suggestion. If you want to simplify your
> test set-up a bit, the nfs-utils piece isn't needed at this
> point. But feel free to include it if you like!
> 

Understood. I needed to build it for the server side anyway, so I
figured I might as well. Eventually I'd like to set up a Fedora COPR
repo that has all of the packages we need to test this, but I need to
sort through the certificate handling here first.

Are there docs on how to administer gnutls? For instance, I guess I'll
want to set up my own CA and issue client and server certs. How do I
make gnutls trust a new CA?
--
Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 14:04   ` Chuck Lever III
@ 2023-03-28 14:23     ` Benjamin Coddington
  2023-03-28 14:29     ` Jeff Layton
  1 sibling, 0 replies; 26+ messages in thread
From: Benjamin Coddington @ 2023-03-28 14:23 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, kernel-tls-handshake

On 28 Mar 2023, at 10:04, Chuck Lever III wrote:

> I'd like to find a way to make self-signed work seamlessly.

You can add them to the trust store, or bring back the option to skip verification.

Ben



* Re: problems getting rpc over tls to work
  2023-03-28 14:13     ` Jeff Layton
@ 2023-03-28 14:25       ` Olga Kornievskaia
  2023-03-28 14:38         ` Jeff Layton
  0 siblings, 1 reply; 26+ messages in thread
From: Olga Kornievskaia @ 2023-03-28 14:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever III, kernel-tls-handshake

On Tue, Mar 28, 2023 at 10:14 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2023-03-28 at 13:55 +0000, Chuck Lever III wrote:
> >
> > > On Mar 28, 2023, at 9:29 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> > >
> > >
> > >
> > > > On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > >
> > > > Hi Chuck!
> > > >
> > > > I have started the packaging work for Fedora for ktls-utils:
> > > >
> > > >   https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> > > >
> > > > I also built packages for this in copr:
> > > >
> > > >   https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> > > >
> > > > ...and built some interim nfs-utils packages with the requisite exportfs
> > > > patches:
> > > >
> > > >   https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> > >
> > > Note that the nfs-utils changes aren't necessary to support
> > > the kernel server in "opportunistic" mode -- the server will
> > > use RPC-with-TLS if a client requests it, but otherwise does
> > > not restrict access.
> > >
> > > Client side also has no nfs-utils requirements at this time,
> > > since the new mount options are handled by the kernel.
> >
> > In case I wasn't clear:
> >
> > This was meant as a suggestion. If you want to simplify your
> > test set-up a bit, the nfs-utils piece isn't needed at this
> > point. But feel free to include it if you like!
> >
>
> Understood. I needed to build it for the server side anyway, so I
> figured I might as well. Eventually I'd like to set up a Fedora COPR
> repo that has all of the packages we need to test this, but I need to
> sort through the certificate handling here first.
>
> Are there docs on how to administer gnutls? For instance, I guess I'll
> want to set up my own CA and issue client and server certs. How do I
> make gnutls trust a new CA?

Hi Jeff,

To get self-signed certificates to work, you need to copy your server's
cert.pem file into /etc/pki/ca-trust/source/anchors on the client
machine and then run "update-ca-trust extract" there.
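
A minimal sketch of those two steps as a shell helper. The function name
trust_server_cert and its optional arguments are hypothetical and exist
only for illustration; the anchors directory and the "update-ca-trust
extract" command are the ones named above. It needs to run as root when
it writes to the system location.

```shell
#!/bin/sh
# Sketch only: trust_server_cert is a hypothetical helper, not part of
# ktls-utils or ca-certificates.
#
# Usage: trust_server_cert CERT [ANCHORS_DIR] [REFRESH]
#   CERT         PEM file holding the server's self-signed certificate
#   ANCHORS_DIR  defaults to the Fedora trust-anchor directory
#   REFRESH      "yes" (default) runs update-ca-trust afterwards
trust_server_cert() {
    cert=$1
    anchors=${2:-/etc/pki/ca-trust/source/anchors}
    refresh=${3:-yes}

    # Refuse to continue if the certificate file is missing or unreadable.
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 1; }

    # Step 1: drop the certificate into the anchors directory.
    cp "$cert" "$anchors/" || return 1

    # Step 2: rebuild the consolidated trust store that gnutls consults.
    if [ "$refresh" = yes ]; then
        update-ca-trust extract || return 1
    fi
}

# Typical use on the client, as root (the path is a placeholder):
#   trust_server_cert /path/to/server-cert.pem
```

Passing "no" as the third argument skips the trust-store rebuild, which
is handy for trying the copy step against a scratch directory first.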

> --
> Jeff Layton <jlayton@kernel.org>
>


* Re: problems getting rpc over tls to work
  2023-03-28 14:04   ` Chuck Lever III
  2023-03-28 14:23     ` Benjamin Coddington
@ 2023-03-28 14:29     ` Jeff Layton
  2023-03-28 14:39       ` Olga Kornievskaia
  2023-03-28 14:41       ` Chuck Lever III
  1 sibling, 2 replies; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 14:29 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: kernel-tls-handshake

On Tue, 2023-03-28 at 14:04 +0000, Chuck Lever III wrote:
> 
> > On Mar 28, 2023, at 8:55 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > I wonder...should we have the ktls-utils package install a self-signed cert by default?
> 
> So this idea is intriguing, I had some similar thoughts.
> 
> I'm not sure what the security implications of all this are.
> We'd first need to look at other certificate-based packages
> in Fedora to see if they offer a similar quick-setup. The
> cert would have to be created at install time.
> 
> 

I think when apache is installed, a self-signed cert is created. You
don't have to use it, but it's what gets initially installed.

> > I created a self-signed
> > cert and tried to use it, but the client rejects it with this:
> > 
> >    Mar 28 09:01:20 nfsclnt tlshd[1092]: Certificate signer not found.
> > 
> > Is there a way to make it not try to validate the cert chain?
> 
> Olga also found that self-signed server certs are not
> working as we'd like. tlshd had a mechanism to force the
> clients not to check the signer, but it was removed
> because it was deemed insecure.
> 
> I'd like to find a way to make self-signed work seamlessly.
> 

Ditto. A lot of people are going to want to use TLS opportunistically
without deploying their own CA and issuing "real" certificates.

It's true that it is less secure than having full chain-of-trust, but
this seems like a case of "perfect being the enemy of good". If we don't
allow for self-signed certificates, then we've created a rather large
hurdle for anyone who wants to deploy this.

One thing we could do is reinstate the tlshd option, but still allow it
to check the signature. Then it could log something if that check fails
but still allow the connection.

We should of course document why using that option is not ideal, but
ripping it out entirely seems rather draconian. That's just going to
drive people to not use TLS at all because of the hassle factor.
-- 
Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 14:25       ` Olga Kornievskaia
@ 2023-03-28 14:38         ` Jeff Layton
  2023-03-28 14:44           ` Olga Kornievskaia
  2023-03-28 15:48           ` Jeff Layton
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 14:38 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Chuck Lever III, kernel-tls-handshake

On Tue, 2023-03-28 at 10:25 -0400, Olga Kornievskaia wrote:
> On Tue, Mar 28, 2023 at 10:14 AM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Tue, 2023-03-28 at 13:55 +0000, Chuck Lever III wrote:
> > > 
> > > > On Mar 28, 2023, at 9:29 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> > > > 
> > > > 
> > > > 
> > > > > On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > 
> > > > > Hi Chuck!
> > > > > 
> > > > > I have started the packaging work for Fedora for ktls-utils:
> > > > > 
> > > > >   https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> > > > > 
> > > > > I also built packages for this in copr:
> > > > > 
> > > > >   https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> > > > > 
> > > > > ...and built some interim nfs-utils packages with the requisite exportfs
> > > > > patches:
> > > > > 
> > > > >   https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> > > > 
> > > > Note that the nfs-utils changes aren't necessary to support
> > > > the kernel server in "opportunistic" mode -- the server will
> > > > use RPC-with-TLS if a client requests it, but otherwise does
> > > > not restrict access.
> > > > 
> > > > Client side also has no nfs-utils requirements at this time,
> > > > since the new mount options are handled by the kernel.
> > > 
> > > In case I wasn't clear:
> > > 
> > > This was meant as a suggestion. If you want to simplify your
> > > test set-up a bit, the nfs-utils piece isn't needed at this
> > > point. But feel free to include it if you like!
> > > 
> > 
> > Understood. I needed to build it for the server side anyway, so I
> > figured I might as well. Eventually I'd like to set up a Fedora COPR
> > repo that has all of the packages we need to test this, but I need to
> > sort through the certificate handling here first.
> > 
> > Are there docs on how to administer gnutls? For instance, I guess I'll
> > want to set up my own CA and issue client and server certs. How do I
> > make gnutls trust a new CA?
> 
> Hi Jeff,
> 
> To get self-signed certificates to work you need to (on the client's
> machine) copy your server's cert.pem file into
> /etc/pki/ca-trust/source/anchors and then run the “update-ca-trust
> extract”.
> 
> 

Many thanks, Olga! That got me further:

    Mar 28 10:35:05 nfsclnt tlshd[1498]: Handshake with nfsd.poochiereds.net (192.168.1.140) was successful

The mount still isn't working yet, but I think I'm getting closer. I'll
keep poking at it.

Thanks!
-- 
Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 14:29     ` Jeff Layton
@ 2023-03-28 14:39       ` Olga Kornievskaia
  2023-03-28 14:45         ` Chuck Lever III
  2023-03-28 14:41       ` Chuck Lever III
  1 sibling, 1 reply; 26+ messages in thread
From: Olga Kornievskaia @ 2023-03-28 14:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever III, kernel-tls-handshake

On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2023-03-28 at 14:04 +0000, Chuck Lever III wrote:
> >
> > > On Mar 28, 2023, at 8:55 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > I wonder...should we have the ktls-utils package install a self-signed cert by default?
> >
> > So this idea is intriguing, I had some similar thoughts.
> >
> > I'm not sure what the security implications of all this are.
> > We'd first need to look at other certificate-based packages
> > in Fedora to see if they offer a similar quick-setup. The
> > cert would have to be created at install time.
> >
> >
>
> I think when apache is installed, a self-signed cert is created. You
> don't have to use it, but it's what gets initially installed.

The problem I see with the plan is that the client (which will be
installing ktls-utils) needs the server's certificate (not its own). So
installing a self-signed certificate helps with having one, but is far
from a no-hassle install. I think having clear steps in the man page
about how to get the server's cert installed into the client's trusted
CA chain would go a long way.

> > > I created a self-signed
> > > cert and tried to use it, but the client rejects it with this:
> > >
> > >    Mar 28 09:01:20 nfsclnt tlshd[1092]: Certificate signer not found.
> > >
> > > Is there a way to make it not try to validate the cert chain?
> >
> > Olga also found that self-signed server certs are not
> > working as we'd like. tlshd had a mechanism to force the
> > clients not to check the signer, but it was removed
> > because it was deemed insecure.
> >
> > I'd like to find a way to make self-signed work seamlessly.
> >
>
> Ditto. A lot of people are going to want to use TLS opportunistically
> without deploying their own CA and issuing "real" certificates.
>
> It's true that it is less secure than having full chain-of-trust, but
> this seems like a case of "perfect being the enemy of good". If we don't
> allow for self-signed certificates, then we've created a rather large
> hurdle for anyone who wants to deploy this.
>
> One thing we could do is reinstate the tlshd option, but still allow it
> to check the signature. Then it could log something if that check fails
> but still allow the connection.
>
> We should of course document why using that option is not ideal, but
> ripping it out entirely seems rather draconian. That's just going to
> drive people to not use TLS at all because of the hassle factor.

I would argue that a "no verification" option should only be allowed in
some extreme cases. For example, it could require options that
explicitly say it's running in debug mode and in the foreground only
(-d -f --noverify). Such options would clearly state that the intent is
debugging only, not regular use.

I also don't see a real reason for a "noverify" option except to remove
frustration during setup.

> --
> Jeff Layton <jlayton@kernel.org>
>


* Re: problems getting rpc over tls to work
  2023-03-28 14:29     ` Jeff Layton
  2023-03-28 14:39       ` Olga Kornievskaia
@ 2023-03-28 14:41       ` Chuck Lever III
  1 sibling, 0 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 14:41 UTC (permalink / raw)
  To: Jeff Layton; +Cc: kernel-tls-handshake



> On Mar 28, 2023, at 10:29 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Tue, 2023-03-28 at 14:04 +0000, Chuck Lever III wrote:
>> 
>>> On Mar 28, 2023, at 8:55 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>> 
>>> I wonder...should we have the ktls-utils package install a self-signed cert by default?
>> 
>> So this idea is intriguing, I had some similar thoughts.
>> 
>> I'm not sure what the security implications of all this are.
>> We'd first need to look at other certificate-based packages
>> in Fedora to see if they offer a similar quick-setup. The
>> cert would have to be created at install time.
> 
> I think when apache is installed, a self-signed cert is created. You
> don't have to use it, but it's what gets initially installed.

If apache does it, then it sounds OK to do.


>>> I created a self-signed
>>> cert and tried to use it, but the client rejects it with this:
>>> 
>>>   Mar 28 09:01:20 nfsclnt tlshd[1092]: Certificate signer not found.
>>> 
>>> Is there a way to make it not try to validate the cert chain?
>> 
>> Olga also found that self-signed server certs are not
>> working as we'd like. tlshd had a mechanism to force the
>> clients not to check the signer, but it was removed
>> because it was deemed insecure.
>> 
>> I'd like to find a way to make self-signed work seamlessly.
> 
> Ditto. A lot of people are going to want to use TLS opportunistically
> without deploying their own CA and issuing "real" certificates.

Yer preachin' to the choir, son.


> It's true that it is less secure than having full chain-of-trust, but
> this seems like a case of "perfect being the enemy of good". If we don't
> allow for self-signed certificates, then we've created a rather large
> hurdle for anyone who wants to deploy this.
> 
> One thing we could do is reinstate the tlshd option, but still allow it
> to check the signature. Then it could log something if that check fails
> but still allow the connection.
> 
> We should of course document why using that option is not ideal, but
> ripping it out entirely seems rather draconian. That's just going to
> drive people to not use TLS at all because of the hassle factor.

I'd prefer that no client-side administration is necessary to make
this work. Adding the server's self-signed cert on all clients
is not what I had in mind, as that is the kind of "key distribution
hassle" that RPC-with-TLS was intended to eliminate.

(But I'm glad that gets you closer to working).

--
Chuck Lever




* Re: problems getting rpc over tls to work
  2023-03-28 14:38         ` Jeff Layton
@ 2023-03-28 14:44           ` Olga Kornievskaia
  2023-03-28 14:47             ` Chuck Lever III
  2023-03-28 15:48           ` Jeff Layton
  1 sibling, 1 reply; 26+ messages in thread
From: Olga Kornievskaia @ 2023-03-28 14:44 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever III, kernel-tls-handshake

On Tue, Mar 28, 2023 at 10:38 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2023-03-28 at 10:25 -0400, Olga Kornievskaia wrote:
> > On Tue, Mar 28, 2023 at 10:14 AM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > On Tue, 2023-03-28 at 13:55 +0000, Chuck Lever III wrote:
> > > >
> > > > > On Mar 28, 2023, at 9:29 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > >
> > > > > > Hi Chuck!
> > > > > >
> > > > > > I have started the packaging work for Fedora for ktls-utils:
> > > > > >
> > > > > >   https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> > > > > >
> > > > > > I also built packages for this in copr:
> > > > > >
> > > > > >   https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> > > > > >
> > > > > > ...and built some interim nfs-utils packages with the requisite exportfs
> > > > > > patches:
> > > > > >
> > > > > >   https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> > > > >
> > > > > Note that the nfs-utils changes aren't necessary to support
> > > > > the kernel server in "opportunistic" mode -- the server will
> > > > > use RPC-with-TLS if a client requests it, but otherwise does
> > > > > not restrict access.
> > > > >
> > > > > Client side also has no nfs-utils requirements at this time,
> > > > > since the new mount options are handled by the kernel.
> > > >
> > > > In case I wasn't clear:
> > > >
> > > > This was meant as a suggestion. If you want to simplify your
> > > > test set-up a bit, the nfs-utils piece isn't needed at this
> > > > point. But feel free to include it if you like!
> > > >
> > >
> > > Understood. I needed to build it for the server side anyway, so I
> > > figured I might as well. Eventually I'd like to set up a Fedora COPR
> > > repo that has all of the packages we need to test this, but I need to
> > > sort through the certificate handling here first.
> > >
> > > Are there docs on how to administer gnutls? For instance, I guess I'll
> > > want to set up my own CA and issue client and server certs. How do I
> > > make gnutls trust a new CA?
> >
> > Hi Jeff,
> >
> > To get self-signed certificates to work you need to (on the client's
> > machine) copy your server's cert.pem file into
> > /etc/pki/ca-trust/source/anchors and then run the “update-ca-trust
> > extract”.
> >
> >
>
> Many thanks, Olga! That got me further:
>
>     Mar 28 10:35:05 nfsclnt tlshd[1498]: Handshake with nfsd.poochiereds.net (192.168.1.140) was successful
>
> The mount still isn't working yet, but I think I'm getting closer. I'll
> keep poking at it.

I went through several iterations before I got that working. If you are
doing mutual authentication, then the client's self-signed cert needs to
be added to the server's CA chain in the same manner.

My next stumble, which Chuck helped me with, was that the negotiated
cipher was ChaCha20-Poly1305, which I didn't have enabled in my kernel.
So check that you have CONFIG_CRYPTO_CHACHA20POLY1305 compiled into the
kernel.
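
One way to check this is to look the option up in the running kernel's
config file. This is a minimal sketch: the function name kconfig_state is
hypothetical, and it assumes the distro ships the config as
/boot/config-$(uname -r), as Fedora does. Other systems may only expose
/proc/config.gz, which would need to be decompressed first.

```shell
#!/bin/sh
# Sketch only: kconfig_state is a hypothetical helper name.
#
# Usage: kconfig_state OPTION [CONFIG_FILE]
# Prints the value of OPTION ("y" for built-in, "m" for module), or
# "unset" when the option has no OPTION=value line in the file.
kconfig_state() {
    opt=$1
    cfg=${2:-/boot/config-$(uname -r)}

    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 2; }

    # Only lines of the form OPTION=value count; "# OPTION is not set"
    # comments do not match and therefore report "unset".
    val=$(sed -n "s/^${opt}=\(.*\)\$/\1/p" "$cfg")
    echo "${val:-unset}"
}

# Typical use:
#   kconfig_state CONFIG_CRYPTO_CHACHA20POLY1305
```

A result of "m" means the cipher is built as a module, so it also has to
be loadable on the running system.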

>
> Thanks!
> --
> Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 14:39       ` Olga Kornievskaia
@ 2023-03-28 14:45         ` Chuck Lever III
  2023-03-28 14:50           ` Olga Kornievskaia
  2023-03-28 15:03           ` Jeff Layton
  0 siblings, 2 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 14:45 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Jeff Layton, kernel-tls-handshake



> On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
>> 
>> It's true that it is less secure than having full chain-of-trust, but
>> this seems like a case of "perfect being the enemy of good". If we don't
>> allow for self-signed certificates, then we've created a rather large
>> hurdle for anyone who wants to deploy this.
>> 
>> One thing we could do is reinstate the tlshd option, but still allow it
>> to check the signature. Then it could log something if that check fails
>> but still allow the connection.
>> 
>> We should of course document why using that option is not ideal, but
>> ripping it out entirely seems rather draconian. That's just going to
>> drive people to not use TLS at all because of the hassle factor.
> 
> I would argue that "no verification" option should only be allowed in
> some extreme cases. Like say having an option that explicitly says
> it's running in a debug mode and say on the foreground only (-d -f
> --noverify). Having such options might clearly state the intent is to
> debug only and not run for any user usage.
> 
> I also don't see a real reason for "noverify" option except to remove
> frustrations during the setup.

I might put it this way: we don't want to have customers installing
something on their clients whose out-of-the-shrinkwrap configuration
is less than secure. "no verification" is less than secure.

My preference would be to have some kind of way to get self-signed
certs working with no client-side configuration needed. If the
client mounts with "xprtsec=tls" it should work. Do we need to
plumb that into our handshake upcall and make "anonymous"
handshakes explicitly allow unrecognized signers?

--
Chuck Lever




* Re: problems getting rpc over tls to work
  2023-03-28 14:44           ` Olga Kornievskaia
@ 2023-03-28 14:47             ` Chuck Lever III
  0 siblings, 0 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 14:47 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Jeff Layton, kernel-tls-handshake



> On Mar 28, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> My next stumble which Chuck helped me was that negotiated cipher was
> ChaCha20Poly which I didn't have enabled in my kernel. So look that
> you have CONFIG_CRYPTO_CHACHA20POLY1305  compiled in the kernel.

Yeah, I think tlshd's ability to detect what is supported by the
local kernel is still not perfect. To that end I was thinking of
adding a configuration option to /etc/tlshd.conf to enable and
disable these algorithms.

Does anyone know if ChaCha and Poly1305 are going to be enabled
in present or future Fedora/openSUSE kernels?


--
Chuck Lever




* Re: problems getting rpc over tls to work
  2023-03-28 14:45         ` Chuck Lever III
@ 2023-03-28 14:50           ` Olga Kornievskaia
  2023-03-28 15:06             ` Jeff Layton
  2023-03-28 15:03           ` Jeff Layton
  1 sibling, 1 reply; 26+ messages in thread
From: Olga Kornievskaia @ 2023-03-28 14:50 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, kernel-tls-handshake

On Tue, Mar 28, 2023 at 10:45 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
> >>
> >> It's true that it is less secure than having full chain-of-trust, but
> >> this seems like a case of "perfect being the enemy of good". If we don't
> >> allow for self-signed certificates, then we've created a rather large
> >> hurdle for anyone who wants to deploy this.
> >>
> >> One thing we could do is reinstate the tlshd option, but still allow it
> >> to check the signature. Then it could log something if that check fails
> >> but still allow the connection.
> >>
> >> We should of course document why using that option is not ideal, but
> >> ripping it out entirely seems rather draconian. That's just going to
> >> drive people to not use TLS at all because of the hassle factor.
> >
> > I would argue that "no verification" option should only be allowed in
> > some extreme cases. Like say having an option that explicitly says
> > it's running in a debug mode and say on the foreground only (-d -f
> > --noverify). Having such options might clearly state the intent is to
> > debug only and not run for any user usage.
> >
> > I also don't see a real reason for "noverify" option except to remove
> > frustrations during the setup.
>
> I might put it this way: we don't want to have customers installing
> something on their clients whose out-of-the-shrinkwrap configuration
> is less than secure. "no verification" is less than secure.
>
> My preference would be to have some kind of way to get self-signed
> certs working with no client-side configuration needed. If the
> client mounts with "xprtsec=tls" it should work. Do we need to
> plumb that into our handshake upcall and make "anonymous"
> handshakes explicitly allow unrecognized signers?

My vote is to not allow insecure installs (ever).

Perhaps the ktls-utils install on the client can prompt the user for the
location of either the server's self-signed cert or the server's CA, so
that it has everything it needs before being used?

>
> --
> Chuck Lever
>
>


* Re: problems getting rpc over tls to work
  2023-03-28 14:45         ` Chuck Lever III
  2023-03-28 14:50           ` Olga Kornievskaia
@ 2023-03-28 15:03           ` Jeff Layton
  2023-03-28 15:05             ` Chuck Lever III
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 15:03 UTC (permalink / raw)
  To: Chuck Lever III, Olga Kornievskaia; +Cc: kernel-tls-handshake

On Tue, 2023-03-28 at 14:45 +0000, Chuck Lever III wrote:
> 
> > On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> > 
> > On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > It's true that it is less secure than having full chain-of-trust, but
> > > this seems like a case of "perfect being the enemy of good". If we don't
> > > allow for self-signed certificates, then we've created a rather large
> > > hurdle for anyone who wants to deploy this.
> > > 
> > > One thing we could do is reinstate the tlshd option, but still allow it
> > > to check the signature. Then it could log something if that check fails
> > > but still allow the connection.
> > > 
> > > We should of course document why using that option is not ideal, but
> > > ripping it out entirely seems rather draconian. That's just going to
> > > drive people to not use TLS at all because of the hassle factor.
> > 
> > I would argue that "no verification" option should only be allowed in
> > some extreme cases. Like say having an option that explicitly says
> > it's running in a debug mode and say on the foreground only (-d -f
> > --noverify). Having such options might clearly state the intent is to
> > debug only and not run for any user usage.
> > 
> > I also don't see a real reason for "noverify" option except to remove
> > frustrations during the setup.
> 
> I might put it this way: we don't want to have customers installing
> something on their clients whose out-of-the-shrinkwrap configuration
> is less than secure. "no verification" is less than secure.
> 
> My preference would be to have some kind of way to get self-signed
> certs working with no client-side configuration needed. If the
> client mounts with "xprtsec=tls" it should work. Do we need to
> plumb that into our handshake upcall and make "anonymous"
> handshakes explicitly allow unrecognized signers?
> 

Since the client is the side that's rejecting things, having a mount
option that allows you to relax that check seems like the right
approach.

How about a new xprtsec= option? Maybe "xprtsec=nvtls" (no verify TLS)?
That would allow things to work out of the box, but still leave
xprtsec=tls as the more secure method.

-- 
Jeff Layton <jlayton@kernel.org>


* Re: problems getting rpc over tls to work
  2023-03-28 15:03           ` Jeff Layton
@ 2023-03-28 15:05             ` Chuck Lever III
  2023-03-28 15:15               ` Jeff Layton
  2023-03-28 15:19               ` Olga Kornievskaia
  0 siblings, 2 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 15:05 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Olga Kornievskaia, kernel-tls-handshake



> On Mar 28, 2023, at 11:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Tue, 2023-03-28 at 14:45 +0000, Chuck Lever III wrote:
>> 
>>> On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>> 
>>> On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
>>>> 
>>>> It's true that it is less secure than having full chain-of-trust, but
>>>> this seems like a case of "perfect being the enemy of good". If we don't
>>>> allow for self-signed certificates, then we've created a rather large
>>>> hurdle for anyone who wants to deploy this.
>>>> 
>>>> One thing we could do is reinstate the tlshd option, but still allow it
>>>> to check the signature. Then it could log something if that check fails
>>>> but still allow the connection.
>>>> 
>>>> We should of course document why using that option is not ideal, but
>>>> ripping it out entirely seems rather draconian. That's just going to
>>>> drive people to not use TLS at all because of the hassle factor.
>>> 
>>> I would argue that "no verification" option should only be allowed in
>>> some extreme cases. Like say having an option that explicitly says
>>> it's running in a debug mode and say on the foreground only (-d -f
>>> --noverify). Having such options might clearly state the intent is to
>>> debug only and not run for any user usage.
>>> 
>>> I also don't see a real reason for "noverify" option except to remove
>>> frustrations during the setup.
>> 
>> I might put it this way: we don't want to have customers installing
>> something on their clients whose out-of-the-shrinkwrap configuration
>> is less than secure. "no verification" is less than secure.
>> 
>> My preference would be to have some kind of way to get self-signed
>> certs working with no client-side configuration needed. If the
>> client mounts with "xprtsec=tls" it should work. Do we need to
>> plumb that into our handshake upcall and make "anonymous"
>> handshakes explicitly allow unrecognized signers?
>> 
> 
> Since the client is the side that's rejecting things, having a mount
> option that allows you to relax that check seems like the right
> approach.
> 
> How about a new xprtsec= option? Maybe "xprtsec=nvtls" (no verify TLS)?
> That would allow things to work out of the box, but still leave
> xprtsec=tls as the more secure method.

Nah. xprtsec=tls is supposed to be less secure: no authentication,
just encryption. The secure method is xprtsec=mtls.

IMO xprtsec=tls needs to skip the signer check. I think I can make
tlshd do that.


--
Chuck Lever



* Re: problems getting rpc over tls to work
  2023-03-28 14:50           ` Olga Kornievskaia
@ 2023-03-28 15:06             ` Jeff Layton
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 15:06 UTC (permalink / raw)
  To: Olga Kornievskaia, Chuck Lever III; +Cc: kernel-tls-handshake

On Tue, 2023-03-28 at 10:50 -0400, Olga Kornievskaia wrote:
> On Tue, Mar 28, 2023 at 10:45 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> > 
> > 
> > 
> > > On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> > > 
> > > On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > It's true that it is less secure than having full chain-of-trust, but
> > > > this seems like a case of "perfect being the enemy of good". If we don't
> > > > allow for self-signed certificates, then we've created a rather large
> > > > hurdle for anyone who wants to deploy this.
> > > > 
> > > > One thing we could do is reinstate the tlshd option, but still allow it
> > > > to check the signature. Then it could log something if that check fails
> > > > but still allow the connection.
> > > > 
> > > > We should of course document why using that option is not ideal, but
> > > > ripping it out entirely seems rather draconian. That's just going to
> > > > drive people to not use TLS at all because of the hassle factor.
> > > 
> > > I would argue that "no verification" option should only be allowed in
> > > some extreme cases. Like say having an option that explicitly says
> > > it's running in a debug mode and say on the foreground only (-d -f
> > > --noverify). Having such options might clearly state the intent is to
> > > debug only and not run for any user usage.
> > > 
> > > I also don't see a real reason for "noverify" option except to remove
> > > frustrations during the setup.
> > 
> > I might put it this way: we don't want to have customers installing
> > something on their clients whose out-of-the-shrinkwrap configuration
> > is less than secure. "no verification" is less than secure.
> > 
> > My preference would be to have some kind of way to get self-signed
> > certs working with no client-side configuration needed. If the
> > client mounts with "xprtsec=tls" it should work. Do we need to
> > plumb that into our handshake upcall and make "anonymous"
> > handshakes explicitly allow unrecognized signers?
> 
> My vote is not allow for insecure installs (ever).
> 


Is it really better to force people into plaintext connections? I very
much disagree here. Raise your hand if you've never used cURL with
"--insecure" or told Mozilla to accept a bogus cert.

> Perhaps ktlsd install on the client can prompt the user asking for
> location of either server's self-signed cert or server's CA and this
> way it would have everything that's needed before using it?
> 
> 

NAK. Interactive package installs are no bueno.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: problems getting rpc over tls to work
  2023-03-28 15:05             ` Chuck Lever III
@ 2023-03-28 15:15               ` Jeff Layton
  2023-03-28 15:19               ` Olga Kornievskaia
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 15:15 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Olga Kornievskaia, kernel-tls-handshake

On Tue, 2023-03-28 at 15:05 +0000, Chuck Lever III wrote:
> 
> > On Mar 28, 2023, at 11:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Tue, 2023-03-28 at 14:45 +0000, Chuck Lever III wrote:
> > > 
> > > > On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> > > > 
> > > > On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > 
> > > > > It's true that it is less secure than having full chain-of-trust, but
> > > > > this seems like a case of "perfect being the enemy of good". If we don't
> > > > > allow for self-signed certificates, then we've created a rather large
> > > > > hurdle for anyone who wants to deploy this.
> > > > > 
> > > > > One thing we could do is reinstate the tlshd option, but still allow it
> > > > > to check the signature. Then it could log something if that check fails
> > > > > but still allow the connection.
> > > > > 
> > > > > We should of course document why using that option is not ideal, but
> > > > > ripping it out entirely seems rather draconian. That's just going to
> > > > > drive people to not use TLS at all because of the hassle factor.
> > > > 
> > > > I would argue that "no verification" option should only be allowed in
> > > > some extreme cases. Like say having an option that explicitly says
> > > > it's running in a debug mode and say on the foreground only (-d -f
> > > > --noverify). Having such options might clearly state the intent is to
> > > > debug only and not run for any user usage.
> > > > 
> > > > I also don't see a real reason for "noverify" option except to remove
> > > > frustrations during the setup.
> > > 
> > > I might put it this way: we don't want to have customers installing
> > > something on their clients whose out-of-the-shrinkwrap configuration
> > > is less than secure. "no verification" is less than secure.
> > > 
> > > My preference would be to have some kind of way to get self-signed
> > > certs working with no client-side configuration needed. If the
> > > client mounts with "xprtsec=tls" it should work. Do we need to
> > > plumb that into our handshake upcall and make "anonymous"
> > > handshakes explicitly allow unrecognized signers?
> > > 
> > 
> > Since the client is the side that's rejecting things, having a mount
> > option that allows you to relax that check seems like the right
> > approach.
> > 
> > How about a new xprtsec= option? Maybe "xprtsec=nvtls" (no verify TLS)?
> > That would allow things to work out of the box, but still leave
> > xprtsec=tls as the more secure method.
> 
> Nah. xprtsec=tls is supposed to be less secure: no authentication,
> just encryption. The secure method is xprtsec=mtls.
> 
> IMO xprtsec=tls needs to skip the signer check. I think I can make
> tlshd do that.
> 
> 

It's your call, but allowing the client to check the certificate without
requiring the server to do so seems like it'd be a good thing to allow.
Maybe there should be a new option for that instead then?

Either way, I'm not sure skipping the signer check altogether is the
best thing. It'd probably be good to check it, and just not fail the
connection if it fails. Have it log a message on each handshake instead
so that the admin is aware that the endpoint is not verified.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: problems getting rpc over tls to work
  2023-03-28 15:05             ` Chuck Lever III
  2023-03-28 15:15               ` Jeff Layton
@ 2023-03-28 15:19               ` Olga Kornievskaia
  2023-03-28 15:30                 ` Olga Kornievskaia
  1 sibling, 1 reply; 26+ messages in thread
From: Olga Kornievskaia @ 2023-03-28 15:19 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, kernel-tls-handshake

On Tue, Mar 28, 2023 at 11:06 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Mar 28, 2023, at 11:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Tue, 2023-03-28 at 14:45 +0000, Chuck Lever III wrote:
> >>
> >>> On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>
> >>> On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
> >>>>
> >>>> It's true that it is less secure than having full chain-of-trust, but
> >>>> this seems like a case of "perfect being the enemy of good". If we don't
> >>>> allow for self-signed certificates, then we've created a rather large
> >>>> hurdle for anyone who wants to deploy this.
> >>>>
> >>>> One thing we could do is reinstate the tlshd option, but still allow it
> >>>> to check the signature. Then it could log something if that check fails
> >>>> but still allow the connection.
> >>>>
> >>>> We should of course document why using that option is not ideal, but
> >>>> ripping it out entirely seems rather draconian. That's just going to
> >>>> drive people to not use TLS at all because of the hassle factor.
> >>>
> >>> I would argue that "no verification" option should only be allowed in
> >>> some extreme cases. Like say having an option that explicitly says
> >>> it's running in a debug mode and say on the foreground only (-d -f
> >>> --noverify). Having such options might clearly state the intent is to
> >>> debug only and not run for any user usage.
> >>>
> >>> I also don't see a real reason for "noverify" option except to remove
> >>> frustrations during the setup.
> >>
> >> I might put it this way: we don't want to have customers installing
> >> something on their clients whose out-of-the-shrinkwrap configuration
> >> is less than secure. "no verification" is less than secure.
> >>
> >> My preference would be to have some kind of way to get self-signed
> >> certs working with no client-side configuration needed. If the
> >> client mounts with "xprtsec=tls" it should work. Do we need to
> >> plumb that into our handshake upcall and make "anonymous"
> >> handshakes explicitly allow unrecognized signers?
> >>
> >
> > Since the client is the side that's rejecting things, having a mount
> > option that allows you to relax that check seems like the right
> > approach.
> >
> > How about a new xprtsec= option? Maybe "xprtsec=nvtls" (no verify TLS)?
> > That would allow things to work out of the box, but still leave
> > xprtsec=tls as the more secure method.
>
> Nah. xprtsec=tls is supposed to be less secure: no authentication,
> just encryption. The secure method is xprtsec=mtls.

What's the point of "no authentication"? I thought the server is
always authenticated.

> IMO xprtsec=tls needs to skip the signer check. I think I can make
> tlshd do that.

I guess in that case, I (grudgingly) agree with something like
"xprtsec=anonymous/nvtls".

>
>
> --
> Chuck Lever
>
>

* Re: problems getting rpc over tls to work
  2023-03-28 15:19               ` Olga Kornievskaia
@ 2023-03-28 15:30                 ` Olga Kornievskaia
  2023-03-28 15:48                   ` Chuck Lever III
  0 siblings, 1 reply; 26+ messages in thread
From: Olga Kornievskaia @ 2023-03-28 15:30 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, kernel-tls-handshake

On Tue, Mar 28, 2023 at 11:19 AM Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Tue, Mar 28, 2023 at 11:06 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> >
> >
> > > On Mar 28, 2023, at 11:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > On Tue, 2023-03-28 at 14:45 +0000, Chuck Lever III wrote:
> > >>
> > >>> On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> > >>>
> > >>> On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
> > >>>>
> > >>>> It's true that it is less secure than having full chain-of-trust, but
> > >>>> this seems like a case of "perfect being the enemy of good". If we don't
> > >>>> allow for self-signed certificates, then we've created a rather large
> > >>>> hurdle for anyone who wants to deploy this.
> > >>>>
> > >>>> One thing we could do is reinstate the tlshd option, but still allow it
> > >>>> to check the signature. Then it could log something if that check fails
> > >>>> but still allow the connection.
> > >>>>
> > >>>> We should of course document why using that option is not ideal, but
> > >>>> ripping it out entirely seems rather draconian. That's just going to
> > >>>> drive people to not use TLS at all because of the hassle factor.
> > >>>
> > >>> I would argue that "no verification" option should only be allowed in
> > >>> some extreme cases. Like say having an option that explicitly says
> > >>> it's running in a debug mode and say on the foreground only (-d -f
> > >>> --noverify). Having such options might clearly state the intent is to
> > >>> debug only and not run for any user usage.
> > >>>
> > >>> I also don't see a real reason for "noverify" option except to remove
> > >>> frustrations during the setup.
> > >>
> > >> I might put it this way: we don't want to have customers installing
> > >> something on their clients whose out-of-the-shrinkwrap configuration
> > >> is less than secure. "no verification" is less than secure.
> > >>
> > >> My preference would be to have some kind of way to get self-signed
> > >> certs working with no client-side configuration needed. If the
> > >> client mounts with "xprtsec=tls" it should work. Do we need to
> > >> plumb that into our handshake upcall and make "anonymous"
> > >> handshakes explicitly allow unrecognized signers?
> > >>
> > >
> > > Since the client is the side that's rejecting things, having a mount
> > > option that allows you to relax that check seems like the right
> > > approach.
> > >
> > > How about a new xprtsec= option? Maybe "xprtsec=nvtls" (no verify TLS)?
> > > That would allow things to work out of the box, but still leave
> > > xprtsec=tls as the more secure method.
> >
> > Nah. xprtsec=tls is supposed to be less secure: no authentication,
> > just encryption. The secure method is xprtsec=mtls.
>
> What's the point of "no authentication"? I thought the server is
> always authenticated.

Sorry, OK, we are discussing no authentication. But my point was that
"TLS" in its current form doesn't mean less secure, and it always does
server-side authentication. In the early days of TLS, you could choose
to do pure Diffie-Hellman, and that was "no authentication", but that's
no longer an option.

HTTPS explicitly prompts the user to do manual verification (i.e. when
it couldn't verify using existing CAs). It never allows for "no
verification", which is what we are discussing here.

> > IMO xprtsec=tls needs to skip the signer check. I think I can make
> > tlshd do that.
>
> I guess in that case, I (grudgingly) agree with something like
> "xprtsec=anonymous/nvtls".
>
> >
> >
> > --
> > Chuck Lever
> >
> >

* Re: problems getting rpc over tls to work
  2023-03-28 14:38         ` Jeff Layton
  2023-03-28 14:44           ` Olga Kornievskaia
@ 2023-03-28 15:48           ` Jeff Layton
  2023-03-28 16:06             ` Chuck Lever III
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2023-03-28 15:48 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Chuck Lever III, kernel-tls-handshake

[-- Attachment #1: Type: text/plain, Size: 7799 bytes --]

On Tue, 2023-03-28 at 10:38 -0400, Jeff Layton wrote:
> On Tue, 2023-03-28 at 10:25 -0400, Olga Kornievskaia wrote:
> > On Tue, Mar 28, 2023 at 10:14 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > On Tue, 2023-03-28 at 13:55 +0000, Chuck Lever III wrote:
> > > > 
> > > > > On Mar 28, 2023, at 9:29 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > > On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > Hi Chuck!
> > > > > > 
> > > > > > I have started the packaging work for Fedora for ktls-utils:
> > > > > > 
> > > > > >   https://bugzilla.redhat.com/show_bug.cgi?id=2182151
> > > > > > 
> > > > > > I also built packages for this in copr:
> > > > > > 
> > > > > >   https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
> > > > > > 
> > > > > > ...and built some interim nfs-utils packages with the requisite exportfs
> > > > > > patches:
> > > > > > 
> > > > > >   https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
> > > > > 
> > > > > Note that the nfs-utils changes aren't necessary to support
> > > > > the kernel server in "opportunistic" mode -- the server will
> > > > > use RPC-with-TLS if a client requests it, but otherwise does
> > > > > not restrict access.
> > > > > 
> > > > > Client side also has no nfs-utils requirements at this time,
> > > > > since the new mount options are handled by the kernel.
> > > > 
> > > > In case I wasn't clear:
> > > > 
> > > > This was meant as a suggestion. If you want to simplify your
> > > > test set-up a bit, the nfs-utils piece isn't needed at this
> > > > point. But feel free to include it if you like!
> > > > 
> > > 
> > > Understood. I needed to build it for the server side anyway, so I
> > > figured I might as well. Eventually I'd like to set up a Fedora COPR
> > > repo that has all of the packages we need to test this, but I need to
> > > sort through the certificate handling here first.
> > > 
> > > Are there docs on how to administer gnutls? For instance, I guess I'll
> > > want to set up my own CA and issue client and server certs. How do I
> > > make gnutls trust a new CA?
> > 
> > Hi Jeff,
> > 
> > To get self-signed certificates to work you need to (on the client's
> > machine) copy your server's cert.pem file into
> > /etc/pki/ca-trust/source/anchors and then run the “update-ca-trust
> > extract”.
> > 
> > 
> 
> Many thanks, Olga! That got me further:
> 
>     Mar 28 10:35:05 nfsclnt tlshd[1498]: Handshake with nfsd.poochiereds.net (192.168.1.140) was successful
> 
> The mount still isn't working yet, but I think I'm getting closer. I'll
> keep poking at it.
> 

OK! I cranked up the debugging. Here's the kernel tracepoints during
this time:

          <idle>-0       [007] ..s2.  3657.494946: svc_xprt_enqueue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL pid=1051
            nfsd-1051    [005] .....  3657.494980: svc_xprt_dequeue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL wakeup-us=45
            nfsd-1051    [005] .....  3657.495071: svcsock_new_socket: type=STREAM family=AF_INET
            nfsd-1051    [005] .....  3657.495085: svc_xprt_enqueue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL pid=1050
            nfsd-1051    [005] .....  3657.495086: svc_xprt_accept: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL protocol=tcp service=nfsd
            nfsd-1051    [005] .....  3657.495092: svc_xprt_enqueue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL pid=1049
            nfsd-1051    [005] .....  3657.495095: svc_xprt_dequeue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL wakeup-us=158
            nfsd-1051    [005] .....  3657.495101: svcsock_marker: addr=192.168.1.136:818 length=40 (last)
            nfsd-1051    [005] .....  3657.495104: svcsock_tcp_recv: addr=192.168.1.136:818 result=40 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL
            nfsd-1051    [005] .....  3657.495111: svc_xprt_enqueue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL pid=1048
            nfsd-1051    [005] .....  3657.495112: svc_xdr_recvfrom: xid=0xd1e2303e head=[000000005f040892,40] page=0 tail=[0000000000000000,0] len=40
            nfsd-1050    [007] .....  3657.495121: svc_xprt_dequeue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL wakeup-us=44
            nfsd-1051    [005] .....  3657.495125: svc_tls_start: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL
            nfsd-1051    [005] .....  3657.495128: svc_process: addr=192.168.1.136:818 xid=0xd1e2303e service=nfsd vers=4 proc=NULL
            nfsd-1051    [005] .....  3657.495132: svc_xdr_sendto: xid=0xd1e2303e head=[000000005ccd151e,32] page=0(0) tail=[0000000000000000,0] len=32
            nfsd-1051    [005] .....  3657.495133: svc_stats_latency: xid=0xd1e2303e server=192.168.1.140:2049 client=192.168.1.136:818 proc=NULL execute-us=21
            nfsd-1050    [007] .....  3657.495146: svcsock_accept_err: addr=listener service=nfsd status=-11
            nfsd-1048    [004] .....  3657.495147: svc_xprt_dequeue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE wakeup-us=42
            nfsd-1048    [004] .....  3657.495151: svc_tls_upcall: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
            nfsd-1051    [005] .....  3657.495163: svcsock_tcp_send: addr=192.168.1.136:818 result=36 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
            nfsd-1051    [005] .....  3657.495198: svc_send: xid=0xd1e2303e server=192.168.1.140:2049 client=192.168.1.136:818 status=36 flags=SECURE|USEDEFERRAL|SPLICE_OK|BUSY|DATA
          <idle>-0       [007] ..s2.  3657.651316: svcsock_data_ready: addr=192.168.1.136:818 result=0 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
          <idle>-0       [007] ..s2.  3657.655648: svcsock_data_ready: addr=192.168.1.136:818 result=0 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
          <idle>-0       [007] ..s2.  3657.669552: svcsock_data_ready: addr=192.168.1.136:818 result=0 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
            nfsd-1048    [004] .....  3662.666590: svc_tls_timed_out: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE              <<<<<<<<<<< TIMEOUT HERE
            nfsd-1048    [004] .....  3662.666602: svc_xprt_enqueue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|CACHE_AUTH|CONG_CTRL pid=1051
            nfsd-1048    [004] .....  3662.666630: svc_xprt_dequeue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|CACHE_AUTH|CONG_CTRL wakeup-us=5171655
            nfsd-1048    [004] .....  3662.666631: svc_xprt_detach: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|DEAD|CACHE_AUTH|CONG_CTRL
            nfsd-1048    [004] .....  3662.666689: svc_xprt_free: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|DEAD|CACHE_AUTH|CONG_CTRL

It looks like it timed out waiting for the downcall. I cranked up the
debug logging in tlshd at the same time and attached it to this. It
looks like it all worked, so I'm not sure why the kernel didn't see the
downcall.
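
For anyone following along, here is a rough Python sketch of how the wait
can be measured from a trace like the one above. It is only an
illustration: the helper names (`event_times`, `handshake_wait`) are made
up for this example, and the two sample lines are abbreviated copies of
the svc_tls_upcall and svc_tls_timed_out events, with the flags field
dropped.

```python
import re

# Matches the "<timestamp>: <event>:" part of an ftrace text line.
TRACE_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")


def event_times(lines):
    """Map each event name to the timestamp of its first occurrence."""
    times = {}
    for line in lines:
        m = TRACE_RE.search(line)
        if m:
            times.setdefault(m.group(2), float(m.group(1)))
    return times


def handshake_wait(lines):
    """Seconds between the TLS upcall and the timeout, or None if either is missing."""
    t = event_times(lines)
    if "svc_tls_upcall" not in t or "svc_tls_timed_out" not in t:
        return None
    return t["svc_tls_timed_out"] - t["svc_tls_upcall"]


# Abbreviated from the trace above (flags= field omitted).
SAMPLE = [
    "nfsd-1048    [004] .....  3657.495151: svc_tls_upcall: "
    "server=192.168.1.140:2049 client=192.168.1.136:818",
    "nfsd-1048    [004] .....  3662.666590: svc_tls_timed_out: "
    "server=192.168.1.140:2049 client=192.168.1.136:818",
]

if __name__ == "__main__":
    print(f"waited {handshake_wait(SAMPLE):.6f}s for the downcall")
```

On the sample lines this gives a wait of a little over five seconds
between the upcall and the timeout.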

Thoughts?
-- 
Jeff Layton <jlayton@kernel.org>

[-- Attachment #2: tlshd.log.gz --]
[-- Type: application/gzip, Size: 4622 bytes --]

* Re: problems getting rpc over tls to work
  2023-03-28 15:30                 ` Olga Kornievskaia
@ 2023-03-28 15:48                   ` Chuck Lever III
  0 siblings, 0 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 15:48 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Jeff Layton, kernel-tls-handshake



> On Mar 28, 2023, at 11:30 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> On Tue, Mar 28, 2023 at 11:19 AM Olga Kornievskaia <aglo@umich.edu> wrote:
>> 
>> On Tue, Mar 28, 2023 at 11:06 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Mar 28, 2023, at 11:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>>> 
>>>> On Tue, 2023-03-28 at 14:45 +0000, Chuck Lever III wrote:
>>>>> 
>>>>>> On Mar 28, 2023, at 10:39 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>> 
>>>>>> On Tue, Mar 28, 2023 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>> 
>>>>>>> It's true that it is less secure than having full chain-of-trust, but
>>>>>>> this seems like a case of "perfect being the enemy of good". If we don't
>>>>>>> allow for self-signed certificates, then we've created a rather large
>>>>>>> hurdle for anyone who wants to deploy this.
>>>>>>> 
>>>>>>> One thing we could do is reinstate the tlshd option, but still allow it
>>>>>>> to check the signature. Then it could log something if that check fails
>>>>>>> but still allow the connection.
>>>>>>> 
>>>>>>> We should of course document why using that option is not ideal, but
>>>>>>> ripping it out entirely seems rather draconian. That's just going to
>>>>>>> drive people to not use TLS at all because of the hassle factor.
>>>>>> 
>>>>>> I would argue that "no verification" option should only be allowed in
>>>>>> some extreme cases. Like say having an option that explicitly says
>>>>>> it's running in a debug mode and say on the foreground only (-d -f
>>>>>> --noverify). Having such options might clearly state the intent is to
>>>>>> debug only and not run for any user usage.
>>>>>> 
>>>>>> I also don't see a real reason for "noverify" option except to remove
>>>>>> frustrations during the setup.
>>>>> 
>>>>> I might put it this way: we don't want to have customers installing
>>>>> something on their clients whose out-of-the-shrinkwrap configuration
>>>>> is less than secure. "no verification" is less than secure.
>>>>> 
>>>>> My preference would be to have some kind of way to get self-signed
>>>>> certs working with no client-side configuration needed. If the
>>>>> client mounts with "xprtsec=tls" it should work. Do we need to
>>>>> plumb that into our handshake upcall and make "anonymous"
>>>>> handshakes explicitly allow unrecognized signers?
>>>>> 
>>>> 
>>>> Since the client is the side that's rejecting things, having a mount
>>>> option that allows you to relax that check seems like the right
>>>> approach.
>>>> 
>>>> How about a new xprtsec= option? Maybe "xprtsec=nvtls" (no verify TLS)?
>>>> That would allow things to work out of the box, but still leave
>>>> xprtsec=tls as the more secure method.
>>> 
>>> Nah. xprtsec=tls is supposed to be less secure: no authentication,
>>> just encryption. The secure method is xprtsec=mtls.
>> 
>> What's the point of "no authentication"? I thought the server is
>> always authenticated.
> 
> Sorry, OK, we are discussing no authentication. But my point was that
> "TLS" in its current form doesn't mean less secure, and it always does
> server-side authentication. In the early days of TLS, you could choose
> to do pure Diffie-Hellman, and that was "no authentication", but that's
> no longer an option.
> 
> HTTPS explicitly prompts the user to do manual verification (i.e. when
> it couldn't verify using existing CAs). It never allows for "no
> verification", which is what we are discussing here.

Today, our client always authenticates the server.

That means that for self-signed environments, the server's certificate
has to be distributed to all clients. That also means that automatically
adding a self-signed server cert when ktls-utils is installed is not
going to be as helpful as we might want.

I really wanted to have a way to enable encryption while avoiding the
"client key distribution" problem, and to permit self-signed certs to
be used in this mode.

Some possible choices:

 - State that the way to avoid client key distribution is for the
   server administrator to acquire a certificate that is signed by
   a CA that is already known to clients. This is easy for us, and
   I suspect the security community would be agreeable only to this
   alternative.

 - Weaken the client's server authentication so that it does not
   fail the handshake if the server's certificate is self-signed
   (only for xprtsec=tls). Yes, tlshd would log the verification
   failure.

 - Add a third xprtsec= mode where no server verification is done.

 - Add a configuration option to /etc/tlshd.conf that weakens
   the anonymous policy so it does not verify the server.
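
To make the difference between these choices concrete, here is a small
Python sketch of the accept/reject decision each one implies. It is not
tlshd code (tlshd is written in C on top of GnuTLS); the `Policy` names
and the `accept_server` function are hypothetical and exist only for
this illustration.

```python
import logging
from enum import Enum

log = logging.getLogger("handshake-policy-sketch")


class Policy(Enum):
    """Hypothetical names for the client-side choices listed above."""
    STRICT = "strict"            # choice 1: server cert must chain to a known CA
    WARN_UNTRUSTED = "warn"      # choice 2: verify, log a failure, but continue
    NO_VERIFY = "noverify"       # choices 3 and 4: no server verification


def accept_server(policy, signer_trusted, peer):
    """Return True if the handshake should proceed.

    signer_trusted is the result of the certificate check. A real
    NO_VERIFY mode would never run that check; it is passed in here
    only to keep the sketch simple.
    """
    if policy is Policy.NO_VERIFY:
        return True
    if signer_trusted:
        return True
    if policy is Policy.WARN_UNTRUSTED:
        log.warning("%s: certificate signer not recognized; continuing", peer)
        return True
    log.error("%s: certificate signer not recognized; failing handshake", peer)
    return False
```

With a self-signed server certificate that has not been distributed to
the client, only the STRICT policy fails the handshake; the other two
let the mount proceed, and only WARN_UNTRUSTED leaves a log record of
the failed check.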


>>> IMO xprtsec=tls needs to skip the signer check. I think I can make
>>> tlshd do that.
>> 
>> I guess in that case, I (grudgingly) agree with something like
>> "xprtsec=anonymous/nvtls".

No snap decisions today. We don't quite have a consensus on this yet.

And, I think there is a reasonable workaround for the moment: if
the server cert is self-signed, just distribute it to clients.
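
As a sketch of that workaround, the following Python helper follows the
steps Olga described earlier in the thread: copy the server's cert into
/etc/pki/ca-trust/source/anchors on the client, then run
"update-ca-trust extract". The function name `trust_server_cert` and its
parameters are made up for this example, and it assumes a Fedora-style
trust store layout; it needs root to write to the real anchors directory.

```python
import shutil
import subprocess
from pathlib import Path

# Trust-anchor directory named earlier in the thread (Fedora-style layout).
ANCHORS = Path("/etc/pki/ca-trust/source/anchors")


def trust_server_cert(cert_path, anchors=ANCHORS,
                      refresh=("update-ca-trust", "extract")):
    """Copy cert_path into the anchors directory, then refresh the trust store.

    Pass refresh=None to skip the refresh command (useful for a dry run
    against a scratch directory).
    """
    dest = Path(anchors) / Path(cert_path).name
    shutil.copy(cert_path, dest)
    if refresh:
        # Raises CalledProcessError if the refresh command fails.
        subprocess.run(list(refresh), check=True)
    return dest


# Example (as root, on each client):
#     trust_server_cert("cert.pem")
```

This still has to be repeated on every client, which is the
distribution problem described above.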


--
Chuck Lever



* Re: problems getting rpc over tls to work
  2023-03-28 15:48           ` Jeff Layton
@ 2023-03-28 16:06             ` Chuck Lever III
  0 siblings, 0 replies; 26+ messages in thread
From: Chuck Lever III @ 2023-03-28 16:06 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Olga Kornievskaia, kernel-tls-handshake



> On Mar 28, 2023, at 11:48 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Tue, 2023-03-28 at 10:38 -0400, Jeff Layton wrote:
>> On Tue, 2023-03-28 at 10:25 -0400, Olga Kornievskaia wrote:
>>> On Tue, Mar 28, 2023 at 10:14 AM Jeff Layton <jlayton@kernel.org> wrote:
>>>> 
>>>> On Tue, 2023-03-28 at 13:55 +0000, Chuck Lever III wrote:
>>>>> 
>>>>>> On Mar 28, 2023, at 9:29 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Mar 28, 2023, at 8:27 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>> 
>>>>>>> Hi Chuck!
>>>>>>> 
>>>>>>> I have started the packaging work for Fedora for ktls-utils:
>>>>>>> 
>>>>>>>  https://bugzilla.redhat.com/show_bug.cgi?id=2182151
>>>>>>> 
>>>>>>> I also built packages for this in copr:
>>>>>>> 
>>>>>>>  https://copr.fedorainfracloud.org/coprs/jlayton/ktls-utils/
>>>>>>> 
>>>>>>> ...and built some interim nfs-utils packages with the requisite exportfs
>>>>>>> patches:
>>>>>>> 
>>>>>>>  https://copr.fedorainfracloud.org/coprs/jlayton/nfs-utils/
>>>>>> 
>>>>>> Note that the nfs-utils changes aren't necessary to support
>>>>>> the kernel server in "opportunistic" mode -- the server will
>>>>>> use RPC-with-TLS if a client requests it, but otherwise does
>>>>>> not restrict access.
>>>>>> 
>>>>>> Client side also has no nfs-utils requirements at this time,
>>>>>> since the new mount options are handled by the kernel.
>>>>> 
>>>>> In case I wasn't clear:
>>>>> 
>>>>> This was meant as a suggestion. If you want to simplify your
>>>>> test set-up a bit, the nfs-utils piece isn't needed at this
>>>>> point. But feel free to include it if you like!
>>>>> 
>>>> 
>>>> Understood. I needed to build it for the server side anyway, so I
>>>> figured I might as well. Eventually I'd like to set up a Fedora COPR
>>>> repo that has all of the packages we need to test this, but I need to
>>>> sort through the certificate handling here first.
>>>> 
>>>> Are there docs on how to administer gnutls? For instance, I guess I'll
>>>> want to set up my own CA and issue client and server certs. How do I
>>>> make gnutls trust a new CA?
>>> 
>>> Hi Jeff,
>>> 
>>> To get self-signed certificates to work you need to (on the client's
>>> machine) copy your server's cert.pem file into
>>> /etc/pki/ca-trust/source/anchors and then run the “update-ca-trust
>>> extract”.
>>> 
>>> 
>> 
>> Many thanks, Olga! That got me further:
>> 
>>    Mar 28 10:35:05 nfsclnt tlshd[1498]: Handshake with nfsd.poochiereds.net (192.168.1.140) was successful
>> 
>> The mount still isn't working yet, but I think I'm getting closer. I'll
>> keep poking at it.
>> 
> 
> OK! I cranked up the debugging. Here's the kernel tracepoints during
> this time:
> 
>          <idle>-0       [007] ..s2.  3657.494946: svc_xprt_enqueue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL pid=1051
>            nfsd-1051    [005] .....  3657.494980: svc_xprt_dequeue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL wakeup-us=45
>            nfsd-1051    [005] .....  3657.495071: svcsock_new_socket: type=STREAM family=AF_INET
>            nfsd-1051    [005] .....  3657.495085: svc_xprt_enqueue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL pid=1050
>            nfsd-1051    [005] .....  3657.495086: svc_xprt_accept: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL protocol=tcp service=nfsd
>            nfsd-1051    [005] .....  3657.495092: svc_xprt_enqueue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL pid=1049
>            nfsd-1051    [005] .....  3657.495095: svc_xprt_dequeue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL wakeup-us=158
>            nfsd-1051    [005] .....  3657.495101: svcsock_marker: addr=192.168.1.136:818 length=40 (last)
>            nfsd-1051    [005] .....  3657.495104: svcsock_tcp_recv: addr=192.168.1.136:818 result=40 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL
>            nfsd-1051    [005] .....  3657.495111: svc_xprt_enqueue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL pid=1048
>            nfsd-1051    [005] .....  3657.495112: svc_xdr_recvfrom: xid=0xd1e2303e head=[000000005f040892,40] page=0 tail=[0000000000000000,0] len=40
>            nfsd-1050    [007] .....  3657.495121: svc_xprt_dequeue: server=0.0.0.0:2049 client=(einval) flags=BUSY|CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL wakeup-us=44
>            nfsd-1051    [005] .....  3657.495125: svc_tls_start: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL
>            nfsd-1051    [005] .....  3657.495128: svc_process: addr=192.168.1.136:818 xid=0xd1e2303e service=nfsd vers=4 proc=NULL
>            nfsd-1051    [005] .....  3657.495132: svc_xdr_sendto: xid=0xd1e2303e head=[000000005ccd151e,32] page=0(0) tail=[0000000000000000,0] len=32
>            nfsd-1051    [005] .....  3657.495133: svc_stats_latency: xid=0xd1e2303e server=192.168.1.140:2049 client=192.168.1.136:818 proc=NULL execute-us=21
>            nfsd-1050    [007] .....  3657.495146: svcsock_accept_err: addr=listener service=nfsd status=-11
>            nfsd-1048    [004] .....  3657.495147: svc_xprt_dequeue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE wakeup-us=42
>            nfsd-1048    [004] .....  3657.495151: svc_tls_upcall: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
>            nfsd-1051    [005] .....  3657.495163: svcsock_tcp_send: addr=192.168.1.136:818 result=36 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
>            nfsd-1051    [005] .....  3657.495198: svc_send: xid=0xd1e2303e server=192.168.1.140:2049 client=192.168.1.136:818 status=36 flags=SECURE|USEDEFERRAL|SPLICE_OK|BUSY|DATA
>          <idle>-0       [007] ..s2.  3657.651316: svcsock_data_ready: addr=192.168.1.136:818 result=0 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
>          <idle>-0       [007] ..s2.  3657.655648: svcsock_data_ready: addr=192.168.1.136:818 result=0 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
>          <idle>-0       [007] ..s2.  3657.669552: svcsock_data_ready: addr=192.168.1.136:818 result=0 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE
>            nfsd-1048    [004] .....  3662.666590: svc_tls_timed_out: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|DATA|TEMP|CACHE_AUTH|CONG_CTRL|HANDSHAKE              <<<<<<<<<<< TIMEOUT HERE
>            nfsd-1048    [004] .....  3662.666602: svc_xprt_enqueue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|CACHE_AUTH|CONG_CTRL pid=1051
>            nfsd-1048    [004] .....  3662.666630: svc_xprt_dequeue: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|CACHE_AUTH|CONG_CTRL wakeup-us=5171655
>            nfsd-1048    [004] .....  3662.666631: svc_xprt_detach: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|DEAD|CACHE_AUTH|CONG_CTRL
>            nfsd-1048    [004] .....  3662.666689: svc_xprt_free: server=192.168.1.140:2049 client=192.168.1.136:818 flags=BUSY|CLOSE|DATA|TEMP|DEAD|CACHE_AUTH|CONG_CTRL
> 
> It looks like it timed out waiting for the downcall. I cranked up the
> debug logging in tlshd at the same time and attached it to this. It
> looks like it all worked, so I'm not sure why the kernel didn't see the
> downcall.
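
For reference, the length of the stall in the quoted trace can be read off mechanically. The following is only a sketch: the helper name and the regular expression are illustrative and not part of any existing tool, and it assumes the default ftrace line layout shown above (task, CPU, flags, timestamp, event name).

```python
import re

# Matches the "<timestamp>: <event>:" portion of a default-format ftrace line,
# e.g. "nfsd-1048    [004] .....  3657.495151: svc_tls_upcall: server=..."
TRACE_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")


def first_event_times(lines, events):
    """Return {event_name: timestamp_in_seconds} for the first hit of each event."""
    seen = {}
    for line in lines:
        m = TRACE_RE.search(line)
        if not m:
            continue  # not a trace line (blank, prose, etc.)
        stamp, name = float(m.group(1)), m.group(2)
        if name in events and name not in seen:
            seen[name] = stamp
    return seen


def handshake_stall(lines):
    """Seconds between the TLS upcall and the handshake timeout, or None."""
    t = first_event_times(lines, {"svc_tls_upcall", "svc_tls_timed_out"})
    if len(t) < 2:
        return None
    return t["svc_tls_timed_out"] - t["svc_tls_upcall"]
```

Fed the two relevant lines from the trace above, this gives a gap of a little over five seconds between svc_tls_upcall and svc_tls_timed_out, which agrees with the wakeup-us=5171655 value on the following svc_xprt_dequeue line.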

Check that src/tlshd/netlink.h in your ktls-utils build is identical
to include/uapi/linux/handshake.h in the kernel tree you built.

If the two headers match, enable function tracing to find out
whether the downcall is not being made at all or is being made
and failing.
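
A minimal sketch of that header comparison, in case it is handy. The two relative paths are the ones named above; the function name and the two directory arguments (the top of a ktls-utils checkout and the top of the kernel tree) are assumptions for illustration only.

```python
import difflib
from pathlib import Path


def header_diff(ktls_utils_dir, kernel_dir):
    """Return unified-diff lines between tlshd's netlink.h and the kernel's
    uapi handshake.h. An empty list means the two files are identical."""
    # Both directory arguments are hypothetical: point them at your own trees.
    tlshd_hdr = Path(ktls_utils_dir) / "src/tlshd/netlink.h"
    uapi_hdr = Path(kernel_dir) / "include/uapi/linux/handshake.h"

    a = tlshd_hdr.read_text().splitlines(keepends=True)
    b = uapi_hdr.read_text().splitlines(keepends=True)

    return list(
        difflib.unified_diff(a, b, fromfile=str(tlshd_hdr), tofile=str(uapi_hdr))
    )
```

Calling header_diff() with the two tree locations and printing the result shows any lines that differ; plain diff(1) on the two files does the same job.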


--
Chuck Lever





Thread overview: 26+ messages
2023-03-28 12:27 problems getting rpc over tls to work Jeff Layton
2023-03-28 12:55 ` Jeff Layton
2023-03-28 14:04   ` Chuck Lever III
2023-03-28 14:23     ` Benjamin Coddington
2023-03-28 14:29     ` Jeff Layton
2023-03-28 14:39       ` Olga Kornievskaia
2023-03-28 14:45         ` Chuck Lever III
2023-03-28 14:50           ` Olga Kornievskaia
2023-03-28 15:06             ` Jeff Layton
2023-03-28 15:03           ` Jeff Layton
2023-03-28 15:05             ` Chuck Lever III
2023-03-28 15:15               ` Jeff Layton
2023-03-28 15:19               ` Olga Kornievskaia
2023-03-28 15:30                 ` Olga Kornievskaia
2023-03-28 15:48                   ` Chuck Lever III
2023-03-28 14:41       ` Chuck Lever III
2023-03-28 13:29 ` Chuck Lever III
2023-03-28 13:51   ` Jeff Layton
2023-03-28 13:55   ` Chuck Lever III
2023-03-28 14:13     ` Jeff Layton
2023-03-28 14:25       ` Olga Kornievskaia
2023-03-28 14:38         ` Jeff Layton
2023-03-28 14:44           ` Olga Kornievskaia
2023-03-28 14:47             ` Chuck Lever III
2023-03-28 15:48           ` Jeff Layton
2023-03-28 16:06             ` Chuck Lever III
