kernel-tls-handshake.lists.linux.dev archive mirror
* Re: Some issues
       [not found] <DM6PR06MB40747889A1DA72122394C26E93BC9@DM6PR06MB4074.namprd06.prod.outlook.com>
@ 2023-03-16 20:21 ` Chuck Lever III
  2023-03-16 20:37   ` Chuck Lever III
  0 siblings, 1 reply; 2+ messages in thread
From: Chuck Lever III @ 2023-03-16 20:21 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: kernel-tls-handshake



> On Mar 16, 2023, at 4:13 PM, Kornievskaia, Olga <Olga.Kornievskaia@netapp.com> wrote:
> 
> Hi folks,
>  
> I’m seeing issues when doing multiple handshake calls. I think the easiest way to hit this is to do:
>  
> [aglo@unknown000c29c5d653 ~]$ sudo mount -o vers=4.2,sec=sys,xprtsec=tls,nconnect=16 192.168.1.106:/nfsshare /mnt
> mount.nfs: /mnt is busy or already mounted or sharecache fail
>  
> However, nconnect isn’t really needed. I’ve run into this when I would mount/umount and repeat.
>  
> What I seem to notice is that on a repeated mount, the client would get the handshake upcall and send the ClientHello, and then there would be a human-noticeable pause (the network trace showed the packet sent to the server), but on the server side I wouldn’t see any output from tlshd, as if the upcall didn’t happen.

I fixed this one yesterday.

The server side has to set XPT_DATA after a handshake completes.
Otherwise, if the client's RPC arrives before tlshd calls DONE,
it is missed and the server just sits there staring at you.


> It would seem that the client then times out the request. It would open another TCP connection and do another NULL call and handshake initiation, and that might or might not succeed. But the socket that got timed out wouldn’t be closed until it went into keep-alive state and eventually got FIN-ed. At the transport layer I wouldn’t see an extra xprt that doesn’t go away, so I’m not sure how that socket stays around. Given that it seems to be the server that’s “missing” upcalls, the problem might not be on the client side, but I’m not sure.
>  
> Any thoughts, what might be happening?

Yes, I just fixed this bug an hour ago. :-)

The problem is that handshake_sk_destruct is always called with
a struct sock whose sk_socket field is NULL. That NULL is
passed to handshake_req_hash_lookup(), which then cannot find
any matching handshake_req, and so the actual matching
handshake_req is left in the hash table.

If the struct socket is freed and another struct socket is
later allocated at the same memory address, the new socket
matches the handshake_req that was orphaned in the hash table,
and handshake_req_submit claims that the socket already has a
pending handshake: -EBUSY.

I will push the fix in a few minutes; I'm just running a
few tests.


--
Chuck Lever




* Re: Some issues
  2023-03-16 20:21 ` Some issues Chuck Lever III
@ 2023-03-16 20:37   ` Chuck Lever III
  0 siblings, 0 replies; 2+ messages in thread
From: Chuck Lever III @ 2023-03-16 20:37 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: kernel-tls-handshake



> On Mar 16, 2023, at 4:21 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Mar 16, 2023, at 4:13 PM, Kornievskaia, Olga <Olga.Kornievskaia@netapp.com> wrote:
>> 
>> Hi folks,
>> 
>> I’m seeing issues when doing multiple handshake calls. I think the easiest way to hit this is to do:
>> 
>> [aglo@unknown000c29c5d653 ~]$ sudo mount -o vers=4.2,sec=sys,xprtsec=tls,nconnect=16 192.168.1.106:/nfsshare /mnt
>> mount.nfs: /mnt is busy or already mounted or sharecache fail
>> 
>> However, nconnect isn’t really needed. I’ve run into this when I would mount/umount and repeat.
>> 
>> What I seem to notice is that on a repeated mount, the client would get the handshake upcall and send the ClientHello, and then there would be a human-noticeable pause (the network trace showed the packet sent to the server), but on the server side I wouldn’t see any output from tlshd, as if the upcall didn’t happen.
> 
> I fixed this one yesterday.
> 
> The server side has to set XPT_DATA after a handshake completes.
> Otherwise, if the client's RPC arrives before tlshd calls DONE,
> it is missed and the server just sits there staring at you.

This is a server-side fix, btw.


>> It would seem that the client then times out the request. It would open another TCP connection and do another NULL call and handshake initiation, and that might or might not succeed. But the socket that got timed out wouldn’t be closed until it went into keep-alive state and eventually got FIN-ed. At the transport layer I wouldn’t see an extra xprt that doesn’t go away, so I’m not sure how that socket stays around. Given that it seems to be the server that’s “missing” upcalls, the problem might not be on the client side, but I’m not sure.
>> 
>> Any thoughts, what might be happening?
> 
> Yes, I just fixed this bug an hour ago. :-)
> 
> The problem is that handshake_sk_destruct is always called with
> a struct sock whose sk_socket field is NULL. That NULL is
> passed to handshake_req_hash_lookup(), which then cannot find
> any matching handshake_req, and so the actual matching
> handshake_req is left in the hash table.
> 
> If the struct socket is freed and another struct socket is
> later allocated at the same memory address, the new socket
> matches the handshake_req that was orphaned in the hash table,
> and handshake_req_submit claims that the socket already has a
> pending handshake: -EBUSY.

This one is a client-side fix.


> I will push the fix in a few minutes; I'm just running a
> few tests.

It seems OK, so I pushed it out. Pick up
topic-rpc-with-tls-upcall and the new ktls-utils (which
shouldn't be strictly necessary, but there are a handful
of bug fixes there too).

I'm still waiting for Jakub's licensing changes, which he
said should be in net-next today.


--
Chuck Lever



