kernel-tls-handshake.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: "kernel-tls-handshake@lists.linux.dev"
	<kernel-tls-handshake@lists.linux.dev>
Subject: Re: fput() considered harmful
Date: Wed, 8 Mar 2023 17:04:20 +0100	[thread overview]
Message-ID: <c519d883-5e08-0d28-dee8-89f1a7f197e3@suse.de> (raw)
In-Reply-To: <59EF30C0-05F8-458E-8BAF-36A9CC757D00@oracle.com>

On 3/8/23 16:40, Chuck Lever III wrote:
> 
> 
>> On Mar 8, 2023, at 8:43 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>>
>>
>>> On Mar 8, 2023, at 8:24 AM, Hannes Reinecke <hare@suse.de> wrote:
>>>
>>> Hi Chuck,
>>>
>>> I'm playing around with v6 and (again) facing really nasty crashes:
>>>
>>> [ 1662.912887] ------------[ cut here ]------------
>>> [ 1662.913399] kernel BUG at fs/inode.c:1763!
>>> [ 1662.913822] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>> [ 1662.914336] CPU: 1 PID: 6494 Comm: tlshd Kdump: loaded Tainted: G       E      6.2.0-rc4-54-default+ #231
>>> [ 1662.915235] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>>> [ 1662.915932] RIP: 0010:iput+0x1d/0x20
>>> [ 1662.916275] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 0e f6 87 98 00 00 00 40
>>> 75 0a e9 18 fe ff ff e9 23 e3 7a 00 <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44
>>> [ 1662.917921] RSP: 0018:ffffaeb600b1fe38 EFLAGS: 00010202
>>> [ 1662.918423] RAX: 0000000000000000 RBX: ffff953bcb9d9180 RCX: 0000000000000064
>>> [ 1662.919065] RDX: 0000000000000001 RSI: ffff953bcbb23c38 RDI: ffff953bcbb23b00
>>> [ 1662.919703] RBP: 0000000000000000 R08: 0000000100053304 R09: ffff953bcbb23b00
>>> [ 1662.920339] R10: ffff953bc5538b40 R11: 0000000000000003 R12: ffff953bcb9d91d8
>>> [ 1662.920978] R13: ffff953bfd15ade0 R14: ffff953bcb9d9180 R15: ffff953b83918cb8
>>> [ 1662.921617] FS:  0000000000000000(0000) GS:ffff953bfb900000(0000) knlGS:0000000000000000
>>> [ 1662.922376] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 1662.922907] CR2: 00007fdc89a091a0 CR3: 000000004ad8c000 CR4: 0000000000350ee0
>>> [ 1662.923568] Call Trace:
>>> [ 1662.923801]  <TASK>
>>> [ 1662.924006]  __dentry_kill+0xca/0x160
>>> [ 1662.924353]  __fput+0xdf/0x240
>>> [ 1662.924652]  task_work_run+0x67/0xa0
>>> [ 1662.924992]  do_exit+0x343/0xb50
>>> [ 1662.925302]  do_group_exit+0x2d/0x80
>>> [ 1662.925644]  __x64_sys_exit_group+0x14/0x20
>>> [ 1662.926039]  do_syscall_64+0x31/0x80
>>>
>>> Key problem seems to be userspace here, which (apparently) is calling close() on the fd which we've passed from the kernel.
>>> While this _normally_ is not a problem, in our case we have the problem
>>> that the filedescriptor is associated with a socket (ie struct socket).
>>> And that one is shared between kernel code and userland.
>>> And doesn't have any refcounting whatsoever; socket_release() removes the reference to the underlying file:
>>>
>>> 	if (!sock->file) {
>>> 		iput(SOCK_INODE(sock));
>>> 		return;
>>> 	}
>>> 	sock->file = NULL;
>>>
>>> ie if _another_ process (the kernel driver, say) is calling 'close()', too, it'll run into the
>>> !sock->file condition and crash in iput().
>>>
>>> I have been testing various things here, not calling fput, calling get_file() etc, but either
>>> hit a crash or kmemleak complaining about
>>> the file not being freed.
>>>
>>> Have you seen similar issues?
>>
>> I have, and they all vanished with the introduction of handshake_dup and haven't returned.
>>
>>
>>> And even if you haven't: relying on userspace _not_ to call 'close()'
>>> on the socket seems to be a quite dangerous concept.
>>
>> - The kernel "closes" that file descriptor anyway when the process exits. Is that not the
>> same as a user space close(2) ?
>>
>> - The endpoint was created via dup, so calling close on it should merely remove that endpoint
>> and not trigger a sock_release().
>>
>> AIUI the reference counting of the struct file * is what prevents user space from calling ->release.
>> So maybe NVMe needs another get_file() somewhere before it calls tls_server_hello_psk().
> 
> I added an explicit "close(sockfd);return;" in src/tlshd/handshake.c.
> I tried it once with the close /before/ doing the handshake, and once
> with the close /after/. I don't see any kernel issues on my server
> system when a client attempts to handshake with it.
> 
> Based on what I've read here, the missing get_file() theory seems
> plausible.
> 
Hmm. Let's see.
I must say the lifetime rules for 'struct file' (or 'struct socket') are 
still somewhat beyond me.
Currently I have to do a

sock_alloc_file()

_and_

get_file()

before calling tls_client_hello_psk() to avoid a crash.
One does wonder why; typically the refcount is initialized to '1',
so the extra 'get_file()' shouldn't be required.

But what do I know ...

I see to get you instructions for my testbed.

Cheers,

Hannes


  reply	other threads:[~2023-03-08 16:04 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-08 13:24 fput() considered harmful Hannes Reinecke
2023-03-08 13:43 ` Chuck Lever III
2023-03-08 15:40   ` Chuck Lever III
2023-03-08 16:04     ` Hannes Reinecke [this message]
2023-03-08 16:51       ` Chuck Lever III
2023-03-08 17:33       ` Hannes Reinecke
2023-03-08 17:53         ` Chuck Lever III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c519d883-5e08-0d28-dee8-89f1a7f197e3@suse.de \
    --to=hare@suse.de \
    --cc=chuck.lever@oracle.com \
    --cc=kernel-tls-handshake@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).