From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.2 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SIGNED_OFF_BY,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D4C3C43381 for ; Wed, 20 Feb 2019 09:47:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D8BF42147C for ; Wed, 20 Feb 2019 09:47:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=cloudflare.com header.i=@cloudflare.com header.b="SLMKILbN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727083AbfBTJrn (ORCPT ); Wed, 20 Feb 2019 04:47:43 -0500 Received: from mail-lf1-f66.google.com ([209.85.167.66]:43501 "EHLO mail-lf1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725816AbfBTJrn (ORCPT ); Wed, 20 Feb 2019 04:47:43 -0500 Received: by mail-lf1-f66.google.com with SMTP id j1so17067732lfb.10 for ; Wed, 20 Feb 2019 01:47:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; h=references:user-agent:from:to:cc:subject:in-reply-to:date :message-id:mime-version; bh=vtY2w3hrl9QJiIaNdNsih/pnXqVLv2n2culRHrZzcLE=; b=SLMKILbNU2pl5KFiEXIPIch0NqK3RZOUnh4RSoCXQTlyFhNw2uTkhwH6cgwkv78hk/ oK8MbT4mRh/z0aHuZzCBIjnqcJnAJpcMwcSUeZVMGS18W8cs/rE1XPJnyIxMa2MXFdBA G0uumH0rHcnchYGdSalUM1TQ08QF+z5RejKPs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:references:user-agent:from:to:cc:subject :in-reply-to:date:message-id:mime-version; bh=vtY2w3hrl9QJiIaNdNsih/pnXqVLv2n2culRHrZzcLE=; b=m9VMoDVKdll6qW2RPBLP/1upRzeyHRfQz2nPsvjpWmut5FC+cj8ZHAJG4cwBWi5A5+ A8cv9SbYEc3kFdZyIhmLubt3nenBwPMre3d4vEknON8qeIrUar8vUTKR28zas4hX6n8R eq8hR/SWPw5aFJcQrATguy7zFiLOTXvqREncYYTZFTWZVGpZ5xscgC0eBsiwKA9IZkfd Up8Cp/aPo2E7KN0IwAfHAN5hZqGMRnhfj/t4+m55YmN4RqJj13tg3m11JHuMsx4cJKRP i5vLuiq3VS6wkmdBhZceY2XkzrXqP+A5LxQeyPPYwQB2egsX0Vf8hy0uC2VtYVjkCBwt xgsQ== X-Gm-Message-State: AHQUAub/wNJ6jHxtYeSNLAPR7F4+V+QgLVm/v6iN6w+WeX0AOC0BASsL qKJNr+2iOXAC+h1vIe3iodGO5wDsqHY= X-Google-Smtp-Source: AHgI3IbqkuVa421HKkB3xju0zz6vPOdV6yL8dodegbdPvvPBPczFVFa+dMX5V5T+xrbVG6vLvOULNQ== X-Received: by 2002:a19:2d44:: with SMTP id t4mr8109353lft.90.1550656059719; Wed, 20 Feb 2019 01:47:39 -0800 (PST) Received: from cloudflare.com ([176.221.114.230]) by smtp.gmail.com with ESMTPSA id p14sm5325898lfk.16.2019.02.20.01.47.38 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 20 Feb 2019 01:47:39 -0800 (PST) References: <20190211090949.18560-1-jakub@cloudflare.com> <5439765e-1288-c379-0ead-75597092a404@iogearbox.net> User-agent: mu4e 1.1.0; emacs 26.1 From: Jakub Sitnicki To: Daniel Borkmann Cc: netdev@vger.kernel.org, John Fastabend , Marek Majkowski Subject: Re: [PATCH net] sk_msg: Keep reference on socket file while psock lives In-reply-to: <5439765e-1288-c379-0ead-75597092a404@iogearbox.net> Date: Wed, 20 Feb 2019 10:47:38 +0100 Message-ID: <871s423i6d.fsf@cloudflare.com> MIME-Version: 1.0 Content-Type: text/plain Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Hi Daniel, On Tue, Feb 19, 2019 at 05:00 PM CET, Daniel Borkmann wrote: > On 02/11/2019 10:09 AM, Jakub Sitnicki wrote: >> Backlog work for psock (sk_psock_backlog) might sleep while waiting for >> memory to free up when sending packets. While sleeping, socket can >> disappear from under our feet together with its wait queue because the >> userspace has closed it. >> >> This breaks an assumption in sk_stream_wait_memory, which expects the >> wait queue to be still there when it wakes up resulting in a >> use-after-free: >> >> ================================================================== >> BUG: KASAN: use-after-free in remove_wait_queue+0x31/0x70 >> Write of size 8 at addr ffff888069a0c4e8 by task kworker/0:2/110 >> >> CPU: 0 PID: 110 Comm: kworker/0:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3d4fe-dirty #14 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 >> Workqueue: events sk_psock_backlog >> Call Trace: >> print_address_description+0x6e/0x2b0 >> ? remove_wait_queue+0x31/0x70 >> kasan_report+0xfd/0x177 >> ? remove_wait_queue+0x31/0x70 >> ? remove_wait_queue+0x31/0x70 >> remove_wait_queue+0x31/0x70 >> sk_stream_wait_memory+0x4dd/0x5f0 >> ? sk_stream_wait_close+0x1b0/0x1b0 >> ? wait_woken+0xc0/0xc0 >> ? tcp_current_mss+0xc5/0x110 >> tcp_sendmsg_locked+0x634/0x15d0 >> ? tcp_set_state+0x2e0/0x2e0 >> ? __kasan_slab_free+0x1d1/0x230 >> ? kmem_cache_free+0x70/0x140 >> ? sk_psock_backlog+0x40c/0x4b0 >> ? process_one_work+0x40b/0x660 >> ? worker_thread+0x82/0x680 >> ? kthread+0x1b9/0x1e0 >> ? ret_from_fork+0x1f/0x30 >> ? check_preempt_curr+0xaf/0x130 >> ? iov_iter_kvec+0x5f/0x70 >> ? kernel_sendmsg_locked+0xa0/0xe0 >> skb_send_sock_locked+0x273/0x3c0 >> ? skb_splice_bits+0x180/0x180 >> ? start_thread+0xe0/0xe0 >> ? update_min_vruntime.constprop.27+0x88/0xc0 >> sk_psock_backlog+0xb3/0x4b0 >> ? strscpy+0xbf/0x1e0 >> process_one_work+0x40b/0x660 >> worker_thread+0x82/0x680 >> ? process_one_work+0x660/0x660 >> kthread+0x1b9/0x1e0 >> ? __kthread_create_on_node+0x250/0x250 >> ret_from_fork+0x1f/0x30 >> >> Allocated by task 109: >> sock_alloc_inode+0x54/0x120 >> alloc_inode+0x28/0xb0 >> new_inode_pseudo+0x7/0x60 >> sock_alloc+0x21/0xe0 >> __sys_accept4+0xc2/0x330 >> __x64_sys_accept+0x3b/0x50 >> do_syscall_64+0xb2/0x3e0 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> >> Freed by task 7: >> kfree+0x7f/0x140 >> rcu_process_callbacks+0xe0/0x100 >> __do_softirq+0xe5/0x29a >> >> The buggy address belongs to the object at ffff888069a0c4e0 >> which belongs to the cache kmalloc-64 of size 64 >> The buggy address is located 8 bytes inside of >> 64-byte region [ffff888069a0c4e0, ffff888069a0c520) >> The buggy address belongs to the page: >> page:ffffea0001a68300 count:1 mapcount:0 mapping:ffff88806d4018c0 index:0x0 >> flags: 0x4000000000000200(slab) >> raw: 4000000000000200 dead000000000100 dead000000000200 ffff88806d4018c0 >> raw: 0000000000000000 00000000002a002a 00000001ffffffff 0000000000000000 >> page dumped because: kasan: bad access detected >> >> Memory state around the buggy address: >> ffff888069a0c380: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb >> ffff888069a0c400: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc >>> ffff888069a0c480: 00 00 00 00 00 00 00 00 fc fc fc fc fb fb fb fb >> ^ >> ffff888069a0c500: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb >> ffff888069a0c580: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc >> ================================================================== >> >> Avoid it by keeping a reference to the socket file until the psock gets >> destroyed. >> >> While at it, rearrange the order of reference grabbing and >> initialization to match the destructor in reverse. >> >> Reported-by: Marek Majkowski >> Link: https://lore.kernel.org/netdev/CAJPywTLwgXNEZ2dZVoa=udiZmtrWJ0q5SuBW64aYs0Y1khXX3A@mail.gmail.com >> Signed-off-by: Jakub Sitnicki >> --- >> net/core/skmsg.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/net/core/skmsg.c b/net/core/skmsg.c >> index 8c826603bf36..a38442b8580b 100644 >> --- a/net/core/skmsg.c >> +++ b/net/core/skmsg.c >> @@ -493,8 +493,13 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) >> sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED); >> refcount_set(&psock->refcnt, 1); >> >> - rcu_assign_sk_user_data(sk, psock); >> + /* Hold on to socket wait queue. Backlog work waits on it for >> + * memory when sending. We must cancel work before socket wait >> + * queue can go away. >> + */ >> + get_file(sk->sk_socket->file); >> sock_hold(sk); >> + rcu_assign_sk_user_data(sk, psock); >> >> return psock; >> } >> @@ -558,6 +563,7 @@ static void sk_psock_destroy_deferred(struct work_struct *gc) >> if (psock->sk_redir) >> sock_put(psock->sk_redir); >> sock_put(psock->sk); >> + fput(psock->sk->sk_socket->file); > > Thanks for the report (and sorry for the late reply). I think holding ref on > the struct file just so we keep it alive till deferred destruction might be > papering over the actual underlying bug. Nothing obvious pops out from staring > at the code right now; as a reproducer to run, did you use the prog in the link > above and hit it after your strparser fix? I get you, I actually sat on this fix for a moment because I had a similar concern, that holding a file ref is a heavy-handed fix and I'm not seeing the real problem. For me it came down to this: 1. tcp_sendmsg_locked that we call from psock backlog work can end up waiting for memory. We somehow need to ensure that the socket wait queue does not disappear until tcp_sendmsg_locked returns. 2. KCM, which I assume must have the same problem, holds a reference on the socket file. I'm curious if there is another angle to it. To answer your actual questions - your guesses are correct on both accounts. For the reproducer, I've used the TCP echo program from Marek [1]. On the client side, I had something like: while :; do nc 10.0.0.1 12345 > /dev/null < /dev/zero & pid=$! sleep 0.1 kill $pid done I can dig out the test scripts or help testing any patches. I was testing with the strparser fix applied, 1d79895aef18 ("sk_msg: Always cancel strp work before freeing the psock"), which unfortunately was not enough. The explanation there was that the socket descriptor can get closed, and in consequence the socket file can get destroyed, before the deferred destructor for psock runs. So psock backlog work can be still very much alive and running while the socket file is gone. Thanks for looking into it, -Jakub [1] https://gist.github.com/majek/a09bcbeb8ab548cde6c18c930895c3f2