Subject: Re: [PATCH 1/3] io_uring: add support for async work inheriting files table
To: Jann Horn
Cc: linux-block@vger.kernel.org, "David S. Miller", Network Development
References: <20191017212858.13230-1-axboe@kernel.dk> <20191017212858.13230-2-axboe@kernel.dk> <0fb9d9a0-6251-c4bd-71b0-6e34c6a1aab8@kernel.dk> <572f40fb-201c-99ce-b3f5-05ff9369b895@kernel.dk>
From: Jens Axboe
Message-ID: <20b44cc0-87b1-7bf8-d20e-f6131da9d130@kernel.dk>
Date: Fri, 18 Oct 2019 10:36:26 -0600

On 10/18/19 10:20 AM, Jann Horn wrote:
> On Fri, Oct 18, 2019 at 5:55 PM Jens Axboe wrote:
>> On 10/18/19 9:00 AM, Jens Axboe wrote:
>>> On 10/18/19 8:52 AM, Jann Horn wrote:
>>>> On Fri, Oct 18, 2019 at 4:43 PM Jens Axboe wrote:
>>>>>
>>>>> On 10/18/19 8:40 AM, Jann Horn wrote:
>>>>>> On Fri, Oct 18, 2019 at 4:37 PM Jens Axboe wrote:
>>>>>>>
>>>>>>> On 10/18/19 8:34 AM, Jann Horn wrote:
>>>>>>>> On Fri, Oct 18, 2019 at 4:01 PM Jens Axboe wrote:
>>>>>>>>> On 10/17/19 8:41 PM, Jann Horn wrote:
>>>>>>>>>> On Fri, Oct 18, 2019 at 4:01 AM Jens Axboe wrote:
>>>>>>>>>>> This is in preparation for adding opcodes that need to modify files
>>>>>>>>>>> in a process file table, either adding new ones or closing old ones.
>>>>>>>> [...]
>>>>>>>>> Updated patch1:
>>>>>>>>>
>>>>>>>>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=df6caac708dae8ee9a74c9016e479b02ad78d436
>>>>>>>>
>>>>>>>> I don't understand what you're doing with old_files in there. In the
>>>>>>>> "s->files && !old_files" branch, "current->files = s->files" happens
>>>>>>>> without holding task_lock(), but current->files and s->files are also
>>>>>>>> the same already at that point anyway. And what's the intent behind
>>>>>>>> assigning stuff to old_files inside the loop?
>>>>>>>> Isn't that going to cause the workqueue to keep a modified
>>>>>>>> current->files beyond the runtime of the work?
>>>>>>>
>>>>>>> I simply forgot to remove the old block, it should only have this one:
>>>>>>>
>>>>>>> if (s->files && s->files != cur_files) {
>>>>>>>         task_lock(current);
>>>>>>>         current->files = s->files;
>>>>>>>         task_unlock(current);
>>>>>>>         if (cur_files)
>>>>>>>                 put_files_struct(cur_files);
>>>>>>>         cur_files = s->files;
>>>>>>> }
>>>>>>
>>>>>> Don't you still need a put_files_struct() in the case where "s->files
>>>>>> == cur_files"?
>>>>>
>>>>> I want to hold on to the files for as long as I can, to avoid unnecessary
>>>>> shuffling of it. But I take it your worry here is that we'll be calling
>>>>> something that manipulates ->files? Nothing should do that, unless
>>>>> s->files is set. We didn't hide the workqueue ->files[] before this
>>>>> change either.
>>>>
>>>> No, my worry is that the refcount of the files_struct is left too
>>>> high. From what I can tell, the "do" loop in io_sq_wq_submit_work()
>>>> iterates over multiple instances of struct sqe_submit. If there are
>>>> two sqe_submit instances with the same ->files (each holding a
>>>> reference from the get_files_struct() in __io_queue_sqe()), then:
>>>>
>>>> When processing the first sqe_submit instance, current->files and
>>>> cur_files are set to $user_files.
>>>> When processing the second sqe_submit instance, nothing happens
>>>> (s->files == cur_files).
>>>> After the loop, at the end of the function, put_files_struct() is
>>>> called once on $user_files.
>>>>
>>>> So get_files_struct() has been called twice, but put_files_struct()
>>>> has only been called once. That leaves the refcount too high, and by
>>>> repeating this, an attacker can make the refcount wrap around and then
>>>> cause a use-after-free.
>>>
>>> Ah now I see what you are getting at, yes that's clearly a bug! I wonder
>>> how we best safely can batch the drops.
>>> We can track the number of times
>>> we've used the same files, and do atomic_sub_and_test() in a
>>> put_files_struct_many() type addition. But that would leave us open to
>>> the issue you describe, where someone could maliciously overflow the
>>> files ref count.
>>>
>>> Probably not worth over-optimizing, as long as we can avoid the
>>> current->files task lock/unlock and shuffle.
>>>
>>> I'll update the patch.
>>
>> Alright, this incremental on top should do it. And full updated patch
>> here:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=40449c5a3d3b16796fa13e9469c69d62986e961c
>>
>> Let me know what you think.
>
> Ignoring the locking elision, basically the logic is now this:
>
> static void io_sq_wq_submit_work(struct work_struct *work)
> {
>         struct io_kiocb *req = container_of(work, struct io_kiocb, work);
>         struct files_struct *cur_files = NULL, *old_files;
>         [...]
>         old_files = current->files;
>         [...]
>         do {
>                 struct sqe_submit *s = &req->submit;
>                 [...]
>                 if (cur_files)
>                         /* drop cur_files reference; borrow lifetime must
>                          * end before here */
>                         put_files_struct(cur_files);
>                 /* move reference ownership to cur_files */
>                 cur_files = s->files;
>                 if (cur_files) {
>                         task_lock(current);
>                         /* current->files borrows reference from cur_files;
>                          * existing borrow from previous loop ends here */
>                         current->files = cur_files;
>                         task_unlock(current);
>                 }
>
>                 [call __io_submit_sqe()]
>                 [...]
>         } while (req);
>         [...]
>         /* existing borrow ends here */
>         task_lock(current);
>         current->files = old_files;
>         task_unlock(current);
>         if (cur_files)
>                 /* drop cur_files reference; borrow lifetime must
>                  * end before here */
>                 put_files_struct(cur_files);
> }
>
> If you run two iterations of this loop, with a first element that has
> a ->files pointer and a second element that doesn't, then in the
> second run through the loop, the reference to the files_struct will be
> dropped while current->files still points to it; current->files is
> only reset after the loop has ended. If someone accesses
> current->files through procfs directly after that, AFAICS you'd get a
> use-after-free.

Amazing how this is still broken. You are right, and it's especially
annoying since that's exactly the case I originally talked about (not
flipping current->files if we don't have to). I just did it wrong, so
we'll leave a dangling pointer in ->files.

The by far most common case is if one sqe has a files it needs to
attach, then others that also have files will be the same set. So I want
to optimize for the case where we only flip current->files once when we
see the files, and once when we're done with the loop.

Let me see if I can get this right...

-- 
Jens Axboe
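
For illustration only, here is a userspace sketch of the bookkeeping
discussed above. It is not the kernel patch and not necessarily the fix
that was eventually merged. It models the two constraints from the
thread: every request that carries a files pointer must have its
reference dropped exactly once, and the reference backing
current->files must outlive the borrow. Everything here is a
hypothetical stand-in: get_files()/put_files() for
get_files_struct()/put_files_struct(), the global current_files for
current->files, and submit_batch() for the "do" loop. Locking is left
out. It also assumes that a request without files may run with the
previously installed table still in place.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's files_struct; only the refcount matters. */
struct files_struct {
	int count;
};

void get_files(struct files_struct *f) { f->count++; }
void put_files(struct files_struct *f) { assert(f->count > 0); f->count--; }

/* Hypothetical stand-in for current->files of the worker thread. */
struct files_struct *current_files;

/* One queued request; files is NULL or carries one reference taken at queue time. */
struct sqe_submit {
	struct files_struct *files;
};

void submit_batch(struct sqe_submit *reqs, size_t n)
{
	struct files_struct *old_files = current_files;
	struct files_struct *cur_files = NULL;	/* owns the reference backing the borrow */

	for (size_t i = 0; i < n; i++) {
		struct files_struct *f = reqs[i].files;

		if (f && f != cur_files) {
			/* New table: end the old borrow first, then drop its reference. */
			current_files = f;
			if (cur_files)
				put_files(cur_files);
			cur_files = f;
		} else if (f) {
			/* Same table again: cur_files already holds a reference,
			 * so the duplicate taken at queue time is dropped now. */
			put_files(f);
		}
		/* f == NULL: nothing is flipped and nothing is dropped, so the
		 * borrow in current_files stays backed by cur_files. */

		/* While the request runs, a borrowed table must still be referenced. */
		assert(current_files == old_files || current_files->count > 0);
	}

	/* End the borrow before dropping the last reference held by the loop. */
	current_files = old_files;
	if (cur_files)
		put_files(cur_files);
}

/* Two requests sharing one table, then one request without files. */
int demo_final_refcount(void)
{
	struct files_struct user = { 1 };	/* the process's own reference */
	struct sqe_submit reqs[3] = { { &user }, { &user }, { NULL } };

	get_files(&user);	/* reference taken when request 0 was queued */
	get_files(&user);	/* reference taken when request 1 was queued */
	current_files = NULL;
	submit_batch(reqs, 3);
	return user.count;
}
```

Calling demo_final_refcount() from a small main() exercises both cases
from the thread in one batch: the repeated table and the trailing
request without files. In this sketch the worker's pointer is flipped
once on the first request and once after the loop. The count returns to
the process's single reference.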