From: Jann Horn
Date: Thu, 24 Oct 2019 22:31:07 +0200
Subject: Re: [PATCH 1/3] io_uring: add support for async work inheriting files table
To: Jens Axboe
Cc: linux-block@vger.kernel.org, "David S. Miller", Network Development
Miller" , Network Development Content-Type: text/plain; charset="UTF-8" Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On Thu, Oct 24, 2019 at 9:41 PM Jens Axboe wrote: > On 10/18/19 12:50 PM, Jann Horn wrote: > > On Fri, Oct 18, 2019 at 8:16 PM Jens Axboe wrote: > >> On 10/18/19 12:06 PM, Jann Horn wrote: > >>> But actually, by the way: Is this whole files_struct thing creating a > >>> reference loop? The files_struct has a reference to the uring file, > >>> and the uring file has ACCEPT work that has a reference to the > >>> files_struct. If the task gets killed and the accept work blocks, the > >>> entire files_struct will stay alive, right? > >> > >> Yes, for the lifetime of the request, it does create a loop. So if the > >> application goes away, I think you're right, the files_struct will stay. > >> And so will the io_uring, for that matter, as we depend on the closing > >> of the files to do the final reap. > >> > >> Hmm, not sure how best to handle that, to be honest. We need some way to > >> break the loop, if the request never finishes. > > > > A wacky and dubious approach would be to, instead of taking a > > reference to the files_struct, abuse f_op->flush() to synchronously > > flush out pending requests with references to the files_struct... But > > it's probably a bad idea, given that in f_op->flush(), you can't > > easily tell which files_struct the close is coming from. I suppose you > > could keep a list of (fdtable, fd) pairs through which ACCEPT requests > > have come in and then let f_op->flush() probe whether the file > > pointers are gone from them... > > Got back to this after finishing the io-wq stuff, which we need for the > cancel. > > Here's an updated patch: > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=1ea847edc58d6a54ca53001ad0c656da57257570 > > that seems to work for me (lightly tested), we correctly find and cancel > work that is holding on to the file table. > > The full series sits on top of my for-5.5/io_uring-wq branch, and can be > viewed here: > > http://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-test > > Let me know what you think! Ah, I didn't realize that the second argument to f_op->flush is a pointer to the files_struct. That's neat. Security: There is no guarantee that ->flush() will run after the last io_uring_enter() finishes. You can race like this, with threads A and B in one process and C in another one: A: sends uring fd to C via unix domain socket A: starts syscall io_uring_enter(fd, ...) 
Security: There is no guarantee that ->flush() will run after the last
io_uring_enter() finishes. You can race like this, with threads A and B
in one process and C in another one:

A: sends uring fd to C via unix domain socket
A: starts syscall io_uring_enter(fd, ...)
A: calls fdget(fd), takes reference to file
B: starts syscall close(fd)
B: fd table entry is removed
B: f_op->flush is invoked and finds no pending transactions
B: syscall close() returns
A: continues io_uring_enter(), grabbing current->files
A: io_uring_enter() returns
A and B: exit
worker: use-after-free access to files_struct

I think the solution to this would be (unless you're fine with adding
some broad global read-write mutex) something like this in
__io_queue_sqe(), where "fd" and "f" are the variables from
io_uring_enter(), plumbed through the stack somehow:

if (req->flags & REQ_F_NEED_FILES) {
	rcu_read_lock();
	spin_lock_irq(&ctx->inflight_lock);
	if (fcheck(fd) == f) {
		list_add(&req->inflight_list, &ctx->inflight_list);
		req->work.files = current->files;
		ret = 0;
	} else {
		ret = -EBADF;
	}
	spin_unlock_irq(&ctx->inflight_lock);
	rcu_read_unlock();
	if (ret)
		goto put_req;
}

Minor note: If a process uses dup() to duplicate the uring fd, then
closes the duplicated fd, that will cause work cancellations - but I
guess that's fine?

Style nit: I find it a bit confusing that both the list head and the
list members are named "inflight_list". Maybe name them "inflight_list"
and "inflight_entry", or something like that?

Correctness: Why is the wait in io_uring_flush() TASK_INTERRUPTIBLE?
Shouldn't it be TASK_UNINTERRUPTIBLE? If someone sends a signal to the
task while it's at that schedule(), it's just going to loop back around
and retry what it was doing already, right?

Security + Correctness: If there is more than one io_wqe, it seems to
me that io_uring_flush() calls io_wq_cancel_work(), which calls
io_wqe_cancel_work(), which may return IO_WQ_CANCEL_OK if the first
request it looks at is pending. In that case, io_wq_cancel_work() will
immediately return, and io_uring_flush() will also immediately return.
It looks like any other matching requests will continue running?
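I'd expect ->flush() to need something like the following instead (a
rough sketch with a made-up helper name, untested, and a real version
would also have to wait for requests that are already running): keep
scanning the ctx's inflight list and canceling until nothing on it
still references this files_struct, rather than stopping after the
first IO_WQ_CANCEL_OK.

/* Sketch only, invented name: loop until no inflight request still
 * references this files_struct, instead of returning after the first
 * successful cancellation. */
static void io_cancel_inflight_files(struct io_ring_ctx *ctx,
				     struct files_struct *files)
{
	while (1) {
		struct io_kiocb *req = NULL, *cur;

		spin_lock_irq(&ctx->inflight_lock);
		list_for_each_entry(cur, &ctx->inflight_list, inflight_list) {
			if (cur->work.files == files) {
				req = cur;
				break;
			}
		}
		spin_unlock_irq(&ctx->inflight_lock);

		if (!req)
			break;

		/* cancels at most one matching work item per call */
		io_wq_cancel_work(ctx->io_wq, &req->work);
	}
}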