From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 295D2C433DF for ; Mon, 10 Aug 2020 20:36:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 030342076B for ; Mon, 10 Aug 2020 20:36:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="uwEeQUTc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726657AbgHJUgM (ORCPT ); Mon, 10 Aug 2020 16:36:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55212 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726505AbgHJUgM (ORCPT ); Mon, 10 Aug 2020 16:36:12 -0400 Received: from mail-lj1-x242.google.com (mail-lj1-x242.google.com [IPv6:2a00:1450:4864:20::242]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 80DB6C061756 for ; Mon, 10 Aug 2020 13:36:11 -0700 (PDT) Received: by mail-lj1-x242.google.com with SMTP id z14so11116371ljm.1 for ; Mon, 10 Aug 2020 13:36:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ys0PPIaowDblp47MmQ67XHzGcdkz/IvEj8kEa9cSJOg=; b=uwEeQUTc3SopDhbrB3fAuH9UiFcYvUf6cRMVHU9kvtWacwpPUeE3pVSHTyrfegjiY+ Nac9twhqm1E7qgtV6bsnBhrSbtrAMUl0u3o//PXZcsBtiNgIUGjHza4B/WgFx3/ztTc/ ZtkQ/xcitw9zu2OzpSvnitFK3tPHGXyHDLhryhbQVccyJ2eb0vQueijOxY6UkHrw+Iy+ Wwq4Ot25NIA53P6g1uxV/Qof+n+KVTo9weWTEKzj6BoYL/xGgpyLU84gufAllZVQj+75 d/f3jFhdaHu4IAiU18V8wuHD8dl2GXEF0N37zTB/VCTEiWVUORopTrDx9TUSB1X/C53U FgeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ys0PPIaowDblp47MmQ67XHzGcdkz/IvEj8kEa9cSJOg=; b=VKkbA4c5J7OQBPDZVWmO2z+9tGanhHFaDSwPx0+joByJwKvLGjuci1jUjFIBmqKp7+ RrA94XABJ1hjL0GbWxEei8TAupfEYmulpypSb/QnZVSLV1sO0RLuY3A8rafGMcfIwusi tNDM71t+QJXKm8E8HcfL4ZpJDPnmKS2C/IEZhFIu6cmQ12SO7gRhA3ZXIZ0eIASFmhAD UQwCPAVs8Eb+wyvHAs9wrRTqXt6GKP4OZVT3OfC7unnHsw1nX2RmQ8F3Kwa48IlTpAdX r/1xv3M/Vslnc02P86ninHcGBWAzilSVreEeVaOls8ePhVKR8WvlIgujHCNRPG8vixAx vzbw== X-Gm-Message-State: AOAM531plQJD35rycI/WXR2N0GDM7qgzt47nuPlBkW3OpEbha3ulqoCz Je7PwFQ5ebym8/+oOzqNbKoKgfuB9Ksw6MHNyTVgLw== X-Google-Smtp-Source: ABdhPJw66y9OOJO9r1nbxjPpmNhECAHQcjYEwR3UNwwSknRVVPBmP2V+RXpeIZwyBpxfDC4Vmc9QvF0yBW9pugwF2s4= X-Received: by 2002:a2e:302:: with SMTP id 2mr1289664ljd.156.1597091769650; Mon, 10 Aug 2020 13:36:09 -0700 (PDT) MIME-Version: 1.0 References: <20200808183439.342243-1-axboe@kernel.dk> <20200808183439.342243-3-axboe@kernel.dk> <20200810114256.GS2674@hirez.programming.kicks-ass.net> <07df8ab4-16a8-8537-b4fe-5438bd8110cf@kernel.dk> <20200810201213.GB3982@worktop.programming.kicks-ass.net> <4a8fa719-330f-d380-522f-15d79c74ca9a@kernel.dk> In-Reply-To: From: Jann Horn Date: Mon, 10 Aug 2020 22:35:41 +0200 Message-ID: Subject: Re: [PATCH 2/2] io_uring: use TWA_SIGNAL for task_work if the task isn't running To: Jens Axboe Cc: Peter Zijlstra , io-uring , stable , Josef , Oleg Nesterov Content-Type: text/plain; charset="UTF-8" Sender: io-uring-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org On Mon, Aug 10, 2020 at 10:25 PM Jens Axboe wrote: > On 8/10/20 2:13 PM, Jens Axboe wrote: > >> Would it be clearer to write it like so perhaps? > >> > >> /* > >> * Optimization; when the task is RUNNING we can do with a > >> * cheaper TWA_RESUME notification because,... >> * here>. Otherwise do the more expensive, but always correct > >> * TWA_SIGNAL. > >> */ > >> if (READ_ONCE(tsk->state) == TASK_RUNNING) { > >> __task_work_notify(tsk, TWA_RESUME); > >> if (READ_ONCE(tsk->state) == TASK_RUNNING) > >> return; > >> } > >> __task_work_notify(tsk, TWA_SIGNAL); > >> wake_up_process(tsk); > > > > Yeah that is easier to read, wasn't a huge fan of the loop since it's > > only a single retry kind of condition. I'll adopt this suggestion, > > thanks! > > Re-write it a bit on top of that, just turning it into two separate > READ_ONCE, and added appropriate comments. For the SQPOLL case, the > wake_up_process() is enough, so we can clean up that if/else. > > https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=49bc5c16483945982cf81b0109d7da7cd9ee55ed I think I'm starting to understand the overall picture here, and I think if my understanding is correct, your solution isn't going to work properly. My understanding of the scenario you're trying to address is: - task A starts up io_uring - task A tells io_uring to bump the counter of an eventfd E when work has been completed - task A submits some work ("read a byte from file descriptor X", or something like that) - io_uring internally starts an asynchronous I/O operation, with a callback C - task A calls read(E, &counter, sizeof(counter)) to wait for events to be processed - the async I/O operation finishes, C is invoked, and C schedules task_work for task A And here you run into a deadlock, because the task_work will only run when task A returns from the syscall, but the syscall will only return once the task_work is executing and has finished the I/O operation. If that is the scenario you're trying to solve here (where you're trying to force a task that's in the middle of some syscall that's completely unrelated to io_uring to return back to syscall context), I don't think this will work: It might well be that the task has e.g. just started entering the read() syscall, and is *about to* block, but is currently still running.