From: Jann Horn
Date: Thu, 20 Feb 2020 23:56:25 +0100
Subject: Re: [PATCH 7/9] io_uring: add per-task callback handler
To: Jens Axboe
Cc: io-uring, Glauber Costa, Peter Zijlstra, Pavel Begunkov
References: <20200220203151.18709-1-axboe@kernel.dk> <20200220203151.18709-8-axboe@kernel.dk>
X-Mailing-List: io-uring@vger.kernel.org

On Thu, Feb 20, 2020 at 11:02 PM Jann Horn wrote:
> On Thu, Feb 20, 2020 at 9:32 PM Jens Axboe wrote:
> >
> > For poll requests, it's not uncommon to link a read (or write) after
> > the poll to execute immediately after the file is marked as ready.
> > Since the poll completion is called inside the waitqueue wake up handler,
> > we have to punt that linked request to async context. This slows down
> > the processing, and actually means it's faster to not use a link for this
> > use case.
> >
> > We also run into problems if the completion_lock is contended, as we're
> > doing a different lock ordering than the issue side is. Hence we have
> > to do trylock for completion, and if that fails, go async. Poll removal
> > needs to go async as well, for the same reason.
> >
> > eventfd notification needs special case as well, to avoid stack blowing
> > recursion or deadlocks.
> >
> > These are all deficiencies that were inherited from the aio poll
> > implementation, but I think we can do better. When a poll completes,
> > simply queue it up in the task poll list. When the task completes the
> > list, we can run dependent links inline as well. This means we never
> > have to go async, and we can remove a bunch of code associated with
> > that, and optimizations to try and make that run faster. The diffstat
> > speaks for itself.
> [...]
> > -static void io_poll_trigger_evfd(struct io_wq_work **workptr)
> > +static void io_poll_task_func(struct callback_head *cb)
> >  {
> > -	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
> > +	struct io_kiocb *req = container_of(cb, struct io_kiocb, sched_work);
> > +	struct io_kiocb *nxt = NULL;
> >
> [...]
> > +	io_poll_task_handler(req, &nxt);
> > +	if (nxt)
> > +		__io_queue_sqe(nxt, NULL);
>
> This can now get here from anywhere that calls schedule(), right?
> Which means that this might almost double the required kernel stack
> size, if one codepath exists that calls schedule() while near the
> bottom of the stack and another codepath exists that goes from here
> through the VFS and again uses a big amount of stack space?

Oh, I think this also implies that any mutex reachable via any of the
nonblocking uring ops nests inside any mutex under which we happen to
schedule(), right? I wonder whether that's going to cause deadlocks...

For example, FUSE's ->read_iter() can call fuse_direct_io(), which can
call inode_lock() and then call fuse_sync_writes() under the inode
lock, which can wait_event(), which can schedule(); and if uring then
from schedule() calls ->read_iter() again, you could reach
inode_lock() on the same inode again, causing a deadlock, I think?
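
In case it helps to make the scenario concrete: here is a rough,
untested userspace sketch (against liburing; the /mnt/fuse/file path
and buffer size are made up for illustration) of the POLL_ADD plus
linked read pattern the commit message describes. With the fd on a
FUSE mount, the linked ->read_iter() is exactly the kind of work that
would now run from the task callback instead of being punted to async
context.

#include <fcntl.h>
#include <poll.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	/* placeholder path - any FUSE-backed (or otherwise pollable) fd
	 * is enough to reach the path discussed above */
	int fd = open("/mnt/fuse/file", O_RDONLY);

	if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	/* poll for readability first ... */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_poll_add(sqe, fd, POLLIN);
	sqe->flags |= IOSQE_IO_LINK;

	/* ... and link the actual read behind the poll */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_readv(sqe, fd, &iov, 1, 0);

	io_uring_submit(&ring);

	/* reap both completions: the poll first, then the linked read */
	for (int i = 0; i < 2; i++) {
		if (io_uring_wait_cqe(&ring, &cqe))
			break;
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}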