From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38B16C282CD for ; Mon, 28 Jan 2019 23:36:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 05C032175B for ; Mon, 28 Jan 2019 23:36:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="oOpC3IiX" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727257AbfA1XgR (ORCPT ); Mon, 28 Jan 2019 18:36:17 -0500 Received: from mail-ot1-f66.google.com ([209.85.210.66]:35460 "EHLO mail-ot1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727193AbfA1XgR (ORCPT ); Mon, 28 Jan 2019 18:36:17 -0500 Received: by mail-ot1-f66.google.com with SMTP id 81so16323564otj.2 for ; Mon, 28 Jan 2019 15:36:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=PQIDq6ClBgggejiPZ9U1GYAYyrne8cJ3+jqu+8cslzo=; b=oOpC3IiX08kAgsrmlAZCiFEkuZ2GJHsezPUimQ+rte148nSigOe5poUKmqKwBtmO7+ ze2O9SD9dV2+FlLBhk/oWOcdHnBMg0gFs47+vS1B47y2fVn76jlVgbuLcb+pAeWcsiRs A+dyCTVWAEyaAiEoelvBmSidRyRfGMSCUws0seP5t9G1Z8M/ASkMHUsR72GA5bdgE62u zduePFBbk//X6uxcq7yqj8n4EWhszWuqswRtqJqFUTYR+30xgjSSbdVbzGdWeA7T974u TiaWqHNkEKb0hHxo/yu9gAZOixcAKd4NTA9lSWW/x87dPZcMxXhbMx+j2ROo9HHYaPrq 8XYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=PQIDq6ClBgggejiPZ9U1GYAYyrne8cJ3+jqu+8cslzo=; b=Ur/z82nfDAwHjwDxcKoI4gHEyxWucRF1ZADWrcYfaChsGJXvUmqipRX4iEvfDJHT7c RmGrb1C4EZl9b9R+Ws4mA+rNUYB+BNm+xWkchMUrFAOoQuOZqmfdpjI8EpMGOVdThYIZ HBDMC5GZC2hbSvPrMyNurF0fAt0V9vcBtxuxo7IQ5qhrUyYJ11YPFvezgwPf9Evb4geV GT35llfrGbKXyKednLFuQUddmtb0E1YK3ztsbdKZ8k07N1avxc5GZ4g2Mrf7WWTKxbTZ Fes2XSLLvRwSUkTetQ55aw9kr1ZZA5b6BMZ1msYPFxB1Dbb2ndv4OnNvQAsyWAu58//P onTA== X-Gm-Message-State: AJcUukdTgNBZQxoa93FlyJHzxyTgRfStPgtElXsZ/RTdMH2oxaT5JxW4 HdmqSjbxOYDfa3gTWRMtYNmhXiYMCq1DdNcFR59zBvlWluU= X-Google-Smtp-Source: ALg8bN6j7H+JmmP+h1+gQ0SvbZQK2S2j03qF/Szfeh8lloZS440SXE3TIOfUvReIxaNLUHKqgUwUR0ucNxcFcyWqnZw= X-Received: by 2002:a9d:4e06:: with SMTP id p6mr18976867otf.73.1548718576370; Mon, 28 Jan 2019 15:36:16 -0800 (PST) MIME-Version: 1.0 References: <20190128213538.13486-1-axboe@kernel.dk> <20190128213538.13486-13-axboe@kernel.dk> In-Reply-To: <20190128213538.13486-13-axboe@kernel.dk> From: Jann Horn Date: Tue, 29 Jan 2019 00:35:50 +0100 Message-ID: Subject: Re: [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers To: Jens Axboe Cc: linux-aio@kvack.org, linux-block@vger.kernel.org, linux-man , Linux API , hch@lst.de, jmoyer@redhat.com, Avi Kivity Content-Type: text/plain; charset="UTF-8" Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On Mon, Jan 28, 2019 at 10:36 PM Jens Axboe wrote: > If we have fixed user buffers, we can map them into the kernel when we > setup the io_context. That avoids the need to do get_user_pages() for > each and every IO. [...] > +static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, > + void __user *arg, unsigned nr_args) > +{ > + int ret; > + > + /* Drop our initial ref and wait for the ctx to be fully idle */ > + percpu_ref_put(&ctx->refs); The line above drops a reference that you just got in the caller... > + percpu_ref_kill(&ctx->refs); > + wait_for_completion(&ctx->ctx_done); > + > + switch (opcode) { > + case IORING_REGISTER_BUFFERS: > + ret = io_sqe_buffer_register(ctx, arg, nr_args); > + break; > + case IORING_UNREGISTER_BUFFERS: > + ret = -EINVAL; > + if (arg || nr_args) > + break; > + ret = io_sqe_buffer_unregister(ctx); > + break; > + default: > + ret = -EINVAL; > + break; > + } > + > + /* bring the ctx back to life */ > + reinit_completion(&ctx->ctx_done); > + percpu_ref_resurrect(&ctx->refs); > + percpu_ref_get(&ctx->refs); And then this line takes a reference that the caller will immediately drop again? Why? > + return ret; > +} > + > +SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, > + void __user *, arg, unsigned int, nr_args) > +{ > + struct io_ring_ctx *ctx; > + long ret = -EBADF; > + struct fd f; > + > + f = fdget(fd); > + if (!f.file) > + return -EBADF; > + > + ret = -EOPNOTSUPP; > + if (f.file->f_op != &io_uring_fops) > + goto out_fput; > + > + ret = -ENXIO; > + ctx = f.file->private_data; > + if (!percpu_ref_tryget(&ctx->refs)) > + goto out_fput; If you are holding the uring_lock of a ctx that can be accessed through a file descriptor (which you do just after this point), you know that the percpu_ref isn't zero, right? Why are you doing the tryget here? > + ret = -EBUSY; > + if (mutex_trylock(&ctx->uring_lock)) { > + ret = __io_uring_register(ctx, opcode, arg, nr_args); > + mutex_unlock(&ctx->uring_lock); > + } > + io_ring_drop_ctx_refs(ctx, 1); > +out_fput: > + fdput(f); > + return ret; > +} From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jann Horn Subject: Re: [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers Date: Tue, 29 Jan 2019 00:35:50 +0100 Message-ID: References: <20190128213538.13486-1-axboe@kernel.dk> <20190128213538.13486-13-axboe@kernel.dk> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Return-path: In-Reply-To: <20190128213538.13486-13-axboe@kernel.dk> Sender: owner-linux-aio@kvack.org To: Jens Axboe Cc: linux-aio@kvack.org, linux-block@vger.kernel.org, linux-man , Linux API , hch@lst.de, jmoyer@redhat.com, Avi Kivity List-Id: linux-man@vger.kernel.org On Mon, Jan 28, 2019 at 10:36 PM Jens Axboe wrote: > If we have fixed user buffers, we can map them into the kernel when we > setup the io_context. That avoids the need to do get_user_pages() for > each and every IO. [...] > +static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, > + void __user *arg, unsigned nr_args) > +{ > + int ret; > + > + /* Drop our initial ref and wait for the ctx to be fully idle */ > + percpu_ref_put(&ctx->refs); The line above drops a reference that you just got in the caller... > + percpu_ref_kill(&ctx->refs); > + wait_for_completion(&ctx->ctx_done); > + > + switch (opcode) { > + case IORING_REGISTER_BUFFERS: > + ret = io_sqe_buffer_register(ctx, arg, nr_args); > + break; > + case IORING_UNREGISTER_BUFFERS: > + ret = -EINVAL; > + if (arg || nr_args) > + break; > + ret = io_sqe_buffer_unregister(ctx); > + break; > + default: > + ret = -EINVAL; > + break; > + } > + > + /* bring the ctx back to life */ > + reinit_completion(&ctx->ctx_done); > + percpu_ref_resurrect(&ctx->refs); > + percpu_ref_get(&ctx->refs); And then this line takes a reference that the caller will immediately drop again? Why? > + return ret; > +} > + > +SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, > + void __user *, arg, unsigned int, nr_args) > +{ > + struct io_ring_ctx *ctx; > + long ret = -EBADF; > + struct fd f; > + > + f = fdget(fd); > + if (!f.file) > + return -EBADF; > + > + ret = -EOPNOTSUPP; > + if (f.file->f_op != &io_uring_fops) > + goto out_fput; > + > + ret = -ENXIO; > + ctx = f.file->private_data; > + if (!percpu_ref_tryget(&ctx->refs)) > + goto out_fput; If you are holding the uring_lock of a ctx that can be accessed through a file descriptor (which you do just after this point), you know that the percpu_ref isn't zero, right? Why are you doing the tryget here? > + ret = -EBUSY; > + if (mutex_trylock(&ctx->uring_lock)) { > + ret = __io_uring_register(ctx, opcode, arg, nr_args); > + mutex_unlock(&ctx->uring_lock); > + } > + io_ring_drop_ctx_refs(ctx, 1); > +out_fput: > + fdput(f); > + return ret; > +} -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: aart@kvack.org