From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.0 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC76DC43612 for ; Fri, 18 Jan 2019 16:12:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9497920850 for ; Fri, 18 Jan 2019 16:12:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="eVkW55zx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728037AbfARQMz (ORCPT ); Fri, 18 Jan 2019 11:12:55 -0500 Received: from mail-pf1-f196.google.com ([209.85.210.196]:44104 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728024AbfARQMz (ORCPT ); Fri, 18 Jan 2019 11:12:55 -0500 Received: by mail-pf1-f196.google.com with SMTP id u6so6804739pfh.11 for ; Fri, 18 Jan 2019 08:12:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=PwQBn9IIrQsdPjS9vWd08BtfF9gkn2bo+RQgQJfxJUs=; b=eVkW55zxoYsHXwQEwt7K2/GrZtiYzJByOW9SCi6LHUmzXzh4+rad/vmJ+X+6ALT+He aRzBXrC/o9jyq0rky1vbxk9Ccj7WmYznc+rIyciRIUBQE2OWA3q3afezCwY8WjiwaPM4 MuC3Xo3geoAVCHo6siZ2ydrYxijN4ZFfELw43OdcY66KK78dIVRfdndCFMsLpj458vvT Ai6LPsrS4MpOCMUYGDc5nAXlZk1NuGmD5sSU5/Ybkr9nu+d2aRyXOIQ2zmq3YcZwZ5Gg nr/Kei2a78MEQU7LRilR+bF676OJqHLt9zOfyO/4/w/5Cl1UqcQnm9uKjB1SuRAmIO8s 3m2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=PwQBn9IIrQsdPjS9vWd08BtfF9gkn2bo+RQgQJfxJUs=; b=N0GAF6ebXYgPASNoXsR6WJUMGCsjcquKhaFllSVSWpadZfh2vyHMmr6E+BkgePRfPw xwG8n+bqcWPXfLOQ/iL50bfuFVe4bHSwhFYULMZYC780CpPpz3vM4PH6bUTtXyzfzxcY o7qhb7lfGbjKkQ7FNgU8P6a6gX3iqX5QH7X9FFsRIOokp05atQHJ/Yc/OKfZtbse7yaO Q76sKiSi/UFvmzNUDA1/e1ssRndD840DcwclGG5xHXUdtWl+uk/2N6KWf63/rAZeco/m DrGVts9qo3knNVr/tjWhij4Yi3SdXMkDrw+6yMyys9EvhnAbwYlnDqcffjyVAAZ0GRFz eMQQ== X-Gm-Message-State: AJcUukfGM2MTRptgPHHC5u+AcBv07yZoe+K6vnYugr7k5b6VSXBv+fTf ZhtMZ7P5SYlMQuU60z9tuyrqn69CnEph4g== X-Google-Smtp-Source: ALg8bN7A0SD40pMGwsNJL0xvo663/fdmH6zoFu0Qq7jqVO0uIDt5gU9+BxLpkzTXggabHvmXlG7F/w== X-Received: by 2002:a63:66c6:: with SMTP id a189mr18318697pgc.167.1547827974240; Fri, 18 Jan 2019 08:12:54 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. [66.29.188.166]) by smtp.gmail.com with ESMTPSA id m20sm5317804pgv.93.2019.01.18.08.12.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 18 Jan 2019 08:12:53 -0800 (PST) From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org Cc: hch@lst.de, jmoyer@redhat.com, avi@scylladb.com, Jens Axboe Subject: [PATCH 09/17] io_uring: use fget/fput_many() for file references Date: Fri, 18 Jan 2019 09:12:17 -0700 Message-Id: <20190118161225.4545-10-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190118161225.4545-1-axboe@kernel.dk> References: <20190118161225.4545-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add a separate io_submit_state structure, to cache some of the things we need for IO submission. One such example is file reference batching. io_submit_state. We get as many references as the number of sqes we are submitting, and drop unused ones if we end up switching files. The assumption here is that we're usually only dealing with one fd, and if there are multiple, hopefuly they are at least somewhat ordered. Could trivially be extended to cover multiple fds, if needed. On the completion side we do the same thing, except this is trivially done just locally in io_iopoll_reap(). Signed-off-by: Jens Axboe --- fs/io_uring.c | 139 ++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 118 insertions(+), 21 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 0218d516d184..668043c87ddb 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -132,6 +132,19 @@ struct io_kiocb { #define IO_PLUG_THRESHOLD 2 #define IO_IOPOLL_BATCH 8 +struct io_submit_state { + struct blk_plug plug; + + /* + * File reference cache + */ + struct file *file; + unsigned int fd; + unsigned int has_refs; + unsigned int used_refs; + unsigned int ios_left; +}; + static struct kmem_cache *req_cachep; static const struct file_operations io_uring_fops; @@ -285,9 +298,11 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events, struct list_head *done) { void *reqs[IO_IOPOLL_BATCH]; + int file_count, to_free; + struct file *file = NULL; struct io_kiocb *req; - int to_free = 0; + file_count = to_free = 0; while (!list_empty(done)) { req = list_first_entry(done, struct io_kiocb, list); list_del(&req->list); @@ -297,12 +312,28 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events, reqs[to_free++] = req; (*nr_events)++; - fput(req->rw.ki_filp); + /* + * Batched puts of the same file, to avoid dirtying the + * file usage count multiple times, if avoidable. + */ + if (!file) { + file = req->rw.ki_filp; + file_count = 1; + } else if (file == req->rw.ki_filp) { + file_count++; + } else { + fput_many(file, file_count); + file = req->rw.ki_filp; + file_count = 1; + } + if (to_free == ARRAY_SIZE(reqs)) io_free_req_many(ctx, reqs, &to_free); } io_commit_cqring(ctx); + if (file) + fput_many(file, file_count); if (to_free) io_free_req_many(ctx, reqs, &to_free); } @@ -491,14 +522,56 @@ static void io_iopoll_req_issued(struct io_kiocb *req) list_add_tail(&req->list, &ctx->poll_list); } +static void io_file_put(struct io_submit_state *state, struct file *file) +{ + if (!state) { + fput(file); + } else if (state->file) { + int diff = state->has_refs - state->used_refs; + + if (diff) + fput_many(state->file, diff); + state->file = NULL; + } +} + +/* + * Get as many references to a file as we have IOs left in this submission, + * assuming most submissions are for one file, or at least that each file + * has more than one submission. + */ +static struct file *io_file_get(struct io_submit_state *state, int fd) +{ + if (!state) + return fget(fd); + + if (state->file) { + if (state->fd == fd) { + state->used_refs++; + state->ios_left--; + return state->file; + } + io_file_put(state, NULL); + } + state->file = fget_many(fd, state->ios_left); + if (!state->file) + return NULL; + + state->fd = fd; + state->has_refs = state->ios_left; + state->used_refs = 1; + state->ios_left--; + return state->file; +} + static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, - bool force_nonblock) + bool force_nonblock, struct io_submit_state *state) { struct io_ring_ctx *ctx = req->ctx; struct kiocb *kiocb = &req->rw; int ret; - kiocb->ki_filp = fget(sqe->fd); + kiocb->ki_filp = io_file_get(state, sqe->fd); if (unlikely(!kiocb->ki_filp)) return -EBADF; kiocb->ki_pos = sqe->off; @@ -537,7 +610,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, } return 0; out_fput: - fput(kiocb->ki_filp); + io_file_put(state, kiocb->ki_filp); return ret; } @@ -577,7 +650,7 @@ static int io_import_iovec(struct io_ring_ctx *ctx, int rw, } static ssize_t io_read(struct io_kiocb *req, const struct io_uring_sqe *sqe, - bool force_nonblock) + bool force_nonblock, struct io_submit_state *state) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *kiocb = &req->rw; @@ -585,7 +658,7 @@ static ssize_t io_read(struct io_kiocb *req, const struct io_uring_sqe *sqe, struct file *file; ssize_t ret; - ret = io_prep_rw(req, sqe, force_nonblock); + ret = io_prep_rw(req, sqe, force_nonblock, state); if (ret) return ret; file = kiocb->ki_filp; @@ -620,7 +693,7 @@ static ssize_t io_read(struct io_kiocb *req, const struct io_uring_sqe *sqe, } static ssize_t io_write(struct io_kiocb *req, const struct io_uring_sqe *sqe, - bool force_nonblock) + bool force_nonblock, struct io_submit_state *state) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *kiocb = &req->rw; @@ -628,7 +701,7 @@ static ssize_t io_write(struct io_kiocb *req, const struct io_uring_sqe *sqe, struct file *file; ssize_t ret; - ret = io_prep_rw(req, sqe, force_nonblock); + ret = io_prep_rw(req, sqe, force_nonblock, state); if (ret) return ret; file = kiocb->ki_filp; @@ -721,7 +794,8 @@ static int io_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe, } static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, - struct sqe_submit *s, bool force_nonblock) + struct sqe_submit *s, bool force_nonblock, + struct io_submit_state *state) { const struct io_uring_sqe *sqe = s->sqe; ssize_t ret; @@ -736,10 +810,10 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, ret = io_nop(req, sqe); break; case IORING_OP_READV: - ret = io_read(req, sqe, force_nonblock); + ret = io_read(req, sqe, force_nonblock, state); break; case IORING_OP_WRITEV: - ret = io_write(req, sqe, force_nonblock); + ret = io_write(req, sqe, force_nonblock, state); break; case IORING_OP_FSYNC: ret = io_fsync(req, sqe, force_nonblock); @@ -784,7 +858,7 @@ static void io_sq_wq_submit_work(struct work_struct *work) use_mm(ctx->sqo_mm); set_fs(USER_DS); - ret = __io_submit_sqe(ctx, req, s, false); + ret = __io_submit_sqe(ctx, req, s, false, NULL); set_fs(old_fs); unuse_mm(ctx->sqo_mm); @@ -797,7 +871,8 @@ static void io_sq_wq_submit_work(struct work_struct *work) current->files = old_files; } -static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s) +static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, + struct io_submit_state *state) { struct io_kiocb *req; ssize_t ret; @@ -810,7 +885,7 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s) if (unlikely(!req)) return -EAGAIN; - ret = __io_submit_sqe(ctx, req, s, true); + ret = __io_submit_sqe(ctx, req, s, true, state); if (ret == -EAGAIN) { memcpy(&req->submit, s, sizeof(*s)); INIT_WORK(&req->work, io_sq_wq_submit_work); @@ -823,6 +898,26 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s) return ret; } +/* + * Batched submission is done, ensure local IO is flushed out. + */ +static void io_submit_state_end(struct io_submit_state *state) +{ + blk_finish_plug(&state->plug); + io_file_put(state, NULL); +} + +/* + * Start submission side cache. + */ +static void io_submit_state_start(struct io_submit_state *state, + struct io_ring_ctx *ctx, unsigned max_ios) +{ + blk_start_plug(&state->plug); + state->file = NULL; + state->ios_left = max_ios; +} + static void io_commit_sqring(struct io_ring_ctx *ctx) { struct io_sq_ring *ring = ctx->sq_ring; @@ -869,11 +964,13 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s) static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit) { + struct io_submit_state state, *statep = NULL; int i, ret = 0, submit = 0; - struct blk_plug plug; - if (to_submit > IO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (to_submit > IO_PLUG_THRESHOLD) { + io_submit_state_start(&state, ctx, to_submit); + statep = &state; + } for (i = 0; i < to_submit; i++) { struct sqe_submit s; @@ -881,7 +978,7 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit) if (!io_get_sqring(ctx, &s)) break; - ret = io_submit_sqe(ctx, &s); + ret = io_submit_sqe(ctx, &s, statep); if (ret) { io_drop_sqring(ctx); break; @@ -891,8 +988,8 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit) } io_commit_sqring(ctx); - if (to_submit > IO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + io_submit_state_end(statep); return submit ? submit : ret; } -- 2.17.1