From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiang Ying
To: Markus.Elfring@web.de, tytso@mit.edu, adilger.kernel@dilger.ca,
 linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
 wanglong19@meituan.com, heguanjun@meituan.com
Subject: [PATCH v3] ext4: fix direct I/O read error
Date: Mon, 29 Jun 2020 17:45:30 +0800
Message-Id: <1593423930-5576-1-git-send-email-jiangying8582@126.com>
X-Mailer: git-send-email 1.8.3.1
X-Mailing-List: linux-kernel@vger.kernel.org

This patch fixes an ext4 direct I/O read error that occurs when the read
size is not aligned with the block size.

The following test demonstrates the error.

(1) Make a file whose size is not aligned with the block size:

    $ dd if=/dev/zero of=./test.jar bs=1000 count=3

(2) Write a source file named "direct_io_read_file.c" as follows:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define BUF_SIZE 1024

    int main()
    {
        int fd;
        int ret;
        unsigned char *buf;

        /* O_DIRECT requires a suitably aligned user buffer. */
        ret = posix_memalign((void **)&buf, 512, BUF_SIZE);
        if (ret) {
            perror("posix_memalign failed");
            exit(1);
        }

        fd = open("./test.jar", O_RDONLY | O_DIRECT, 0755);
        if (fd < 0) {
            perror("open ./test.jar failed");
            exit(1);
        }

        /* Read until EOF (ret == 0) or an error (ret < 0). */
        do {
            ret = read(fd, buf, BUF_SIZE);
            printf("ret=%d\n", ret);
            if (ret < 0) {
                perror("write test.jar failed");
            }
        } while (ret > 0);

        free(buf);
        close(fd);
        return 0;
    }

(3) Compile the source file:

    $ gcc direct_io_read_file.c -D_GNU_SOURCE

(4) Run the test program:

    $ ./a.out

The result is as follows:

    ret=1024
    ret=1024
    ret=952
    ret=-1
    write test.jar failed: Invalid argument

I have tested this program on XFS, which does not have this problem
because XFS uses iomap_dio_rw() to do direct I/O reads.
The comparison between the read offset and the file size is done in
iomap_dio_rw(); the code is as follows:

    if (pos < size) {
        retval = filemap_write_and_wait_range(mapping, pos,
                pos + iov_length(iov, nr_segs) - 1);
        if (!retval) {
            retval = mapping->a_ops->direct_IO(READ, iocb,
                        iov, pos, nr_segs);
        }
        ...
    }
    ...

Direct I/O is done only when "pos < size"; otherwise 0 is returned.

I have tested the fix on Ext4, and the result conforms to the description
of EINVAL in read(2), which reads as follows:

    #include <unistd.h>
    ssize_t read(int fd, void *buf, size_t count);

    EINVAL
        fd is attached to an object which is unsuitable for reading;
        or the file was opened with the O_DIRECT flag, and either the
        address specified in buf, the value specified in count, or the
        current file offset is not suitably aligned.

So I think this patch can be applied to fix the ext4 direct I/O error.

However, Ext4 introduced direct I/O read using the iomap infrastructure
in kernel 5.5, in commit ("ext4: introduce direct I/O read using iomap
infrastructure"). Since then Ext4 behaves the same as XFS: both use
iomap_dio_rw() to do direct I/O reads. So this problem does not exist
on kernel 5.5 for Ext4.

As the above description shows, this problem exists on all kernel
versions between 3.14 and 5.4. Please apply this patch to these kernel
versions, or use the method from kernel 5.5 to fix this problem.

Fixes: 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
Co-developed-by: Wang Long
Signed-off-by: Wang Long
Signed-off-by: Jiang Ying

Changes since V2:
Optimize the description of the commit message and make a small change
to the patch, e.g.
with:

    Before:
    loff_t size;
    size = i_size_read(inode);
    After:
    loff_t size = i_size_read(inode);

Changes since V1:
Use real names in Signed-off-by and add a "Fixes:" tag.

---
 fs/ext4/inode.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa2..a66b0ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3821,6 +3821,11 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
 	struct inode *inode = mapping->host;
 	size_t count = iov_iter_count(iter);
 	ssize_t ret;
+	loff_t offset = iocb->ki_pos;
+	loff_t size = i_size_read(inode);
+
+	if (offset >= size)
+		return 0;
 
 	/*
 	 * Shared inode_lock is enough for us - it protects against concurrent
-- 
1.8.3.1