All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"open list:FILESYSTEMS (VFS and infrastructure)" 
	<linux-fsdevel@vger.kernel.org>, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] block: fallocate: avoid false positive on collision detection
Date: Thu, 7 Jan 2021 16:37:49 +0100	[thread overview]
Message-ID: <20210107153749.GH12990@quack2.suse.cz> (raw)
In-Reply-To: <20210107124022.900172-1-mlevitsk@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2249 bytes --]

On Thu 07-01-21 14:40:22, Maxim Levitsky wrote:
> Align start and end on page boundaries before calling
> invalidate_inode_pages2_range.
> 
> This might allow us to miss a collision if the write and the discard were done
> to the same page and do overlap but it is still better than returning -EBUSY
> if those writes didn't overlap.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks for getting back to this and I'm sorry I didn't get to this earlier
myself! I actually think the fix should be different as we discussed with
Darrick. Attached patch should fix the issue for you (I'll also post it
formally for inclusion).

								Honza

> ---
>  fs/block_dev.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 9e84b1928b94..97f0d16661b5 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1970,6 +1970,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	loff_t end = start + len - 1;
>  	loff_t isize;
>  	int error;
> +	pgoff_t invalidate_first_page, invalidate_last_page;
>  
>  	/* Fail if we don't recognize the flags. */
>  	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> @@ -2020,12 +2021,23 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  
>  	/*
>  	 * Invalidate again; if someone wandered in and dirtied a page,
> -	 * the caller will be given -EBUSY.  The third argument is
> -	 * inclusive, so the rounding here is safe.
> +	 * the caller will be given -EBUSY.
> +	 *
> +	 * If the start/end of the range is not page aligned, exclude the
> +	 * non aligned regions to avoid false positives.
>  	 */
> +	invalidate_first_page = DIV_ROUND_UP(start, PAGE_SIZE);
> +	invalidate_last_page = end >> PAGE_SHIFT;
> +
> +	if ((end + 1) & PAGE_MASK)
> +		invalidate_last_page--;
> +
> +	if (invalidate_last_page < invalidate_first_page)
> +		return 0;
> +
>  	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
> -					     start >> PAGE_SHIFT,
> -					     end >> PAGE_SHIFT);
> +					     invalidate_first_page,
> +					     invalidate_last_page);
>  }
>  
>  const struct file_operations def_blk_fops = {
> -- 
> 2.26.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-bdev-Do-not-return-EBUSY-if-bdev-discard-races-with-.patch --]
[-- Type: text/x-patch, Size: 1918 bytes --]

From 36f751ac88420a6bda8a3c161986455629dc80d4 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 7 Jan 2021 16:26:52 +0100
Subject: [PATCH] bdev: Do not return EBUSY if bdev discard races with write

blkdev_fallocate() tries to detect whether a discard raced with an
overlapping write by calling invalidate_inode_pages2_range(). However
this check can give both false negatives (when writing using direct IO
or when writeback already writes out the written pagecache range) and
false positives (when write is not actually overlapping but ends in the
same page when blocksize < pagesize). This actually causes issues for
qemu which is getting confused by EBUSY errors.

Fix the problem by removing this conflicting write detection since it is
inherently racy and thus of little use anyway.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
CC: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3e5b02f6606c..a97f43b49839 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1797,13 +1797,11 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 		return error;
 
 	/*
-	 * Invalidate again; if someone wandered in and dirtied a page,
-	 * the caller will be given -EBUSY.  The third argument is
-	 * inclusive, so the rounding here is safe.
+	 * Invalidate the page cache again; if someone wandered in and dirtied
+	 * a page, we just discard it - userspace has no way of knowing whether
+	 * the write happened before or after discard completing...
 	 */
-	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
-					     start >> PAGE_SHIFT,
-					     end >> PAGE_SHIFT);
+	return truncate_bdev_range(bdev, file->f_mode, start, end);
 }
 
 const struct file_operations def_blk_fops = {
-- 
2.26.2


  parent reply	other threads:[~2021-01-07 15:38 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-11 15:39 [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT Maxim Levitsky
2020-11-11 15:39 ` [PATCH 1/2] file-posix: allow -EBUSY errors during write zeros on raw block devices Maxim Levitsky
2020-11-16 14:48   ` Kevin Wolf
2021-01-07 12:44     ` Maxim Levitsky
2020-11-11 15:39 ` [PATCH 2/2] qemu-img: align next status sector on destination alignment Maxim Levitsky
2020-11-12 12:40   ` Peter Lieven
2020-11-12 13:45     ` Eric Blake
2020-11-12 15:04       ` Maxim Levitsky
2021-03-18  9:57         ` Maxim Levitsky
2020-11-11 15:44 ` [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT Maxim Levitsky
2020-11-12 11:19   ` Jan Kara
2020-11-12 11:19     ` Jan Kara
2020-11-12 12:00     ` Jan Kara
2020-11-12 12:00       ` Jan Kara
2020-11-12 22:08       ` Darrick J. Wong
2020-11-12 22:08         ` Darrick J. Wong
2020-12-07 17:23         ` Maxim Levitsky
2020-12-07 17:23           ` Maxim Levitsky
2021-01-07 12:40           ` [PATCH] block: fallocate: avoid false positive on collision detection Maxim Levitsky
2021-01-07 12:42             ` Maxim Levitsky
2021-01-07 15:37             ` Jan Kara [this message]
2020-11-12 15:38     ` [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT Maxim Levitsky
2020-11-12 15:38       ` Maxim Levitsky
2020-11-13 10:07       ` Jan Kara
2020-11-13 10:07         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210107153749.GH12990@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=darrick.wong@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.