All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Dave Chinner <david@fromorbit.com>, Ted Tso <tytso@mit.edu>,
	linux-mm@kvack.org
Subject: Re: [PATCH 10/12] shmem: Use invalidate_lock to protect fallocate
Date: Thu, 29 Apr 2021 11:20:52 +0200	[thread overview]
Message-ID: <20210429092052.GA11234@quack2.suse.cz> (raw)
In-Reply-To: <alpine.LSU.2.11.2104282004410.10848@eggly.anvils>

On Wed 28-04-21 20:24:59, Hugh Dickins wrote:
> On Fri, 23 Apr 2021, Jan Kara wrote:
> 
> > We have to handle pages added by currently running shmem_fallocate()
> > specially in shmem_writepage(). For this we use serialization mechanism
> > using structure attached to inode->i_private field. If we protect
> > allocation of pages in shmem_fallocate() with invalidate_lock instead,
> > we are sure added pages cannot be dirtied until shmem_fallocate() is done
> > (invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
> > cannot see those pages in shmem_writepage() and there's no need for the
> > serialization mechanism.
> 
> Appealing diffstat, but NAK, this patch is based on a false premise.
> 
> All those pages that are added by shmem_fallocate() are marked dirty:
> see the set_page_dirty(page) and comment above it in shmem_fallocate().

Aha, I missed that set_page_dirty(). Thanks for correcting me.

> It's intentional, that they should percolate through to shmem_writepage()
> when memory is tight, and give feedback to fallocate that it should stop
> (instead of getting into that horrid OOM-kill flurry that never even
> frees any tmpfs anyway).

I understand the reason, I just feel there should be a better way of
communicating memory pressure than ->writepage talking to ->fallocate
through structure attached to inode->i_private. But now I see it isn't as
simple as I thought ;) Anyway for now I'll just remove this patch. Thanks
for review!

								Honza

> > CC: Hugh Dickins <hughd@google.com>
> > CC: <linux-mm@kvack.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  mm/shmem.c | 61 ++++++------------------------------------------------
> >  1 file changed, 6 insertions(+), 55 deletions(-)
> > 
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index f34162ac46de..7a2b0744031e 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
> >  /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
> >  #define SHORT_SYMLINK_LEN 128
> >  
> > -/*
> > - * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
> > - * i_rwsem making sure that it has only one user at a time): we would prefer
> > - * not to enlarge the shmem inode just for that.
> > - */
> > -struct shmem_falloc {
> > -	pgoff_t start;		/* start of range currently being fallocated */
> > -	pgoff_t next;		/* the next page offset to be fallocated */
> > -	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
> > -	pgoff_t nr_unswapped;	/* how often writepage refused to swap out */
> > -};
> > -
> >  struct shmem_options {
> >  	unsigned long long blocks;
> >  	unsigned long long inodes;
> > @@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
> >  	 * value into swapfile.c, the only way we can correctly account for a
> >  	 * fallocated page arriving here is now to initialize it and write it.
> > -	 *
> > -	 * That's okay for a page already fallocated earlier, but if we have
> > -	 * not yet completed the fallocation, then (a) we want to keep track
> > -	 * of this page in case we have to undo it, and (b) it may not be a
> > -	 * good idea to continue anyway, once we're pushing into swap.  So
> > -	 * reactivate the page, and let shmem_fallocate() quit when too many.
> 
> (b) there commenting on communicating back to fallocate by nr_unswapped.
> 
> > +	 * Since a page added by currently running fallocate call cannot be
> > +	 * dirtied and thus arrive here we know the fallocate has already
> > +	 * completed and we are fine writing it out.
> >  	 */
> >  	if (!PageUptodate(page)) {
> > -		if (inode->i_private) {
> > -			struct shmem_falloc *shmem_falloc;
> > -			spin_lock(&inode->i_lock);
> > -			shmem_falloc = inode->i_private;
> > -			if (shmem_falloc &&
> > -			    index >= shmem_falloc->start &&
> > -			    index < shmem_falloc->next)
> > -				shmem_falloc->nr_unswapped++;
> > -			else
> > -				shmem_falloc = NULL;
> > -			spin_unlock(&inode->i_lock);
> > -			if (shmem_falloc)
> > -				goto redirty;
> > -		}
> >  		clear_highpage(page);
> >  		flush_dcache_page(page);
> >  		SetPageUptodate(page);
> > @@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  							 loff_t len)
> >  {
> >  	struct inode *inode = file_inode(file);
> > +	struct address_space *mapping = file->f_mapping;
> >  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> >  	struct shmem_inode_info *info = SHMEM_I(inode);
> > -	struct shmem_falloc shmem_falloc;
> >  	pgoff_t start, index, end;
> >  	int error;
> >  
> > @@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  	inode_lock(inode);
> >  
> >  	if (mode & FALLOC_FL_PUNCH_HOLE) {
> > -		struct address_space *mapping = file->f_mapping;
> >  		loff_t unmap_start = round_up(offset, PAGE_SIZE);
> >  		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
> >  
> > @@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		goto out;
> >  	}
> >  
> > -	shmem_falloc.start = start;
> > -	shmem_falloc.next  = start;
> > -	shmem_falloc.nr_falloced = 0;
> > -	shmem_falloc.nr_unswapped = 0;
> > -	spin_lock(&inode->i_lock);
> > -	inode->i_private = &shmem_falloc;
> > -	spin_unlock(&inode->i_lock);
> > -
> > +	down_write(&mapping->invalidate_lock);
> >  	for (index = start; index < end; index++) {
> >  		struct page *page;
> >  
> > @@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		 */
> >  		if (signal_pending(current))
> >  			error = -EINTR;
> > -		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
> > -			error = -ENOMEM;
> >  		else
> >  			error = shmem_getpage(inode, index, &page, SGP_FALLOC);
> >  		if (error) {
> > @@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  			goto undone;
> >  		}
> >  
> > -		/*
> > -		 * Inform shmem_writepage() how far we have reached.
> > -		 * No need for lock or barrier: we have the page lock.
> > -		 */
> > -		shmem_falloc.next++;
> > -		if (!PageUptodate(page))
> > -			shmem_falloc.nr_falloced++;
> > -
> >  		/*
> >  		 * If !PageUptodate, leave it that way so that freeable pages
> >  		 * can be recognized if we need to rollback on error later.
> 
> Which goes on to say:
> 
> 		 * But set_page_dirty so that memory pressure will swap rather
> 		 * than free the pages we are allocating (and SGP_CACHE pages
> 		 * might still be clean: we now need to mark those dirty too).
> 		 */
> 		set_page_dirty(page);
> 		unlock_page(page);
> 		put_page(page);
> 		cond_resched();
> 
> > @@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  		i_size_write(inode, offset + len);
> >  	inode->i_ctime = current_time(inode);
> >  undone:
> > -	spin_lock(&inode->i_lock);
> > -	inode->i_private = NULL;
> > -	spin_unlock(&inode->i_lock);
> > +	up_write(&mapping->invalidate_lock);
> >  out:
> >  	inode_unlock(inode);
> >  	return error;
> > -- 
> > 2.26.2
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2021-04-29  9:20 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-23 17:29 [PATCH 0/12 v4] fs: Hole punch vs page cache filling races Jan Kara
2021-04-23 17:29 ` [PATCH 01/12] mm: Fix comments mentioning i_mutex Jan Kara
2021-04-23 17:29 ` [PATCH 02/12] mm: Protect operations adding pages to page cache with invalidate_lock Jan Kara
2021-04-23 18:30   ` Matthew Wilcox
2021-04-23 18:30     ` [f2fs-dev] " Matthew Wilcox
2021-04-23 23:04   ` Dave Chinner
2021-04-23 23:04     ` [f2fs-dev] " Dave Chinner
2021-04-26 15:46     ` Jan Kara
2021-04-26 15:46       ` [f2fs-dev] " Jan Kara
2021-04-23 17:29 ` [PATCH 03/12] ext4: Convert to use mapping->invalidate_lock Jan Kara
2021-04-23 17:29 ` [PATCH 04/12] ext2: Convert to using invalidate_lock Jan Kara
2021-04-23 17:29 ` [PATCH 05/12] xfs: Convert to use invalidate_lock Jan Kara
2021-04-23 22:39   ` Dave Chinner
2021-04-23 17:29 ` [PATCH 06/12] zonefs: Convert to using invalidate_lock Jan Kara
2021-04-26  6:40   ` Damien Le Moal
2021-04-26 16:24     ` Jan Kara
2021-04-23 17:29 ` [PATCH 07/12] f2fs: " Jan Kara
2021-04-23 19:15   ` kernel test robot
2021-04-23 19:15     ` kernel test robot
2021-04-23 20:05   ` kernel test robot
2021-04-23 20:05     ` kernel test robot
2021-04-23 17:29 ` [PATCH 08/12] fuse: " Jan Kara
2021-04-23 17:29 ` [PATCH 09/12] shmem: " Jan Kara
2021-04-29  4:12   ` Hugh Dickins
2021-04-29  4:12     ` Hugh Dickins
2021-04-29  9:30     ` Jan Kara
2021-04-23 17:29 ` [PATCH 10/12] shmem: Use invalidate_lock to protect fallocate Jan Kara
2021-04-23 19:27   ` kernel test robot
2021-04-23 19:27     ` kernel test robot
2021-04-29  3:24   ` Hugh Dickins
2021-04-29  3:24     ` Hugh Dickins
2021-04-29  9:20     ` Jan Kara [this message]
2021-04-23 17:29 ` [PATCH 11/12] ceph: Fix race between hole punch and page fault Jan Kara
2021-04-23 17:29 ` [PATCH 12/12] cifs: " Jan Kara
2021-04-23 22:07 ` [PATCH 0/12 v4] fs: Hole punch vs page cache filling races Dave Chinner
2021-04-23 22:07   ` [f2fs-dev] " Dave Chinner
2021-04-23 23:51   ` Matthew Wilcox
2021-04-23 23:51     ` [f2fs-dev] " Matthew Wilcox
2021-04-24  6:11     ` Christoph Hellwig
2021-04-24  6:11       ` [f2fs-dev] " Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210429092052.GA11234@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=amir73il@gmail.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.