From: Mel Gorman <mgorman@techsingularity.net>
To: Jan Kara <jack@suse.cz>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 3/8] mm, truncate: Remove all exceptional entries from pagevec under one lock
Date: Thu, 12 Oct 2017 15:53:06 +0100
Message-ID: <20171012145306.2lepcjtpdxshua6j@techsingularity.net>
In-Reply-To: <20171012133323.GB29293@quack2.suse.cz>

On Thu, Oct 12, 2017 at 03:33:23PM +0200, Jan Kara wrote:
> >  		return;
> >  
> > -	if (dax_mapping(mapping)) {
> > -		dax_delete_mapping_entry(mapping, index);
> > -		return;
> > +	dax = dax_mapping(mapping);
> > +	if (!dax)
> > +		spin_lock_irq(&mapping->tree_lock);
> > +
> > +	for (i = ei, j = ei; i < pagevec_count(pvec); i++) {
> > +		struct page *page = pvec->pages[i];
> > +		pgoff_t index = indices[i];
> > +
> > +		if (!radix_tree_exceptional_entry(page)) {
> > +			pvec->pages[j++] = page;
> > +			continue;
> > +		}
> > +
> > +		if (unlikely(dax)) {
> > +			dax_delete_mapping_entry(mapping, index);
> > +			continue;
> > +		}
> > +
> > +		__clear_shadow_entry(mapping, index, page);
> >  	}
> > -	clear_shadow_entry(mapping, index, entry);
> > +
> > +	if (!dax)
> > +		spin_unlock_irq(&mapping->tree_lock);
> > +	pvec->nr = j;
> >  }
> 
> When I look at this I think we could make things cleaner. I have the following
> observations:
> 
> 1) truncate_inode_pages(), invalidate_mapping_pages() and
> invalidate_inode_pages2_range() all do essentially the same thing and would
> benefit from a similar kind of batching.
> 

While this is true, the benefit is marginal enough that I didn't feel
the level of churn was justified. Primarily it would help fadvise() and
invalidation when buffered and direct IO are mixed. I didn't think it would
be that much cleaner as a result so I left it.

> 2) As you observed and measured, batching of radix tree operations makes
> sense both when removing pages and shadow entries. I'm very confident it
> would make sense for DAX exceptional entries as well.
> 

True, but I didn't have a suitable setup for testing DAX so I wasn't
comfortable with making the change. dax_delete_mapping_entry() can sleep,
but it should be as simple as not taking the spinlock in
dax_delete_mapping_entry() and always locking in
truncate_exceptional_pvec_entries(). DAX already releases
mapping->tree_lock if it needs to sleep and I didn't spot any other gotcha,
but I'd prefer that change be made by someone who can verify it works
properly.

> 3) In all cases (i.e., those three functions and for all entry types) the
> workflow seems to be:
>   * lockless lookup of entries
>   * prepare entry for reclaim (or determine it is not eligible)
>   * lock mapping->tree_lock
>   * verify entry is still eligible for reclaim (otherwise bail)
>   * clear radix tree entry
>   * unlock mapping->tree_lock
>   * final cleanup of the entry
> 
> So I'm wondering whether we cannot somehow refactor stuff so that batching
> of radix tree operations could be shared and we wouldn't have to duplicate
> it in all those cases.
> 
> But it would be a rather large overhaul of the code so it may be a bit out of
> scope for these improvements...
> 

I think it would be out of scope for this improvement but I can look into
it if the series is accepted. I think it would be a lot of churn for fairly
marginal benefit though.
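
For reference, the workflow listed in 3) can be modelled in user space
roughly as below. This is only a sketch under stated assumptions, not
kernel code: toy_mapping, toy_lock(), toy_unlock() and toy_clear_batch()
are hypothetical names, a plain array stands in for the radix tree, and a
flag plus a counter stand in for mapping->tree_lock. The point it shows is
that a whole batch is rechecked and cleared under a single lock
acquisition, and that an entry which changed after the lockless lookup is
skipped.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8 /* illustrative size of the toy "tree" */

/* Hypothetical stand-in for struct address_space. */
struct toy_mapping {
	void *slot[SLOTS];	/* stands in for the radix tree */
	int locked;		/* stands in for mapping->tree_lock */
	int lock_count;		/* number of lock acquisitions so far */
};

static void toy_lock(struct toy_mapping *m)
{
	assert(!m->locked);
	m->locked = 1;
	m->lock_count++;
}

static void toy_unlock(struct toy_mapping *m)
{
	assert(m->locked);
	m->locked = 0;
}

/*
 * Clear a batch of entries under one lock.
 * indices[] and expected[] are what an earlier lockless lookup returned.
 * Returns the number of entries actually cleared.
 */
static int toy_clear_batch(struct toy_mapping *m, const size_t *indices,
			   void *const *expected, int n)
{
	int cleared = 0;

	toy_lock(m);
	for (int i = 0; i < n; i++) {
		size_t idx = indices[i];

		/* Verify the entry is still what the lookup saw, else bail. */
		if (idx >= SLOTS || m->slot[idx] != expected[i])
			continue;

		m->slot[idx] = NULL;	/* clear the tree entry */
		cleared++;
	}
	toy_unlock(m);

	/* Any final cleanup of the removed entries would go here, unlocked. */
	return cleared;
}
```

Calling toy_clear_batch() once per batch rather than once per entry is the
only difference from the unbatched flow; the per-entry recheck stays the
same.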

> > @@ -409,8 +445,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
> >  			}
> >  
> >  			if (radix_tree_exceptional_entry(page)) {
> > -				truncate_exceptional_entry(mapping, index,
> > -							   page);
> > +				if (ei != PAGEVEC_SIZE)
> > +					ei = i;
> 
> This should be ei == PAGEVEC_SIZE I think.
> 
> Otherwise the patch looks good to me so feel free to add:
> 

Fixed.
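
To illustrate why the comparison matters, here is a small user-space
sketch. It assumes that ei starts out as PAGEVEC_SIZE, used as a "no
exceptional entry seen yet" sentinel, which is what the review comment
implies but the quoted hunk does not show. With "!=" the sentinel would
never be replaced; with "==" ei records the first exceptional slot and the
i/j loop from the first hunk can compact the pagevec from there. The names
toy_pvec, toy_exceptional(), toy_first_exceptional() and toy_compact() are
hypothetical, negative integers stand in for exceptional entries, and the
PAGEVEC_SIZE value is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEVEC_SIZE 14 /* illustrative value, doubles as the sentinel */

/* Hypothetical stand-in for struct pagevec; ints stand in for pages. */
struct toy_pvec {
	int nr;
	int pages[PAGEVEC_SIZE];
};

/* Assumption for this sketch: negative values are exceptional entries. */
static bool toy_exceptional(int entry)
{
	return entry < 0;
}

/* Return the index of the first exceptional entry, or PAGEVEC_SIZE. */
static int toy_first_exceptional(const struct toy_pvec *pvec)
{
	int ei = PAGEVEC_SIZE;

	assert(pvec->nr <= PAGEVEC_SIZE);
	for (int i = 0; i < pvec->nr; i++) {
		if (toy_exceptional(pvec->pages[i])) {
			/* Only record the first one; later ones keep ei. */
			if (ei == PAGEVEC_SIZE)
				ei = i;
		}
	}
	return ei;
}

/* Drop exceptional entries from slot ei onwards, keeping page order. */
static void toy_compact(struct toy_pvec *pvec, int ei)
{
	int i, j;

	if (ei >= pvec->nr)
		return;	/* nothing exceptional was seen */

	for (i = ei, j = ei; i < pvec->nr; i++) {
		if (!toy_exceptional(pvec->pages[i]))
			pvec->pages[j++] = pvec->pages[i];
	}
	pvec->nr = j;
}
```

Starting both i and j at ei means the leading run of ordinary pages is
never touched, so the copy only happens for the tail of the pagevec.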

> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks

-- 
Mel Gorman
SUSE Labs
