Re: [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

From: Yafang Shao <laoar.shao@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>, Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
Date: Wed, 23 Sep 2020 18:05:27 +0800
Message-ID: <CALOAHbBjSd_t5DDGsetjb=Dj9_b+q8_Zem5mAXjjTUG69viZZQ@mail.gmail.com>
In-Reply-To: <20200922072324.GJ3117@suse.de>

On Tue, Sep 22, 2020 at 3:23 PM Mel Gorman <mgorman@suse.de> wrote:
>
> On Tue, Sep 22, 2020 at 10:12:31AM +0800, Yafang Shao wrote:
> > On Tue, Sep 22, 2020 at 6:34 AM Mel Gorman <mgorman@suse.de> wrote:
> > >
> > > On Mon, Sep 21, 2020 at 09:43:17AM +0800, Yafang Shao wrote:
> > > > Our users reported that there are some random latency spikes when their
> > > > RT process is running. We eventually found that the latency spikes are
> > > > caused by FADV_DONTNEED, which may call lru_add_drain_all() to drain the
> > > > LRU caches on remote CPUs and then wait for the per-cpu work to
> > > > complete. The wait time is unpredictable and may be tens of milliseconds.
> > > > That behavior is unreasonable, because this process is bound to a
> > > > specific CPU and the file is only accessed by that process; IOW, there
> > > > should be no pagecache pages on the per-cpu pagevec of a remote CPU.
> > > > That unreasonable behavior is partially caused by the wrong comparison
> > > > of the number of invalidated pages with the target number. For example,
> > > >       if (count < (end_index - start_index + 1))
> > > > The count above is how many pages were invalidated on the local CPU,
> > > > and (end_index - start_index + 1) is how many pages should be
> > > > invalidated. Using (end_index - start_index + 1) as the target is
> > > > incorrect, because start_index and end_index are page indices within
> > > > the file, and not every index in that range has a page in the page
> > > > cache (e.g. holes). We'd better use inode->i_data.nrpages as the target.
> > > >
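To make the mismatch concrete, here is a toy userspace model; it is not kernel code, and toy_max_invalidated() and range_check_wants_drain() are hypothetical names. It assumes a file whose page-cache occupancy is described by a bool per page index, and shows that even a perfect invalidation of a range containing holes leaves count below (end_index - start_index + 1), so the quoted check would still ask for a drain.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code): present[i] says whether page index i of the
 * file has a page in the page cache. Returns how many pages a perfect
 * invalidation of [start, end] could drop, i.e. the best possible "count".
 */
static unsigned long toy_max_invalidated(const bool *present,
					 unsigned long start, unsigned long end)
{
	unsigned long count = 0;

	assert(start <= end);
	for (unsigned long i = start; i <= end; i++)
		if (present[i])
			count++;
	return count;
}

/* The heuristic quoted above: drain if fewer pages than indices were dropped. */
static bool range_check_wants_drain(unsigned long count,
				    unsigned long start, unsigned long end)
{
	return count < (end - start + 1);
}
```

With two cached pages and two holes in indices 0..3, even a perfect invalidation returns 2, which is less than 4, so the check would still call lru_add_drain_all() although no page is sitting on a remote pagevec.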
> > >
> > > How does that work if the invalidation is for a subset of the file?
> > >
> >
> > I realized that as well. There are a few ways to improve it.
> > Option 1, take the min as the target.
> > -                       if (count < (end_index - start_index + 1)) {
> > +                       target = min_t(unsigned long, inode->i_data.nrpages,
> > +                                      end_index - start_index + 1);
> > +                       if (count < target) {
> >                                 lru_add_drain_all();
> >
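Option 1's decision can be checked in isolation with a small pure-function sketch. The name option1_should_drain() is hypothetical and the ternary only mirrors the value min_t() computes; this is not the kernel code. It suggests that clamping the target to nrpages helps when the range extends well past the cached pages of a small file, but a subset range with holes in a larger file still looks like a shortfall, which is the subset case raised above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring Option 1: drain only if fewer pages were
 * invalidated than min(nrpages, number of indices in [start, end]).
 */
static bool option1_should_drain(unsigned long count, unsigned long nrpages,
				 unsigned long start, unsigned long end)
{
	unsigned long range, target;

	assert(start <= end);
	range = end - start + 1;
	/* Same value min_t(unsigned long, nrpages, range) would produce. */
	target = nrpages < range ? nrpages : range;
	return count < target;
}
```

For example, with nrpages = 2 and a range of ten indices, two invalidated pages meet the target of 2; with nrpages = 100 and the range 0..3 containing two holes, two invalidated pages still fall short of a target of 4.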
> > Option 2, change the prototype of invalidate_mapping_pages() and then
> > check how many pages were skipped.
> >
> > +struct invalidate_stat {
> > +       unsigned long skipped;       /* how many pages were skipped */
> > +       unsigned long invalidated;   /* how many pages were invalidated */
> > +};
> >
> > -unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > +unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > +                                       struct invalidate_stat *stat,
> >
>
> That would involve updating each caller and the struct is
> unnecessarily heavy. Create a common helper that returns the count via
> *nr_lruvec. For invalidate_mapping_pages(), pass in NULL as nr_lruvec.
> Create a new helper for fadvise that accepts nr_lruvec. In the common
> helper, account for pages that are likely on an LRU and count them in
> nr_lruvec if it is !NULL. Update fadvise to drain only if pages that were
> likely on the lruvec were skipped. That should also deal with the case
> where holes have been punched between start and end.
>
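A rough userspace model of the approach outlined above might look like the following. Everything here is hypothetical: the toy_* names, the enum of slot states, and the stand-in for lru_add_drain_all(). The model also assumes a page stuck on a remote per-CPU pagevec can be recognized exactly, whereas the kernel helper can only count pages that are "likely" on an LRU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page-cache slot states; a hypothetical model, not kernel structures. */
enum toy_page { TOY_HOLE, TOY_CACHED, TOY_ON_REMOTE_PAGEVEC, TOY_GONE };

/*
 * Common helper: invalidates what it can and, only when nr_lruvec is
 * non-NULL, counts pages skipped because they sit on a remote pagevec.
 */
static unsigned long toy_invalidate_common(enum toy_page *cache,
					   unsigned long start, unsigned long end,
					   unsigned long *nr_lruvec)
{
	unsigned long count = 0;

	assert(start <= end);
	for (unsigned long i = start; i <= end; i++) {
		if (cache[i] == TOY_CACHED) {
			cache[i] = TOY_GONE;
			count++;
		} else if (cache[i] == TOY_ON_REMOTE_PAGEVEC && nr_lruvec) {
			(*nr_lruvec)++;
		}
		/*
		 * Holes, already-invalidated slots and (when nr_lruvec is
		 * NULL) pagevec pages are skipped without being counted.
		 */
	}
	return count;
}

/* Existing entry point: other callers keep their signature and pass NULL. */
static unsigned long toy_invalidate_mapping_pages(enum toy_page *cache,
						  unsigned long start,
						  unsigned long end)
{
	return toy_invalidate_common(cache, start, end, NULL);
}

/* Stand-in for lru_add_drain_all(): remote pagevec pages reach the LRU. */
static void toy_drain_all(enum toy_page *cache, size_t n, unsigned long *drains)
{
	for (size_t i = 0; i < n; i++)
		if (cache[i] == TOY_ON_REMOTE_PAGEVEC)
			cache[i] = TOY_CACHED;
	(*drains)++;
}

/* fadvise-side helper: drain and retry only if pagevec pages were skipped. */
static unsigned long toy_fadvise_dontneed(enum toy_page *cache, size_t n,
					  unsigned long start, unsigned long end,
					  unsigned long *drains)
{
	unsigned long nr_lruvec = 0;
	unsigned long count = toy_invalidate_common(cache, start, end, &nr_lruvec);

	if (nr_lruvec) {
		toy_drain_all(cache, n, drains);
		count += toy_invalidate_mapping_pages(cache, start, end);
	}
	return count;
}
```

With slot states { cached, hole, hole, cached } no drain is triggered, while a single page on a remote pagevec triggers exactly one drain followed by a retry. The holes no longer matter because the decision depends only on nr_lruvec, not on comparing count against the size of the range.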

Good suggestion, thanks Mel.
I will send v2.

-- 
Thanks
Yafang