linux-mm.kvack.org archive mirror
From: Mel Gorman <mgorman@techsingularity.net>
To: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	akpm@linux-foundation.org, aarcange@redhat.com,
	dan.j.williams@intel.com, dave.hansen@intel.com,
	konrad.wilk@oracle.com, lcapitulino@redhat.com,
	mm-commits@vger.kernel.org, mst@redhat.com, osalvador@suse.de,
	pagupta@redhat.com, pbonzini@redhat.com, riel@surriel.com,
	vbabka@suse.cz, wei.w.wang@intel.com, willy@infradead.org,
	yang.zhang.wz@gmail.com, linux-mm@kvack.org
Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
Date: Thu, 7 Nov 2019 10:20:45 +0000
Message-ID: <20191107102045.GS3016@techsingularity.net>
In-Reply-To: <673862eb7f0425f638ea3fc507d0e8049ee4133c.camel@linux.intel.com>

On Wed, Nov 06, 2019 at 04:20:56PM -0800, Alexander Duyck wrote:
> > > I get that. But v10 was posted in mid September. Back then we had a
> > > discussion about addressing what Mel had mentioned and I had mentioned
> > > then that I had addressed it by allowing compaction to essentially reset
> > > the reporter to get it out of the list so compaction could do this split
> > > and splice tumbling logic.
> > > 
> > 
> > At the time I said "While in theory that could be addressed by always
> > going through an interface maintained by the page allocator, it would be
> > tricky to test the virtio case in particular."
> > 
> > Now, I get that you added an interface for that *but* if compaction was
> > ever updated or yet another approach was taken to deal with it, virtio
> > could get broken. If the page allocator itself has a bug or compaction
> > has a bug, the effect is very visible. While stuff does slip through, it
> > tends to be an obscure corner case. However, when virtio gets broken,
> > it'll be a long time before we get it.
> 
> Specifically all we are doing is walking through a list using a pointer as
> an iterator.  I would think we would likely see the page reporting blow up
> pretty quickly if we were to somehow mess up the list so bad that it still
> had access to a page that was no longer on the list. Other than that if
> the list is just shuffled without resetting the pointer then worst case
> would be that we end up with the reporting being rearmed as soon as we
> were complete due to a batch of unreported pages being shuffled past the
> iterator.
> 

And what I'm saying is that the initial version should have focused
exclusively on the mechanism to report free pages and kept the search as
simple as possible, with optimisations on top. By all means track pages
that are already reported and skip them, because that is a relatively basic
operation, with the caveat that even the page_reported checks should be
behind a static branch if there is no virtio balloon driver loaded.
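
As a rough illustration of the "simple search, skip what is already
reported, do nothing when nobody is listening" shape described above, here
is a minimal userspace sketch. It is not kernel code: struct demo_page,
demo_reporting_enabled and demo_collect_batch() are hypothetical names made
up for this example, and the plain bool only stands in for a real static
branch, which would be patched in when a reporting driver registers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a free page; not the kernel's struct page. */
struct demo_page {
	bool reported;            /* already handed to the hypervisor */
	struct demo_page *next;   /* next page on the demo free list */
};

/*
 * Stand-in for a static branch. An ordinary flag is used here so the
 * sketch runs in userspace; it is an assumption, not the real mechanism.
 */
bool demo_reporting_enabled;

/*
 * Walk the list from the head, skip pages that are already reported and
 * collect up to 'max' unreported pages into 'batch'. For brevity the
 * pages are marked reported at collection time. Returns the number of
 * pages collected.
 */
size_t demo_collect_batch(struct demo_page *head,
			  struct demo_page **batch, size_t max)
{
	size_t n = 0;

	/* No consumer registered: do no work at all. */
	if (!demo_reporting_enabled)
		return 0;

	for (struct demo_page *p = head; p && n < max; p = p->next) {
		if (p->reported)
			continue;
		p->reported = true;
		batch[n++] = p;
	}
	return n;
}
```

Every call rescans from the head, so the cost of the simple version is the
time spent stepping over reported pages. That is the number that would
need to be measured before adding any state on top.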

The page reporting to a hypervisor is done out of band from any application
so even if it takes a little longer to do that reporting, is there really
an impact? If so, how large is it and how frequently is it incurred?
The primary impact I can see is that free pages are invisible while the
notification takes place, so the window for that may be wider if it takes
longer to find enough pages to send in a batch, but there will always be a
window where the free pages are invisible.

> > <SNIP>
> >
> > What confused me quite a lot is that this is enabled at compile time
> > and then incurs a performance hit whether or not a hypervisor that
> > even cares is involved. So I don't think the performance angle
> > justifying this approach is a good one because this implementation has
> > issues of its own. Previously I said
> 
> We are adding code. There is always going to be some performance impact.

And that impact should be negligible in so far as possible. Parts of
it are protected by branches but, as it stands, just building the virtio
balloon driver enables all of this whether the running system is using
virtual machines or not.

> > Adding an API for compaction does not get away from the problem that
> > it'll be fragile to depend on the internal state of the allocator for
> > correctness. Existing users that poke into the state do so as an
> > optimistic shortcut but if it fails, nothing actually breaks. The free
> > list reporting stuff might and will not be routinely tested.
> 
> I view what I am doing as not being too different from that. I am only
> maintaining state when I can. I understand that it makes things more
> fragile as we have more routes where things could go wrong, but isn't that
> the case with adding any interface? I have simplified this about as far as
> it can go.
> 

The enhanced tracking of state is not necessary initially to simply report
free pages to the hypervisor.

> All I am tracking is basically a pointer into the freelists so that we can
> remember where we left off, and adding a flag indicating that those
> pointers are there. If the flag is cleared we stop using the pointers. We
> can also just reset to the list_head when an individual freelist is
> manipulated since the list_head itself will never actually go anywhere.
> 
> > Take compaction as an example, the initial implementation of it was dumb
> > as rocks and only started maintaining additional state and later poking
> > into the page allocator when there was empirical evidence it was necessary.
> 
> The issue is that we are trying to do this iteratively. As such I need to
> store some sort of state so that once I go off to report the small bunch
> of pages I have collected I have some way to quickly get back to where I
> was. Without doing that I burn a large number of cycles having to rewalk
> the list.
> 
> That is why I rewrote this so that we can have the list get completely
> shuffled and we don't care as long as our iterator is reset to the
> list_head, or the flag indicating that the iterators are active is
> cleared.
> 

And again, it's not clear that the additional complexity is required.
Specifically, it'll be very hard to tell if the state tracking is
actually helping because excessive list shuffling due to compaction may
mean that a lot of state is being tracked while the full free list ends
up having to be searched anyway.
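
For reference, the scheme under discussion (a saved position in the free
list that is reset to the list head whenever the list is manipulated) can
be sketched in userspace roughly as follows. This is an illustration under
assumptions, not the patch's code: struct demo_list, struct demo_cursor and
the demo_* helpers are hypothetical names, and only the circular
doubly-linked layout is borrowed from the kernel's list_head.

```c
#include <stddef.h>

/* Circular doubly-linked list node, in the style of the kernel's list_head. */
struct demo_list {
	struct demo_list *prev, *next;
};

/* Hypothetical saved position within one free list. */
struct demo_cursor {
	struct demo_list *head;   /* the list head, which never leaves the list */
	struct demo_list *pos;    /* last entry handed out, or head */
};

void demo_list_init(struct demo_list *h)
{
	h->prev = h->next = h;
}

void demo_list_add_tail(struct demo_list *n, struct demo_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

void demo_list_del(struct demo_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Anything that removes or reorders entries calls this first. */
void demo_cursor_reset(struct demo_cursor *c)
{
	c->pos = c->head;
}

/* Return the entry after the saved position, or NULL at the end of the list. */
struct demo_list *demo_cursor_next(struct demo_cursor *c)
{
	struct demo_list *n = c->pos->next;

	if (n == c->head)
		return NULL;
	c->pos = n;
	return n;
}
```

After a reset, demo_cursor_next() hands out entries from the start of the
list again, so a workload that shuffles the list often pays for both the
tracking and the full rescan.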

As it stands, the optimisations are hard-wired into the provision of
the feature itself, making it an all-or-nothing approach with no option to
"merge a bare-bones mechanism that is behind static branches and build
optimisations on top". At least that way, any competing approach could
be based on optimisations alone while the feature is still available.

It also doesn't help that reviewing the series takes a lot of jumping
around. This patch has a kfree of a structure that is not allocated
until a later patch, for example.

-- 
Mel Gorman
SUSE Labs

