From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Nitesh Narayan Lal <nitesh@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
David Hildenbrand <david@redhat.com>,
virtio-dev@lists.oasis-open.org, kvm list <kvm@vger.kernel.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
Dave Hansen <dave.hansen@intel.com>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Vlastimil Babka <vbabka@suse.cz>,
Oscar Salvador <osalvador@suse.de>,
Yang Zhang <yang.zhang.wz@gmail.com>,
Pankaj Gupta <pagupta@redhat.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Rik van Riel <riel@surriel.com>,
lcapitulino@redhat.com, "Wang, Wei W" <wei.w.wang@intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v11 0/6] mm / virtio: Provide support for unused page reporting
Date: Mon, 07 Oct 2019 09:27:24 -0700 [thread overview]
Message-ID: <5b6e0b6df46c03bfac906313071ac0362d43c432.camel@linux.intel.com> (raw)
In-Reply-To: <7fc13837-546c-9c4a-1456-753df199e171@redhat.com>
On Mon, 2019-10-07 at 12:19 -0400, Nitesh Narayan Lal wrote:
> On 10/7/19 11:33 AM, Alexander Duyck wrote:
> > On Mon, 2019-10-07 at 08:29 -0400, Nitesh Narayan Lal wrote:
> > > On 10/2/19 10:25 AM, Alexander Duyck wrote:
> > >
> [...]
> > > You don't have to, I can fix the issues in my patch-set. :)
> > > > Sounds good. Hopefully the stuff I pointed out above helps you to get
> > > > a reproduction and resolve the issues.
> > > So I did observe a significant performance drop when running my v12
> > > patch-set [1] with the suggested test setup. However, after making
> > > certain changes the performance improved significantly.
> > >
> > > I used my v12 patch-set which I have posted earlier and made the following
> > > changes:
> > > 1. Started reporting only (MAX_ORDER - 1) pages and increased the number of
> > > pages that can be reported at a time to 32 from 16. The intent of making
> > > these changes was to bring my configuration closer to what Alexander is
> > > using.
> > The increase from 16 to 32 is valid. There is no point in working in
> > batches that are too small. However, tightening the order to test only for
> > MAX_ORDER - 1 seems like a step in the wrong direction. The bitmap
> > approach doesn't have much value if it can only work with the
> > highest-order pages. I realize it is probably necessary to make the trick
> > for checking on PageBuddy work, but it seems very limiting.
>
> If using (pageblock_order - 1) is a better way to do this, then I can probably
> switch to that.
> I agree that we have to make the reporting order configurable, at least to
> an extent.
I think you mean pageblock_order, not pageblock_order - 1. The problem
with pageblock_order - 1 is that it will have a negative impact on
performance as it would disable THP.
> > > 2. I made an additional change in my bitmap scanning logic to prevent acquiring
> > > spinlock if the page is already allocated.
> > Again, not a fan. It basically means you can only work with MAX_ORDER - 1
> > and there will be no ability to work with anything smaller.
> >
> > > Setup:
> > > On a 16 vCPU, 30 GB, single-NUMA guest affined to a single host NUMA node,
> > > I ran the modified will-it-scale/page_fault a number of times and
> > > calculated the average of the number of processes and threads launched on
> > > the 16th core to compare the impact of my patch-set against an unmodified
> > > kernel.
> > >
> > >
> > > Conclusion:
> > > %Drop in number of processes launched on 16th vCPU = 1-2%
> > > %Drop in number of threads launched on 16th vCPU = 5-6%
> > These numbers don't make that much sense to me. Are you talking about a
> > fully functioning setup that is madvising away the memory in the
> > hypervisor?
>
> Without making this change I was observing a significant drop in the number
> of processes and specifically in the number of threads.
> I did a double-check of the configuration which I have shared.
> I was also observing "AnonHugePages" via meminfo to check the THP usage.
> Any more suggestions about what else I can do to verify?
> I will be more than happy to try them out.
So what was the size of your guest? One thing that just occurred to me is
that you might be running a much smaller guest than I was.
> > If so I would have expected a much higher difference versus baseline, as
> > zeroing/faulting the pages in the host gets expensive fairly quickly. What
> > is the host kernel you are running your test on? I'm just wondering if
> > there is some additional overhead currently limiting your setup. My host
> > kernel was the same kernel I was running in the guest, just built without
> > the patches applied.
>
> Right now I have a different host-kernel. I can install the same kernel to the
> host as well and see if that changes anything.
The host kernel will have a fairly significant impact, as I recall. For
example, running a stock CentOS kernel lowered the performance compared to
running a linux-next kernel. As a result the numbers looked better, since
the overall baseline was lower to begin with as the host OS was
introducing additional overhead.
> > > Other observations:
> > > - I also tried running Alexander's latest v11 page-reporting patch set and
> > > observe a similar amount of average degradation in the number of processes
> > > and threads.
> > > - I didn't include the linear component recorded by will-it-scale because for
> > > some reason it was fluctuating too much even when I was using an unmodified
> > > kernel. If required I can investigate this further.
> > >
> > > Note: If there is a better way to analyze the will-it-scale/page_fault results
> > > then please do let me know.
> > Honestly I have mostly just focused on the processes performance.
>
> In my observation processes seems to be most consistent in general.
Agreed.
> > There is
> > usually a fair bit of variability but a pattern forms after a few runs so
> > you can generally tell if a configuration is an improvement or not.
>
> Yeah, that's why I thought of taking the average of 5-6 runs.
Same here. I am usually running about 5 iterations.
> > > Other setup details:
> > > Following are the configurations which I enabled to run my tests:
> > > - Enabled: CONFIG_SLAB_FREELIST_RANDOM & CONFIG_SHUFFLE_PAGE_ALLOCATOR
> > > - Set host THP to always
> > > - Set guest THP to madvise
> > > - Added the suggested madvise call in page_fault source code.
> > > @Alexander please let me know if I missed something.
> > This seems about right.
> >
> > > The current state of my v13:
> > > I still have to look into Michal's suggestion of using page-isolation API's
> > > instead of isolating the page. However, I believe at this moment our objective
> > > is to decide with which approach we can proceed and that's why I decided to
> > > post the numbers by making small required changes in v12 instead of posting a
> > > new series.
> > >
> > >
> > > Following are the changes which I have made on top of my v12:
> > >
> > > page_reporting.h change:
> > > -#define PAGE_REPORTING_MIN_ORDER (MAX_ORDER - 2)
> > > -#define PAGE_REPORTING_MAX_PAGES 16
> > > +#define PAGE_REPORTING_MIN_ORDER (MAX_ORDER - 1)
> > > +#define PAGE_REPORTING_MAX_PAGES 32
> > >
> > > page_reporting.c change:
> > > @@ -101,8 +101,12 @@ static void scan_zone_bitmap(struct page_reporting_config *phconf,
> > > /* Process only if the page is still online */
> > > page = pfn_to_online_page((setbit << PAGE_REPORTING_MIN_ORDER) +
> > > zone->base_pfn);
> > > - if (!page)
> > > + if (!page || !PageBuddy(page)) {
> > > + clear_bit(setbit, zone->bitmap);
> > > + atomic_dec(&zone->free_pages);
> > > continue;
> > > + }
> > >
> > I suspect the zone->free_pages is going to be expensive for you to deal
> > with. It is a global atomic value, and the cacheline containing it is
> > going to bounce between CPUs. As a result, things like setting the bitmap
> > will be more expensive, as every time a CPU increments free_pages it will
> > likely have to pull in the cacheline containing the bitmap pointer as
> > well.
>
> I see, I will have to explore this more. I am wondering if there is a way to
> measure this if its effect is not visible in will-it-scale/page_fault1. If
> there is a noticeable amount of degradation, I will have to address it.
If nothing else you might look at seeing if you can split up the
structure so that the bitmap and nr_bits are in a different region
somewhere, since those are read-mostly values.
Also, you are now updating the bitmap and free_pages both inside and
outside of the zone lock, so that will likely have some impact.
> > > @Alexander in case you decide to give it a try and find different results,
> > > please do let me know.
> > >
> > > [1] https://lore.kernel.org/lkml/20190812131235.27244-1-nitesh@redhat.com/
> > >
> > >
> > If I have some free time I will take a look.
>
> That would be great, thanks.
>
> > However, one thing that concerns me about this change is that it will
> > limit things much further in terms of how much memory can ultimately be
> > freed, since you are now only working with the highest-order pages and
> > that becomes a hard requirement of your design.
>
> I would assume that should be resolved with (pageblock_order - 1).
There is no need for the - 1. The pageblock_order value is the lowest you
can go before you start causing THP to be disabled. If you cross that
threshold the performance will drop significantly.