From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mel Gorman
Subject: Re: Page allocator bottleneck
Date: Fri, 3 Nov 2017 13:40:20 +0000
Message-ID: <20171103134020.3hwquerifnc6k6qw@techsingularity.net>
References: <20170915102320.zqceocmvvkyybekj@techsingularity.net> <1c218381-067e-7757-ccc2-4e5befd2bfc3@mellanox.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Cc: Linux Kernel Network Developers, linux-mm, David Miller, Jesper Dangaard Brouer, Eric Dumazet, Alexei Starovoitov, Saeed Mahameed, Eran Ben Elisha, Andrew Morton, Michal Hocko
To: Tariq Toukan
Return-path:
Content-Disposition: inline
In-Reply-To: <1c218381-067e-7757-ccc2-4e5befd2bfc3@mellanox.com>
Sender: owner-linux-mm@kvack.org
List-Id: netdev.vger.kernel.org

On Thu, Nov 02, 2017 at 07:21:09PM +0200, Tariq Toukan wrote:
> 
> 
> On 18/09/2017 12:16 PM, Tariq Toukan wrote:
> > 
> > 
> > On 15/09/2017 1:23 PM, Mel Gorman wrote:
> > > On Thu, Sep 14, 2017 at 07:49:31PM +0300, Tariq Toukan wrote:
> > > > Insights: Major degradation between #1 and #2, not getting anywhere
> > > > close to line rate! The degradation is fixed between #2 and #3. This
> > > > is because the page allocator cannot sustain the higher allocation
> > > > rate. In #2, we also see that adding rings (cores) reduces BW (!!),
> > > > as a result of increasing congestion over shared resources.
> > > > 
> > > 
> > > Unfortunately, no surprises there.
> > > 
> > > > Congestion in this case is very clear. When monitored in perf top:
> > > > 85.58% [kernel] [k] queued_spin_lock_slowpath
> > > > 
> > > 
> > > While it's not proven, the most likely candidate is the zone lock,
> > > and that should be confirmed using a call-graph profile. If so, then
> > > the suggestion to tune the size of the per-cpu allocator would
> > > mitigate the problem.
> > > 
> > Indeed, I tuned the per-cpu allocator and the bottleneck was relieved.
> > 
> Hi all,
> 
> After setting this task aside for a while to work on other things, I got
> back to it now and see that the good behavior I observed earlier was not
> stable.
> 
> Recall: I work with a modified driver that allocates a page (4K) per
> packet (MTU=1500), in order to simulate the stress on the page allocator
> in 200Gbps NICs.
> 

There is almost nothing new in the data that hasn't been discussed before.
The suggestion to free on a remote per-cpu list would be expensive, as it
would require the per-cpu lists to have a lock for safe remote access.

However, I'd be curious if you could test the mm-pagealloc-irqpvec-v1r4
branch at https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git .
It's an unfinished prototype I worked on a few weeks ago; I was going to
revisit it in about a month's time, when 4.15-rc1 is out. I'd be
interested in seeing whether it gives a positive gain for normal page
allocations without destroying the performance of the interrupt and
softirq allocation contexts. The interrupt/softirq context testing is
crucial, as that is something that hurt us before when trying to improve
page allocator performance.

-- 
Mel Gorman
SUSE Labs
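
To make the remote-freeing cost above concrete, here is a minimal userspace
sketch; it is not the kernel's actual code. The names (pcp_local, pcp_remote,
page_stub) are hypothetical, and a pthread mutex stands in for a spinlock.
Today's per-cpu lists are only ever touched by their owning CPU with
interrupts disabled, so they need no lock; once remote CPUs may free onto
them, every access, including the hot local path, has to take a lock.

#include <pthread.h>

struct page_stub {
        struct page_stub *next;
};

/*
 * Owner-only per-cpu list, as today: only its owning CPU touches it,
 * with interrupts disabled, so no lock is needed.
 */
struct pcp_local {
        struct page_stub *list;
        int count;
};

/*
 * Remote-freeable per-cpu list: any CPU may push onto it, so every
 * access, including the hot local alloc/free path that is lock-free
 * today, must take the lock.
 */
struct pcp_remote {
        pthread_mutex_t lock;           /* stand-in for a spinlock */
        struct page_stub *list;
        int count;
};

static void pcp_remote_free(struct pcp_remote *pcp, struct page_stub *page)
{
        pthread_mutex_lock(&pcp->lock); /* cost now paid on every free */
        page->next = pcp->list;
        pcp->list = page;
        pcp->count++;
        pthread_mutex_unlock(&pcp->lock);
}

int main(void)
{
        static struct page_stub page;
        struct pcp_remote pcp = { .lock = PTHREAD_MUTEX_INITIALIZER };

        pcp_remote_free(&pcp, &page);   /* as if freed by a remote CPU */
        return pcp.count == 1 ? 0 : 1;
}

Build with cc -pthread; the point is only the extra lock/unlock pair that
would land on every allocation and free, which is exactly the overhead the
current lock-free per-cpu design avoids.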