From: Uladzislau Rezki <urezki@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hillf Danton <hdanton@sina.com>, Michal Hocko <mhocko@suse.com>,
	mm-commits@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
Date: Fri, 14 May 2021 16:50:26 +0200
Message-ID: <20210514145026.GA7183@pc638.lan>
In-Reply-To: <20210514134530.GP3672@suse.de>

> On Fri, May 14, 2021 at 01:45:43PM +0200, Uladzislau Rezki wrote:
> > > > Seems like "zoneref" refers to an invalid address.
> > > > 
> > > > Thoughts?
> > > 
> > > I have not previously read the patch but there are a few concerns and it's
> > > probably just as well this blew up early. The bulk allocator assumes a
> > > valid node but the patch can send in NUMA_NO_NODE (-1). 
> > >
> >
> > Should the bulk allocator handle NUMA_NO_NODE on its own? I mean, instead
> > of the caller handling it, the allocator itself would fix it up if NUMA_NO_NODE is passed.
> > 
> 
> No, for API similarity reasons. __alloc_pages_bulk is the bulk API
> equivalent of __alloc_pages() and both expect valid node IDs. vmalloc
> is using alloc_pages_node for high-order pages, which first checks
> the node ID, so your options are to check it within vmalloc.c or add an
> alloc_pages_node_bulk helper that is API-equivalent to alloc_pages_node
> as a prerequisite to your patch.
> 
OK. Thanks.
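
For what it is worth, a helper along those lines might look like the sketch
below. It is only an illustration from my side: the helper name and the exact
signature are assumptions rather than existing kernel code, although
__alloc_pages_bulk(), alloc_pages_node() and numa_mem_id() are real.

<snip>
#include <linux/gfp.h>
#include <linux/topology.h>

/*
 * Hypothetical bulk counterpart of alloc_pages_node(): fix up
 * NUMA_NO_NODE before handing the request to __alloc_pages_bulk(),
 * so that callers such as vmalloc do not need to check the node ID.
 */
static inline unsigned long
alloc_pages_node_bulk(gfp_t gfp, int nid, int nr_pages,
		      struct page **page_array)
{
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();

	/* NULL nodemask and page_list: fill the page array only. */
	return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
}
<snip>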

> > > On the high-order path alloc_pages_node is used which checks nid == NUMA_NO_NODE.
> > > Also, area->pages is not necessarily initialised so that could be interpreted
> > > as a partially populated array, so minimally you need.
> > >
> >
> > area->pages is zeroed, because __GFP_ZERO is used when allocating the array.
> > 
> 
> Ah, yes.
> 
> > > However, the high-order path also looks suspicious. area->nr_pages is
> > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > pages allocated so that needs examination.
> > > 
> > <snip>
> >     for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> >         area->nr_pages += 1U << page_order) {
> > <snip>
> > 
> > If alloc_pages_node() fails, we break out of the loop. area->nr_pages is
> > initialized inside the for(...) statement, so it will be zero if the
> > single-page allocator fails on the first iteration.
> > 
> > Or am I missing your point?
> > 
> 
> At the time of the break, area->nr_pages += 1U << page_order happened
> before the allocation failure occurred. That looks very suspicious.
> 
The "for" loop does not work that way. If you break the loop the
"area->nr_pages += 1U << page_order" or an "increment" is not increased.
It is increased only after the body of the "for" loop executes and it
goes to next iteration.
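
A small userspace sketch (my own illustration, not kernel code) demonstrates
the same thing: when the body breaks out on a simulated failure, the
increment clause is skipped and the counter stays at zero.

<snip>
#include <stdio.h>

/* Stand-in for an allocator that fails on the very first attempt. */
static int alloc_one(void)
{
	return 0;	/* pretend a NULL page was returned */
}

int main(void)
{
	unsigned int nr_pages;
	const unsigned int page_order = 0;

	for (nr_pages = 0; nr_pages < 8; nr_pages += 1U << page_order) {
		if (!alloc_one())
			break;	/* the increment clause is never executed */
	}

	printf("nr_pages = %u\n", nr_pages);	/* prints 0 */
	return 0;
}
<snip>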

> > > As an aside, where or what is test_vmalloc.sh? It appears to have been
> > > used a few times but it's not clear it's representative so are you aware
> > > of workloads that are vmalloc-intensive? It does not matter for the
> > > patch as such but it would be nice to know examples of vmalloc-intensive
> > > workloads because I cannot recall a time during the last few years where
> > > I saw vmalloc.c high in profiles.
> > > 
> > test_vmalloc.sh is a shell script that is used for stressing and testing the
> > vmalloc subsystem, as well as for performance evaluation. You can find it here:
> > 
> > ./tools/testing/selftests/vm/test_vmalloc.sh
> > 
> 
> Thanks.
> 
> > As for workloads: mostly those that are time- and latency-critical, for
> > example audio/video, especially in the mobile area. I did a big rework of
> > the KVA allocator because I found its allocation time not optimal.
> > 
> 
> Can you give an example benchmark that triggers it or is it somewhat
> specific to mobile platforms with drivers that use vmalloc heavily?
> 
Below is an example of audio glitches. It was related to our phones
and audio workloads:

# Explanation is here
wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt

# A 10-second audio sample is here.
# The drop occurs at 00:09.295; you can hear it.
wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav

Apart from that, a slow allocation can cause two types of issues. The first
one is direct: for example, a high-priority RT thread does an allocation in
order to pass data to a DSP, and a long allocation latency delays the data
that is to be handed over to the DSP. This is in the drivers area.

Another example is when a task is doing an allocation and an RT task is
placed onto the same CPU. In that case a long preemption-off (milliseconds)
section can starve the RT task. On mobile devices that is the UI stack,
where RT tasks are used. As a result we see frame drops.

All such issues were solved after the rework:

wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf

--
Vlad Rezki

Thread overview: 17+ messages
2021-05-12 20:29 [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree akpm
2021-05-12 22:56 ` Stephen Rothwell
2021-05-13 10:31   ` Uladzislau Rezki
2021-05-13 11:11     ` Mel Gorman
2021-05-13 12:46       ` Uladzislau Rezki
2021-05-13 13:24         ` Uladzislau Rezki
2021-05-13 14:18           ` Mel Gorman
     [not found]             ` <CA+KHdyXwdkosDYk4bKtRLVodrwUJnq3NN39xuRQzKJSPTn7+bQ@mail.gmail.com>
2021-05-13 15:51               ` Mel Gorman
2021-05-13 20:18                 ` Uladzislau Rezki
2021-05-14 10:19                   ` Mel Gorman
2021-05-14 11:45                     ` Uladzislau Rezki
2021-05-14 13:45                       ` Mel Gorman
2021-05-14 14:50                         ` Uladzislau Rezki [this message]
2021-05-14 15:41                           ` Mel Gorman
2021-05-14 17:16                             ` Uladzislau Rezki
2021-05-16 17:17                               ` Mel Gorman
2021-05-16 20:31                                 ` Uladzislau Rezki
