From: Joel Fernandes <joel@joelfernandes.org>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Thomas Garnier <thgarnie@google.com>,
Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH v1 2/2] mm: add priority threshold to __purge_vmap_area_lazy()
Date: Wed, 6 Mar 2019 11:25:19 -0500
Message-ID: <20190306162519.GB193418@google.com>
In-Reply-To: <20190129173936.4sscooiybzbhos77@pc636>
On Tue, Jan 29, 2019 at 06:39:36PM +0100, Uladzislau Rezki wrote:
> On Mon, Jan 28, 2019 at 05:45:28PM -0500, Joel Fernandes wrote:
> > On Thu, Jan 24, 2019 at 12:56:48PM +0100, Uladzislau Rezki (Sony) wrote:
> > > commit 763b218ddfaf ("mm: add preempt points into
> > > __purge_vmap_area_lazy()")
> > >
> > > introduced some preempt points, one of those is making an
> > > allocation more prioritized over lazy free of vmap areas.
> > >
> > > Prioritizing an allocation over freeing does not work well
> > > all the time, i.e. it should be rather a compromise.
> > >
> > > 1) Number of lazy pages directly influence on busy list length
> > > thus on operations like: allocation, lookup, unmap, remove, etc.
> > >
> > > 2) Under heavy stress of vmalloc subsystem i run into a situation
> > > when memory usage gets increased hitting out_of_memory -> panic
> > > state due to completely blocking of logic that frees vmap areas
> > > in the __purge_vmap_area_lazy() function.
> > >
> > > Establish a threshold passing which the freeing is prioritized
> > > back over allocation creating a balance between each other.
> >
> > I'm a bit concerned that this will introduce the latency back if vmap_lazy_nr
> > is greater than half of lazy_max_pages(). Which IIUC will be more likely if
> > the number of CPUs is large.
> >
> The threshold that we establish is two times more than lazy_max_pages(),
> i.e. in case of 4 system CPUs lazy_max_pages() is 24576, therefore the
> threshold is 49152, if PAGE_SIZE is 4096.
>
> It means that we allow rescheduling if vmap_lazy_nr < 49152. If vmap_lazy_nr
> is higher then we forbid rescheduling and free areas until it becomes lower
> again to stabilize the system. By doing that, we will not allow vmap_lazy_nr
> to be enormously increased.

Sorry for the late reply.

This sounds reasonable. Such an extreme situation of vmap_lazy_nr being twice
the lazy_max_pages() is probably only possible using a stress test anyway
since (hopefully) the try_purge_vmap_area_lazy() call is happening often
enough to keep the vmap_lazy_nr low.
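
To check my understanding of the new condition, here is a small userspace
model of it. purge_may_resched() and model_purge() are names made up for this
sketch, not kernel functions, and the model ignores other CPUs adding to
vmap_lazy_nr while the purge is running:

```c
#include <stdbool.h>

/*
 * Hypothetical helper modelling the check added by the patch: a
 * reschedule is only permitted while the lazy page count is below
 * twice lazy_max (the value lazy_max_pages() would return).
 */
static bool purge_may_resched(long lazy_nr, long lazy_max)
{
	return lazy_nr < (lazy_max << 1);
}

/*
 * Drain lazy_nr pages in chunks of area_pages and count how many
 * loop iterations would have been allowed to reschedule.
 */
static unsigned int model_purge(long lazy_nr, long lazy_max, long area_pages)
{
	unsigned int resched_points = 0;

	if (area_pages <= 0)
		return 0;	/* nothing can be drained, avoid looping forever */

	while (lazy_nr > 0) {
		lazy_nr -= area_pages;	/* models atomic_sub(nr, &vmap_lazy_nr) */
		if (purge_may_resched(lazy_nr, lazy_max))
			resched_points++;
	}
	return resched_points;
}
```

So with lazy_max = 24576, a count of 49151 still permits a reschedule and a
count of 49152 does not, which matches the numbers you gave.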

Have you experimented with the highest threshold that still prevents the
issues you're seeing? Have you tried 3x or 4x lazy_max_pages()?
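
For reference, a minimal sketch of the arithmetic, assuming lazy_max_pages()
is still computed as fls(num_online_cpus()) * (32MB / PAGE_SIZE) as in
mm/vmalloc.c at the time of this thread. fls_u() and lazy_max_pages_for() are
stand-ins written for this example, not kernel functions:

```c
/*
 * Userspace stand-in for the kernel's fls(): 1-based index of the
 * most significant set bit, 0 when x == 0.
 */
static unsigned int fls_u(unsigned int x)
{
	unsigned int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Assumed formula: fls(ncpus) * (32MB / page_size). */
static unsigned long lazy_max_pages_for(unsigned int ncpus,
					unsigned long page_size)
{
	return fls_u(ncpus) * (32UL * 1024 * 1024 / page_size);
}
```

With 4 CPUs and 4K pages this gives 24576, so the 2x threshold is 49152, 3x
would be 73728 and 4x would be 98304.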

I also wonder what a global TLB flush costs these days on the most common
Linux architectures, whether the whole lazy vmap_area purging mechanism is
starting to get a bit dated, and whether we could do the purging inline as
areas are freed. As you said, the mechanism has a cost of its own: as the
list grows, all other operations take longer too.

thanks,
- Joel
> > In fact, when vmap_lazy_nr is high, that's when the latency will be the worst
> > so one could say that that's when you *should* reschedule since the frees are
> > taking too long and hurting real-time tasks.
> >
> > Could this be better solved by tweaking lazy_max_pages() such that purging is
> > more aggressive?
> >
> > Another approach could be to detect the scenario you brought up (allocations
> > happening faster than free), somehow, and avoid a reschedule?
> >
> This is what i am trying to achieve by this change.
>
> Thank you for your comments.
>
> --
> Vlad Rezki
> > >
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > ---
> > > mm/vmalloc.c | 18 ++++++++++++------
> > > 1 file changed, 12 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index fb4fb5fcee74..abe83f885069 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -661,23 +661,27 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> > > >  	struct llist_node *valist;
> > > >  	struct vmap_area *va;
> > > >  	struct vmap_area *n_va;
> > > > -	bool do_free = false;
> > > > +	int resched_threshold;
> > > >
> > > >  	lockdep_assert_held(&vmap_purge_lock);
> > > >
> > > >  	valist = llist_del_all(&vmap_purge_list);
> > > > +	if (unlikely(valist == NULL))
> > > > +		return false;
> > > > +
> > > > +	/*
> > > > +	 * TODO: to calculate a flush range without looping.
> > > > +	 * The list can be up to lazy_max_pages() elements.
> > > > +	 */
> > > >  	llist_for_each_entry(va, valist, purge_list) {
> > > >  		if (va->va_start < start)
> > > >  			start = va->va_start;
> > > >  		if (va->va_end > end)
> > > >  			end = va->va_end;
> > > > -		do_free = true;
> > > >  	}
> > > >
> > > > -	if (!do_free)
> > > > -		return false;
> > > > -
> > > >  	flush_tlb_kernel_range(start, end);
> > > > +	resched_threshold = (int) lazy_max_pages() << 1;
> > > >
> > > >  	spin_lock(&vmap_area_lock);
> > > >  	llist_for_each_entry_safe(va, n_va, valist, purge_list) {
> > > > @@ -685,7 +689,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> > > >
> > > >  		__free_vmap_area(va);
> > > >  		atomic_sub(nr, &vmap_lazy_nr);
> > > > -		cond_resched_lock(&vmap_area_lock);
> > > > +
> > > > +		if (atomic_read(&vmap_lazy_nr) < resched_threshold)
> > > > +			cond_resched_lock(&vmap_area_lock);
> > > >  	}
> > > >  	spin_unlock(&vmap_area_lock);
> > > >  	return true;
> > > --
> > > 2.11.0
> > >

Thread overview: 10+ messages
2019-01-24 11:56 [PATCH v1 0/2] stability fixes for vmalloc allocator Uladzislau Rezki (Sony)
2019-01-24 11:56 ` [PATCH v1 1/2] mm/vmalloc: fix kernel BUG at mm/vmalloc.c:512! Uladzislau Rezki (Sony)
2019-01-24 11:56 ` [PATCH v1 2/2] mm: add priority threshold to __purge_vmap_area_lazy() Uladzislau Rezki (Sony)
2019-01-28 20:04 ` Andrew Morton
2019-01-29 16:17 ` Uladzislau Rezki
2019-01-29 18:03 ` Andrew Morton
2019-01-28 22:45 ` Joel Fernandes
2019-01-29 17:39 ` Uladzislau Rezki
2019-03-06 16:25 ` Joel Fernandes [this message]
2019-03-07 11:15 ` Uladzislau Rezki