Subject: Re: mm: pages are not freed from lru_add_pvecs after process termination
From: Vlastimil Babka
To: Michal Hocko, Dave Hansen
Cc: "Odzioba, Lukasz", linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 "Shutemov, Kirill", "Anaczkowski, Lukasz"
Date: Fri, 13 May 2016 13:29:18 +0200

On 05/11/2016 09:53 AM, Michal Hocko wrote:
> On Fri 06-05-16 09:04:34, Dave Hansen wrote:
>> On 05/06/2016 08:10 AM, Odzioba, Lukasz wrote:
>>> On Thu 05-05-16 09:21:00, Michal Hocko wrote:
>>>> Or maybe the async nature of flushing turns out to be just
>>>> impractical and unreliable and we will end up skipping THP (or all
>>>> compound pages) for the pcp LRU add cache. Let's see...
>>>
>>> What if we simply skip lru_add pvecs for compound pages? That way we
>>> still have compound pages on the LRUs, but the problem goes away. It
>>> is not quite what this naïve patch does, but it works nicely for me.
>>>
>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index 03aacbc..c75d5e1 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -392,7 +392,9 @@ static void __lru_cache_add(struct page *page)
>>>  	get_page(page);
>>>  	if (!pagevec_space(pvec))
>>>  		__pagevec_lru_add(pvec);
>>>  	pagevec_add(pvec, page);
>>> +	if (PageCompound(page))
>>> +		__pagevec_lru_add(pvec);
>>>  	put_cpu_var(lru_add_pvec);
>>>  }
>>
>> That's not _quite_ what I had in mind, since that drains the entire
>> pvec every time a large page is encountered. But I'm conflicted about
>> what the right behavior _is_.
>>
>> We'd be taking the LRU lock for 'page' anyway, so we might as well
>> drain the pvec.

Note that pages in the pagevec can come from different zones, so this
is not universally true.
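To illustrate: the drain path holds only one zone's lru_lock at a time
and has to drop and re-take it whenever consecutive pages in the pvec
belong to different zones, so draining a mixed pvec can mean several
lock acquisitions, not just the one we would take for 'page' alone.
Roughly like this (paraphrased from pagevec_lru_move_fn() in mm/swap.c;
arguments and memcg details trimmed):

static void pagevec_lru_move_fn(struct pagevec *pvec, ...)
{
	struct zone *zone = NULL;
	unsigned long flags = 0;
	int i;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];
		struct zone *pagezone = page_zone(page);

		/* the lock is switched on every zone boundary */
		if (pagezone != zone) {
			if (zone)
				spin_unlock_irqrestore(&zone->lru_lock, flags);
			zone = pagezone;
			spin_lock_irqsave(&zone->lru_lock, flags);
		}

		/* ... move the page to its lru under the lock ... */
	}
	if (zone)
		spin_unlock_irqrestore(&zone->lru_lock, flags);
}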
> Yes I think this makes sense. The only case where it would be
> suboptimal is when the pagevec was already full and then we just
> created a single-page pvec to drain it. This can be handled better
> though by:
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 95916142fc46..3fe4f180e8bf 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -391,9 +391,8 @@ static void __lru_cache_add(struct page *page)
>  	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
>
>  	get_page(page);
> -	if (!pagevec_space(pvec))
> +	if (!pagevec_add(pvec, page) || PageCompound(page))
>  		__pagevec_lru_add(pvec);
> -	pagevec_add(pvec, page);
>  	put_cpu_var(lru_add_pvec);
>  }

Yeah, that could work. There might be more complex solutions at the
level of lru_cache_add_active_or_unevictable(), which we call either
from base-page code (mm/memory.c) or from functions in mm/huge_memory.c.
We could redirect compound pages at that point (rough sketch at the end
of this mail), but that's likely not worth the trouble unless this
simple solution turns out to show some performance regression...

>> Or, does the additional work to put the page onto a pvec and then
>> immediately drain it overwhelm that advantage?
>
> pagevec_add is quite trivial so I would be really surprised if it
> mattered.
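For completeness, the redirect idea above could look something like
this. Completely untested sketch: add_page_to_lru_directly() does not
exist, it stands in for a helper that would take the zone's lru_lock
and do what __pagevec_lru_add_fn() does for a single page, and the
mlock/unevictable handling of the real function is omitted here:

void lru_cache_add_active_or_unevictable(struct page *page,
					 struct vm_area_struct *vma)
{
	VM_BUG_ON_PAGE(PageLRU(page), page);

	if (PageCompound(page)) {
		SetPageActive(page);
		/* hypothetical helper, bypasses the per-cpu pvec */
		add_page_to_lru_directly(page);
		return;
	}

	/* ... existing base-page path via the per-cpu pvec ... */
}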