From: Michal Hocko <mhocko@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: "Odzioba, Lukasz" <lukasz.odzioba@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Shutemov, Kirill" <kirill.shutemov@intel.com>,
	"Anaczkowski, Lukasz" <lukasz.anaczkowski@intel.com>
Subject: Re: mm: pages are not freed from lru_add_pvecs after process termination
Date: Thu, 28 Apr 2016 16:37:10 +0200
Message-ID: <20160428143710.GC31496@dhcp22.suse.cz> (raw)
In-Reply-To: <5720F2A8.6070406@intel.com>

On Wed 27-04-16 10:11:04, Dave Hansen wrote:
> On 04/27/2016 10:01 AM, Odzioba, Lukasz wrote:
[...]
> > 1. We need some statistics on the number and total *SIZES* of all pages
> > in the lru pagevecs. It's too opaque now.
> > 2. We need to make darn sure we drain the lru pagevecs before failing
> > any kind of allocation.

lru_add_drain_all is unfortunately too costly (especially on large
machines). You are right, though, that failing an allocation while a lot
of pages are still cached in the pagevecs is suboptimal. So maybe we can
do the drain from the slow path, after the first round of direct reclaim
has failed to allocate anything. Something like the following:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dd65d9fb76a..0743c58c2e9d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3559,6 +3559,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	bool drained_lru = false;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3667,6 +3668,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	if (!drained_lru) {
+		drained_lru = true;
+		lru_add_drain_all();
+	}
+
 	/* Do not loop if specifically requested */
 	if (gfp_mask & __GFP_NORETRY)
 		goto noretry;

The downside would be that we really depend on the WQ to make any
progress here. If we are really out of memory then we are screwed, so we
would need a flush_work_timeout() or something else that would guarantee
a maximum timeout. That something else might be to stop using the WQ and
move the flushing into IRQ context. Not free either, but at least not
dependent on having some memory available to make progress.

> > 3. We need some way to drain the lru pagevecs directly. Maybe the buddy
> > pcp lists too.
> > 4. We need to make sure that a zone_reclaim_mode=0 system still drains
> > too.
> > 5. The VM stats and their updates are now related to how often
> > drain_zone_pages() gets run. That might be interacting here too.
> 6. Perhaps don't use the LRU pagevecs for large pages. It limits the
> severity of the problem.

7. Hook into vmstat and flush from there? This would drain the pagevecs
periodically, but it would also introduce nondeterministic interference.

-- 
Michal Hocko
SUSE Labs
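[Editor's note: the batching behavior under discussion can be illustrated with a toy model. This is a Python sketch, not kernel code; the 14-entry pagevec size matches the kernel's PAGEVEC_SIZE of that era, and all names here are illustrative only. It shows why pages can stay invisible to reclaim until a drain runs.]

```python
# Toy model of per-CPU LRU pagevecs. Pages are batched per CPU and only
# become visible to the LRU (and hence to reclaim) once the batch fills
# up or an explicit drain flushes every CPU's pagevec.
PAGEVEC_SIZE = 14  # kernel constant at the time of this thread


class Cpu:
    def __init__(self):
        self.pagevec = []  # pages batched on this CPU, invisible to the LRU


def lru_add(cpu, page, lru):
    """Add a page via the per-CPU pagevec; spill to the LRU when full."""
    cpu.pagevec.append(page)
    if len(cpu.pagevec) >= PAGEVEC_SIZE:
        lru.extend(cpu.pagevec)
        cpu.pagevec.clear()


def lru_add_drain_all(cpus, lru):
    """Flush every CPU's pagevec (the costly all-CPU drain from the mail)."""
    for cpu in cpus:
        lru.extend(cpu.pagevec)
        cpu.pagevec.clear()


cpus = [Cpu() for _ in range(4)]
lru = []
# 10 pages faulted on CPU 0: fewer than PAGEVEC_SIZE, so none reach the LRU,
# even after the faulting process exits.
for page in range(10):
    lru_add(cpus[0], page, lru)
print(len(lru))  # 0 - pages are stranded until something drains

lru_add_drain_all(cpus, lru)
print(len(lru))  # 10 - visible to reclaim again after the drain
```

With THP, each stranded entry is a 2 MB huge page rather than a 4 kB one, so a handful of nearly-full pagevecs per CPU can pin a substantial amount of memory on a many-core machine, which is why the thread focuses on large systems.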