linux-mm.kvack.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave.hansen@intel.com>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
	Christoph Lameter <cl@gentwo.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Steve Capper <steve.capper@linaro.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	Jerome Marchand <jmarchan@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv12 34/37] thp: introduce deferred_split_huge_page()
Date: Thu, 21 Jan 2016 02:22:37 +0100	[thread overview]
Message-ID: <20160121012237.GE7119@redhat.com> (raw)
In-Reply-To: <1444145044-72349-35-git-send-email-kirill.shutemov@linux.intel.com>

Hello Kirill,

On Tue, Oct 06, 2015 at 06:24:01PM +0300, Kirill A. Shutemov wrote:
> +static unsigned long deferred_split_scan(struct shrinker *shrink,
> +		struct shrink_control *sc)
> +{
> +	unsigned long flags;
> +	LIST_HEAD(list), *pos, *next;
> +	struct page *page;
> +	int split = 0;
> +
> +	spin_lock_irqsave(&split_queue_lock, flags);
> +	list_splice_init(&split_queue, &list);
> +
> +	/* Take pin on all head pages to avoid freeing them under us */
> +	list_for_each_safe(pos, next, &list) {
> +		page = list_entry((void *)pos, struct page, mapping);
> +		page = compound_head(page);
> +		/* race with put_compound_page() */
> +		if (!get_page_unless_zero(page)) {
> +			list_del_init(page_deferred_list(page));
> +			split_queue_len--;
> +		}
> +	}
> +	spin_unlock_irqrestore(&split_queue_lock, flags);

While rebasing I noticed this loop looks a bit too heavy. There's no
lockbreak and no cap on the list size, and millions of THP pages could
have been partially unmapped but not yet entirely freed, sitting there
for a while (there are other scenarios, but this is the one that could
more realistically happen with certain allocators). Then, as a result
of random memory pressure, we'd be calling get_page_unless_zero()
millions of times across multiple NUMA nodes, thrashing a cacheline at
every list entry, with irqs disabled for the whole period.

I haven't verified it, but I guess that on a large NUMA system (e.g.
4TiB) that could take down a CPU for a second or more with irqs
disabled.
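For scale, a back-of-the-envelope count, assuming 2 MiB THPs (as on x86-64) and, as a worst case rather than a measurement, the whole 4TiB sitting on the split queue:

```latex
N = \frac{4\,\mathrm{TiB}}{2\,\mathrm{MiB}} = \frac{2^{42}}{2^{21}} = 2^{21} = 2\,097\,152
\qquad
t \approx N \times 0.5\,\mu\mathrm{s} \approx 1.05\,\mathrm{s}
```

The 0.5 µs per entry is an assumed, purely illustrative figure for the remote cacheline misses on the list linkage and the page refcount, not a measured one; with it, a single unbounded pass already lands around the one-second mark with irqs disabled.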

I think it needs to isolate a bounded number of pages instead of
splicing the whole queue (userland programs can invoke the shrinker
through direct reclaim too, and they can't get stuck there for too
long), perhaps using sc->nr_to_scan to achieve that.
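A minimal userspace sketch of that bounded isolation, assuming a single-threaded demo: `struct fake_thp`, `struct split_queue`, `bounded_isolate()` and the list helpers are hypothetical re-implementations, not kernel code, and the spin loop on an `atomic_int` only stands in for `split_queue_lock`. It shows the shape of the idea: visit at most `nr_to_scan` entries under the lock, pin live ones onto a private list, and leave the rest of the queue for the next shrinker call. In the kernel itself the move would likely be done with `list_move()` and the pin with `get_page_unless_zero()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct list_head helpers. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static unsigned long count_list(const struct list_head *h)
{
	unsigned long n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical stand-in for a THP on the deferred split queue. */
struct fake_thp {
	struct list_head deferred;
	int refcount;			/* 0: already being freed */
};

/* Hypothetical queue: one lock, one list, one length counter. */
struct split_queue {
	atomic_int locked;		/* stands in for split_queue_lock */
	struct list_head head;
	unsigned long len;
};

static void queue_init(struct split_queue *q)
{
	atomic_init(&q->locked, 0);
	list_init(&q->head);
	q->len = 0;
}

static void queue_add(struct split_queue *q, struct fake_thp *thp)
{
	list_add_tail(&thp->deferred, &q->head);
	q->len++;
}

/*
 * Visit at most nr_to_scan entries (cf. sc->nr_to_scan) instead of
 * splicing the whole queue, so the lock is held for a bounded time.
 * Live entries are pinned and moved to 'isolated'; entries whose
 * refcount already hit zero are only unlinked. Returns entries visited.
 */
static unsigned long bounded_isolate(struct split_queue *q,
				     struct list_head *isolated,
				     unsigned long nr_to_scan)
{
	unsigned long visited = 0;

	while (atomic_exchange(&q->locked, 1))
		;			/* spin, like spin_lock_irqsave() */

	while (visited < nr_to_scan && !list_empty(&q->head)) {
		struct list_head *pos = q->head.next;
		struct fake_thp *thp = (struct fake_thp *)
			((char *)pos - offsetof(struct fake_thp, deferred));

		list_unlink(pos);
		q->len--;
		visited++;
		assert(thp->refcount >= 0);
		if (thp->refcount == 0)
			continue;	/* lost the race with the final put */
		thp->refcount++;	/* pin it (single-threaded demo) */
		list_add_tail(pos, isolated);
	}

	atomic_store(&q->locked, 0);
	return visited;
}
```

Entries left on `isolated` after the split attempts would be dropped or put back on the queue; keeping each lock hold proportional to `nr_to_scan` rather than to the queue length is the point of the design.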

The split_queue can also be moved from global to "struct pglist_data",
and then you can do NODE_DATA(sc->nid)->split_queue, and the same for
the spinlock. That will make the lock more scalable and freeing memory
more efficient, since we won't split THPs from nodes reclaim isn't
currently interested in (reclaim will later try again on the zones of
the other nodes by itself if needed).
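A minimal userspace sketch of the per-node layout, assuming a fixed, hypothetical `NR_NODES` and a hypothetical `struct fake_thp` that records its node id. The `queues[]` array plays the role of a `split_queue` plus lock embedded in each node's `struct pglist_data`, and `scan_node()` stands in for a shrinker scan that only looks at `sc->nid`. None of these names are the kernel's; it only illustrates that enqueueing and scanning touch a single node's lock and list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_NODES 4	/* hypothetical node count */

/* Hypothetical stand-in for a THP waiting for a deferred split. */
struct fake_thp {
	struct fake_thp *next;
	int nid;		/* node the THP's memory lives on */
};

/* One queue and one lock per node instead of a single global pair. */
struct node_split_queue {
	atomic_int locked;
	struct fake_thp *head;
	unsigned long len;
};

static struct node_split_queue queues[NR_NODES];

static void init_queues(void)
{
	for (int nid = 0; nid < NR_NODES; nid++) {
		atomic_init(&queues[nid].locked, 0);
		queues[nid].head = NULL;
		queues[nid].len = 0;
	}
}

static void lock_q(struct node_split_queue *q)
{
	while (atomic_exchange(&q->locked, 1))
		;		/* spin, like spin_lock_irqsave() */
}

static void unlock_q(struct node_split_queue *q)
{
	atomic_store(&q->locked, 0);
}

/* Queue a THP on its own node's list; other nodes' locks are untouched. */
static void deferred_enqueue(struct fake_thp *thp)
{
	struct node_split_queue *q;

	assert(thp->nid >= 0 && thp->nid < NR_NODES);
	q = &queues[thp->nid];
	lock_q(q);
	thp->next = q->head;
	q->head = thp;
	q->len++;
	unlock_q(q);
}

/*
 * Shrinker-style scan restricted to one node (think sc->nid): take at
 * most nr_to_scan THPs from that node's queue and leave every other
 * node alone. Returns how many entries were dequeued.
 */
static unsigned long scan_node(int nid, unsigned long nr_to_scan)
{
	struct node_split_queue *q;
	unsigned long taken = 0;

	assert(nid >= 0 && nid < NR_NODES);
	q = &queues[nid];
	lock_q(q);
	while (q->head && taken < nr_to_scan) {
		/* a real scan would pin it here and split it after unlock */
		q->head = q->head->next;
		q->len--;
		taken++;
	}
	unlock_q(q);
	return taken;
}
```

Reclaim targeting node 1 therefore never splits THPs queued on node 0 or node 2, which matches the argument above that other nodes are left for reclaim to revisit if it needs to.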

Thanks,
Andrea

Thread overview: 62+ messages
2015-10-06 15:23 [PATCHv12 00/37] THP refcounting redesign Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 01/37] mm, proc: adjust PSS calculation Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 02/37] rmap: add argument to charge compound page Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 03/37] memcg: adjust to support new THP refcounting Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 04/37] mm, thp: adjust conditions when we can reuse the page on WP fault Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 05/37] mm: adjust FOLL_SPLIT for new refcounting Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 06/37] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 07/37] thp, mlock: do not allow huge pages in mlocked area Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 08/37] khugepaged: ignore pmd tables with THP mapped with ptes Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 09/37] thp: rename split_huge_page_pmd() to split_huge_pmd() Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 10/37] mm, vmstats: new THP splitting event Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 11/37] mm: temporally mark THP broken Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 12/37] thp: drop all split_huge_page()-related code Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 13/37] mm: drop tail page refcounting Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 14/37] futex, thp: remove special case for THP in get_futex_key Kirill A. Shutemov
2015-10-22  8:24   ` Artem Savkov
2015-10-22  9:49     ` Kirill A. Shutemov
2015-10-22 10:33       ` Artem Savkov
2015-10-06 15:23 ` [PATCHv12 15/37] ksm: prepare to new THP semantics Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 16/37] mm, thp: remove compound_lock Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 17/37] arm64, thp: remove infrastructure for handling splitting PMDs Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 18/37] arm, " Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 19/37] mips, " Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 20/37] powerpc, " Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 21/37] s390, " Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 22/37] sparc, " Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 23/37] tile, " Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 24/37] x86, " Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 25/37] mm, " Kirill A. Shutemov
2015-10-08  8:52   ` Vineet Gupta
2015-10-09  9:25     ` Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 26/37] mm: rework mapcount accounting to enable 4k mapping of THPs Kirill A. Shutemov
2015-10-27  6:18   ` Naoya Horiguchi
2015-10-27  9:30     ` Kirill A. Shutemov
2015-10-27 23:24       ` Naoya Horiguchi
2015-10-29 21:50         ` Kirill A. Shutemov
2015-10-30  8:33           ` Naoya Horiguchi
2015-10-29  8:19   ` Naoya Horiguchi
2015-10-29 21:20     ` Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 27/37] mm: differentiate page_mapped() from page_mapcount() for compound pages Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 28/37] mm, numa: skip PTE-mapped THP on numa fault Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 29/37] thp: implement split_huge_pmd() Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 30/37] thp: add option to setup migration entries during PMD split Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 31/37] thp, mm: split_huge_page(): caller need to lock page Kirill A. Shutemov
2015-10-06 15:23 ` [PATCHv12 32/37] thp: reintroduce split_huge_page() Kirill A. Shutemov
2015-11-18 16:24   ` Sasha Levin
2015-11-18 19:05     ` Kirill A. Shutemov
2015-11-27  4:26       ` Sasha Levin
2015-10-06 15:24 ` [PATCHv12 33/37] migrate_pages: try to split pages on qeueuing Kirill A. Shutemov
2015-10-06 15:24 ` [PATCHv12 34/37] thp: introduce deferred_split_huge_page() Kirill A. Shutemov
2016-01-21  1:22   ` Andrea Arcangeli [this message]
2016-01-21 12:09     ` [PATCH 0/3] Couple of fixes for deferred_split_huge_page() Kirill A. Shutemov
2016-01-21 12:09       ` [PATCH 1/3] thp: make split_queue per-node Kirill A. Shutemov
2016-01-21 12:09       ` [PATCH 2/3] thp: change deferred_split_count() to return number of THP in queue Kirill A. Shutemov
2016-01-22 14:31         ` Andrea Arcangeli
2016-01-22 15:20           ` Kirill A. Shutemov
2016-01-21 12:09       ` [PATCH 3/3] thp: limit number of object to scan on deferred_split_scan() Kirill A. Shutemov
2016-02-04 13:11         ` Kirill A. Shutemov
2016-01-21 22:52       ` [PATCH 0/3] Couple of fixes for deferred_split_huge_page() Andrea Arcangeli
2015-10-06 15:24 ` [PATCHv12 35/37] mm: re-enable THP Kirill A. Shutemov
2015-10-06 15:24 ` [PATCHv12 36/37] thp: update documentation Kirill A. Shutemov
2015-10-06 15:24 ` [PATCHv12 37/37] thp: allow mlocked THP again Kirill A. Shutemov
