From patchwork Fri Jan 18 17:51:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033509 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BBC1C07EBF for ; Fri, 18 Jan 2019 17:52:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2D6BC2087E for ; Fri, 18 Jan 2019 17:52:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728642AbfARRv7 (ORCPT ); Fri, 18 Jan 2019 12:51:59 -0500 Received: from outbound-smtp10.blacknight.com ([46.22.139.15]:45799 "EHLO outbound-smtp10.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727940AbfARRv7 (ORCPT ); Fri, 18 Jan 2019 12:51:59 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp10.blacknight.com (Postfix) with ESMTPS id 593D61C35A3 for ; Fri, 18 Jan 2019 17:51:57 +0000 (GMT) Received: (qmail 30679 invoked from network); 18 Jan 2019 17:51:57 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:51:57 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 01/22] mm, compaction: Shrink compact_control Date: Fri, 18 Jan 2019 17:51:15 +0000 Message-Id: <20190118175136.31341-2-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The isolate and migrate scanners should never isolate more than a pageblock of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 536bc2a839b9..5564841fce36 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -185,8 +185,8 @@ struct compact_control { struct list_head freepages; /* List of free pages to migrate to */ struct list_head migratepages; /* List of pages being migrated */ struct zone *zone; - unsigned long nr_freepages; /* Number of isolated free pages */ - unsigned long nr_migratepages; /* Number of pages to migrate */ + unsigned int nr_freepages; /* Number of isolated free pages */ + unsigned int nr_migratepages; /* Number of pages to migrate */ unsigned long total_migrate_scanned; unsigned long total_free_scanned; unsigned long free_pfn; /* isolate_freepages search base */ From patchwork Fri Jan 18 17:51:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033510 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C9EEC07EBF for ; Fri, 18 Jan 2019 17:52:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 030CD20850 for ; Fri, 18 Jan 2019 17:52:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728674AbfARRwJ (ORCPT ); Fri, 18 Jan 2019 12:52:09 -0500 Received: from outbound-smtp12.blacknight.com ([46.22.139.17]:57485 "EHLO outbound-smtp12.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728664AbfARRwJ (ORCPT ); Fri, 18 Jan 2019 12:52:09 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp12.blacknight.com (Postfix) with ESMTPS id 8338E1C3589 for ; Fri, 18 Jan 2019 17:52:07 +0000 (GMT) Received: (qmail 31214 invoked from network); 18 Jan 2019 17:52:07 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:52:07 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 02/22] mm, compaction: Rearrange compact_control Date: Fri, 18 Jan 2019 17:51:16 +0000 Message-Id: <20190118175136.31341-3-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org compact_control spans two cache lines with write-intensive lines on both. Rearrange so the most write-intensive fields are in the same cache line. This has a negligible impact on the overall performance of compaction and is more a tidying exercise than anything. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/internal.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 5564841fce36..867af5425432 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -184,14 +184,14 @@ extern int user_min_free_kbytes; struct compact_control { struct list_head freepages; /* List of free pages to migrate to */ struct list_head migratepages; /* List of pages being migrated */ - struct zone *zone; unsigned int nr_freepages; /* Number of isolated free pages */ unsigned int nr_migratepages; /* Number of pages to migrate */ - unsigned long total_migrate_scanned; - unsigned long total_free_scanned; unsigned long free_pfn; /* isolate_freepages search base */ unsigned long migrate_pfn; /* isolate_migratepages search base */ unsigned long last_migrated_pfn;/* Not yet flushed page being freed */ + struct zone *zone; + unsigned long total_migrate_scanned; + unsigned long total_free_scanned; const gfp_t gfp_mask; /* gfp mask of a direct compactor */ int order; /* order a direct compactor needs */ int migratetype; /* migratetype of direct compactor */ From patchwork Fri Jan 18 17:51:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033511 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80285C5AC81 for ; Fri, 18 Jan 2019 17:52:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5F8812087E for ; Fri, 18 Jan 2019 17:52:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728694AbfARRwU (ORCPT ); Fri, 18 Jan 2019 12:52:20 -0500 Received: from outbound-smtp12.blacknight.com ([46.22.139.17]:39375 "EHLO outbound-smtp12.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727323AbfARRwU (ORCPT ); Fri, 18 Jan 2019 12:52:20 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp12.blacknight.com (Postfix) with ESMTPS id A85931C3584 for ; Fri, 18 Jan 2019 17:52:17 +0000 (GMT) Received: (qmail 31737 invoked from network); 18 Jan 2019 17:52:17 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:52:17 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 03/22] mm, compaction: Remove last_migrated_pfn from compact_control Date: Fri, 18 Jan 2019 17:51:17 +0000 Message-Id: <20190118175136.31341-4-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The last_migrated_pfn field is a bit dubious as to whether it really helps but either way, the information from it can be inferred without increasing the size of compact_control so remove the field. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 25 +++++++++---------------- mm/internal.h | 1 - 2 files changed, 9 insertions(+), 17 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index c15b4bbc9e9e..e59dd7a7564c 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -886,15 +886,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, cc->nr_migratepages++; nr_isolated++; - /* - * Record where we could have freed pages by migration and not - * yet flushed them to buddy allocator. - * - this is the lowest page that was isolated and likely be - * then freed by migration. - */ - if (!cc->last_migrated_pfn) - cc->last_migrated_pfn = low_pfn; - /* Avoid isolating too much */ if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) { ++low_pfn; @@ -918,7 +909,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, } putback_movable_pages(&cc->migratepages); cc->nr_migratepages = 0; - cc->last_migrated_pfn = 0; nr_isolated = 0; } @@ -1539,6 +1529,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro enum compact_result ret; unsigned long start_pfn = zone->zone_start_pfn; unsigned long end_pfn = zone_end_pfn(zone); + unsigned long last_migrated_pfn; const bool sync = cc->mode != MIGRATE_ASYNC; cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask); @@ -1584,7 +1575,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro cc->whole_zone = true; } - cc->last_migrated_pfn = 0; + last_migrated_pfn = 0; trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn, sync); @@ -1593,12 +1584,14 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) { int err; + unsigned long start_pfn = cc->migrate_pfn; switch (isolate_migratepages(zone, cc)) { case ISOLATE_ABORT: ret = COMPACT_CONTENDED; putback_movable_pages(&cc->migratepages); cc->nr_migratepages = 0; + last_migrated_pfn = 0; goto out; case ISOLATE_NONE: /* @@ -1608,6 +1601,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro */ goto check_drain; case ISOLATE_SUCCESS: + last_migrated_pfn = start_pfn; ; } @@ -1639,8 +1633,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro cc->migrate_pfn = block_end_pfn( cc->migrate_pfn - 1, cc->order); /* Draining pcplists is useless in this case */ - cc->last_migrated_pfn = 0; - + last_migrated_pfn = 0; } } @@ -1652,18 +1645,18 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro * compact_finished() can detect immediately if allocation * would succeed. */ - if (cc->order > 0 && cc->last_migrated_pfn) { + if (cc->order > 0 && last_migrated_pfn) { int cpu; unsigned long current_block_start = block_start_pfn(cc->migrate_pfn, cc->order); - if (cc->last_migrated_pfn < current_block_start) { + if (last_migrated_pfn < current_block_start) { cpu = get_cpu(); lru_add_drain_cpu(cpu); drain_local_pages(zone); put_cpu(); /* No more flushing until we migrate again */ - cc->last_migrated_pfn = 0; + last_migrated_pfn = 0; } } diff --git a/mm/internal.h b/mm/internal.h index 867af5425432..f40d06d70683 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -188,7 +188,6 @@ struct compact_control { unsigned int nr_migratepages; /* Number of pages to migrate */ unsigned long free_pfn; /* isolate_freepages search base */ unsigned long migrate_pfn; /* isolate_migratepages search base */ - unsigned long last_migrated_pfn;/* Not yet flushed page being freed */ struct zone *zone; unsigned long total_migrate_scanned; unsigned long total_free_scanned; From patchwork Fri Jan 18 17:51:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033512 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8987CC5AC81 for ; Fri, 18 Jan 2019 17:52:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 612ED20850 for ; Fri, 18 Jan 2019 17:52:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728746AbfARRwg (ORCPT ); Fri, 18 Jan 2019 12:52:36 -0500 Received: from outbound-smtp13.blacknight.com ([46.22.139.230]:43247 "EHLO outbound-smtp13.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728696AbfARRwa (ORCPT ); Fri, 18 Jan 2019 12:52:30 -0500 Received: from mail.blacknight.com (unknown [81.17.254.16]) by outbound-smtp13.blacknight.com (Postfix) with ESMTPS id CEB0A1C3584 for ; Fri, 18 Jan 2019 17:52:27 +0000 (GMT) Received: (qmail 32350 invoked from network); 18 Jan 2019 17:52:27 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:52:27 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 04/22] mm, compaction: Remove unnecessary zone parameter in some instances Date: Fri, 18 Jan 2019 17:51:18 +0000 Message-Id: <20190118175136.31341-5-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org A zone parameter is passed into a number of top-level compaction functions despite the fact that it's already in compact_control. This is harmless but it did need an audit to check if zone actually ever changes meaningfully. This patches removes the parameter in a number of top-level functions. The change could be much deeper but this was enough to briefly clarify the flow. No functional change. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 54 ++++++++++++++++++++++++++---------------------------- 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index e59dd7a7564c..163841e1b167 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1300,8 +1300,7 @@ static inline bool is_via_compact_memory(int order) return order == -1; } -static enum compact_result __compact_finished(struct zone *zone, - struct compact_control *cc) +static enum compact_result __compact_finished(struct compact_control *cc) { unsigned int order; const int migratetype = cc->migratetype; @@ -1312,7 +1311,7 @@ static enum compact_result __compact_finished(struct zone *zone, /* Compaction run completes if the migrate and free scanner meet */ if (compact_scanners_met(cc)) { /* Let the next compaction start anew. */ - reset_cached_positions(zone); + reset_cached_positions(cc->zone); /* * Mark that the PG_migrate_skip information should be cleared @@ -1321,7 +1320,7 @@ static enum compact_result __compact_finished(struct zone *zone, * based on an allocation request. */ if (cc->direct_compaction) - zone->compact_blockskip_flush = true; + cc->zone->compact_blockskip_flush = true; if (cc->whole_zone) return COMPACT_COMPLETE; @@ -1345,7 +1344,7 @@ static enum compact_result __compact_finished(struct zone *zone, /* Direct compactor: Is a suitable page free? */ for (order = cc->order; order < MAX_ORDER; order++) { - struct free_area *area = &zone->free_area[order]; + struct free_area *area = &cc->zone->free_area[order]; bool can_steal; /* Job done if page is free of the right migratetype */ @@ -1391,13 +1390,12 @@ static enum compact_result __compact_finished(struct zone *zone, return COMPACT_NO_SUITABLE_PAGE; } -static enum compact_result compact_finished(struct zone *zone, - struct compact_control *cc) +static enum compact_result compact_finished(struct compact_control *cc) { int ret; - ret = __compact_finished(zone, cc); - trace_mm_compaction_finished(zone, cc->order, ret); + ret = __compact_finished(cc); + trace_mm_compaction_finished(cc->zone, cc->order, ret); if (ret == COMPACT_NO_SUITABLE_PAGE) ret = COMPACT_CONTINUE; @@ -1524,16 +1522,16 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order, return false; } -static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc) +static enum compact_result compact_zone(struct compact_control *cc) { enum compact_result ret; - unsigned long start_pfn = zone->zone_start_pfn; - unsigned long end_pfn = zone_end_pfn(zone); + unsigned long start_pfn = cc->zone->zone_start_pfn; + unsigned long end_pfn = zone_end_pfn(cc->zone); unsigned long last_migrated_pfn; const bool sync = cc->mode != MIGRATE_ASYNC; cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask); - ret = compaction_suitable(zone, cc->order, cc->alloc_flags, + ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags, cc->classzone_idx); /* Compaction is likely to fail */ if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED) @@ -1546,8 +1544,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro * Clear pageblock skip if there were failures recently and compaction * is about to be retried after being deferred. */ - if (compaction_restarting(zone, cc->order)) - __reset_isolation_suitable(zone); + if (compaction_restarting(cc->zone, cc->order)) + __reset_isolation_suitable(cc->zone); /* * Setup to move all movable pages to the end of the zone. Used cached @@ -1559,16 +1557,16 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro cc->migrate_pfn = start_pfn; cc->free_pfn = pageblock_start_pfn(end_pfn - 1); } else { - cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync]; - cc->free_pfn = zone->compact_cached_free_pfn; + cc->migrate_pfn = cc->zone->compact_cached_migrate_pfn[sync]; + cc->free_pfn = cc->zone->compact_cached_free_pfn; if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) { cc->free_pfn = pageblock_start_pfn(end_pfn - 1); - zone->compact_cached_free_pfn = cc->free_pfn; + cc->zone->compact_cached_free_pfn = cc->free_pfn; } if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) { cc->migrate_pfn = start_pfn; - zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn; - zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn; + cc->zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn; + cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn; } if (cc->migrate_pfn == start_pfn) @@ -1582,11 +1580,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro migrate_prep_local(); - while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) { + while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) { int err; unsigned long start_pfn = cc->migrate_pfn; - switch (isolate_migratepages(zone, cc)) { + switch (isolate_migratepages(cc->zone, cc)) { case ISOLATE_ABORT: ret = COMPACT_CONTENDED; putback_movable_pages(&cc->migratepages); @@ -1653,7 +1651,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro if (last_migrated_pfn < current_block_start) { cpu = get_cpu(); lru_add_drain_cpu(cpu); - drain_local_pages(zone); + drain_local_pages(cc->zone); put_cpu(); /* No more flushing until we migrate again */ last_migrated_pfn = 0; @@ -1678,8 +1676,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro * Only go back, not forward. The cached pfn might have been * already reset to zone end in compact_finished() */ - if (free_pfn > zone->compact_cached_free_pfn) - zone->compact_cached_free_pfn = free_pfn; + if (free_pfn > cc->zone->compact_cached_free_pfn) + cc->zone->compact_cached_free_pfn = free_pfn; } count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned); @@ -1716,7 +1714,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - ret = compact_zone(zone, &cc); + ret = compact_zone(&cc); VM_BUG_ON(!list_empty(&cc.freepages)); VM_BUG_ON(!list_empty(&cc.migratepages)); @@ -1834,7 +1832,7 @@ static void compact_node(int nid) INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - compact_zone(zone, &cc); + compact_zone(&cc); VM_BUG_ON(!list_empty(&cc.freepages)); VM_BUG_ON(!list_empty(&cc.migratepages)); @@ -1968,7 +1966,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) if (kthread_should_stop()) return; - status = compact_zone(zone, &cc); + status = compact_zone(&cc); if (status == COMPACT_SUCCESS) { compaction_defer_reset(zone, cc.order, false); From patchwork Fri Jan 18 17:51:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033513 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19878C07EBF for ; Fri, 18 Jan 2019 17:52:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id ECFDE20850 for ; Fri, 18 Jan 2019 17:52:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728778AbfARRwk (ORCPT ); Fri, 18 Jan 2019 12:52:40 -0500 Received: from outbound-smtp08.blacknight.com ([46.22.139.13]:53440 "EHLO outbound-smtp08.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728696AbfARRwj (ORCPT ); Fri, 18 Jan 2019 12:52:39 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp08.blacknight.com (Postfix) with ESMTPS id 0DFC01C3589 for ; Fri, 18 Jan 2019 17:52:38 +0000 (GMT) Received: (qmail 340 invoked from network); 18 Jan 2019 17:52:37 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:52:37 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 05/22] mm, compaction: Rename map_pages to split_map_pages Date: Fri, 18 Jan 2019 17:51:19 +0000 Message-Id: <20190118175136.31341-6-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org It's non-obvious that high-order free pages are split into order-0 pages from the function name. Fix it. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 163841e1b167..32a88b49f973 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -66,7 +66,7 @@ static unsigned long release_freepages(struct list_head *freelist) return high_pfn; } -static void map_pages(struct list_head *list) +static void split_map_pages(struct list_head *list) { unsigned int i, order, nr_pages; struct page *page, *next; @@ -644,7 +644,7 @@ isolate_freepages_range(struct compact_control *cc, } /* __isolate_free_page() does not map the pages */ - map_pages(&freelist); + split_map_pages(&freelist); if (pfn < end_pfn) { /* Loop terminated early, cleanup. */ @@ -1141,7 +1141,7 @@ static void isolate_freepages(struct compact_control *cc) } /* __isolate_free_page() does not map the pages */ - map_pages(freelist); + split_map_pages(freelist); /* * Record where the free scanner will restart next time. Either we From patchwork Fri Jan 18 17:51:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033514 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7E34C5AC81 for ; Fri, 18 Jan 2019 17:52:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 895AC20850 for ; Fri, 18 Jan 2019 17:52:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728804AbfARRwu (ORCPT ); Fri, 18 Jan 2019 12:52:50 -0500 Received: from outbound-smtp16.blacknight.com ([46.22.139.233]:46689 "EHLO outbound-smtp16.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728698AbfARRwu (ORCPT ); Fri, 18 Jan 2019 12:52:50 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp16.blacknight.com (Postfix) with ESMTPS id 33BD01C35B2 for ; Fri, 18 Jan 2019 17:52:48 +0000 (GMT) Received: (qmail 750 invoked from network); 18 Jan 2019 17:52:48 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:52:48 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 06/22] mm, migrate: Immediately fail migration of a page with no migration handler Date: Fri, 18 Jan 2019 17:51:20 +0000 Message-Id: <20190118175136.31341-7-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Pages with no migration handler use a fallback handler which sometimes works and sometimes persistently retries. A historical example was blockdev pages but there are others such as odd refcounting when page->private is used. These are retried multiple times which is wasteful during compaction so this patch will fail migration faster unless the caller specifies MIGRATE_SYNC. This is not expected to help THP allocation success rates but it did reduce latencies very slightly in some cases. 1-socket thpfioscale 4.20.0 4.20.0 noreserved-v2r15 failfast-v2r15 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 3839.67 ( 0.00%) 3833.72 ( 0.15%) Amean fault-both-5 5177.47 ( 0.00%) 4967.15 ( 4.06%) Amean fault-both-7 7245.03 ( 0.00%) 7139.19 ( 1.46%) Amean fault-both-12 11534.89 ( 0.00%) 11326.30 ( 1.81%) Amean fault-both-18 16241.10 ( 0.00%) 16270.70 ( -0.18%) Amean fault-both-24 19075.91 ( 0.00%) 19839.65 ( -4.00%) Amean fault-both-30 22712.11 ( 0.00%) 21707.05 ( 4.43%) Amean fault-both-32 21692.92 ( 0.00%) 21968.16 ( -1.27%) The 2-socket results are not materially different. Scan rates are similar as expected. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 52b04c217e30..4512afab46ac 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -916,7 +916,7 @@ static int fallback_migrate_page(struct address_space *mapping, */ if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL)) - return -EAGAIN; + return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY; return migrate_page(mapping, newpage, page, mode); } From patchwork Fri Jan 18 17:51:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033515 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43B2CC07EBF for ; Fri, 18 Jan 2019 17:53:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 241762087E for ; Fri, 18 Jan 2019 17:53:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728827AbfARRxA (ORCPT ); Fri, 18 Jan 2019 12:53:00 -0500 Received: from outbound-smtp25.blacknight.com ([81.17.249.193]:60052 "EHLO outbound-smtp25.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728426AbfARRxA (ORCPT ); Fri, 18 Jan 2019 12:53:00 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp25.blacknight.com (Postfix) with ESMTPS id 534FBB87ED for ; Fri, 18 Jan 2019 17:52:58 +0000 (GMT) Received: (qmail 1326 invoked from network); 18 Jan 2019 17:52:58 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:52:58 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 07/22] mm, compaction: Always finish scanning of a full pageblock Date: Fri, 18 Jan 2019 17:51:21 +0000 Message-Id: <20190118175136.31341-8-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When compaction is finishing, it uses a flag to ensure the pageblock is complete but it makes sense to always complete migration of a pageblock. Minimally, skip information is based on a pageblock and partially scanned pageblocks may incur more scanning in the future. The pageblock skip handling also becomes more strict later in the series and the hint is more useful if a complete pageblock was always scanned. The potentially impacts latency as more scanning is done but it's not a consistent win or loss as the scanning is not always a high percentage of the pageblock and sometimes it is offset by future reductions in scanning. Hence, the results are not presented this time due to a misleading mix of gains/losses without any clear pattern. However, full scanning of the pageblock is important for later patches. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 19 ++++++++----------- mm/internal.h | 1 - 2 files changed, 8 insertions(+), 12 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 32a88b49f973..3d11c209614a 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1331,16 +1331,14 @@ static enum compact_result __compact_finished(struct compact_control *cc) if (is_via_compact_memory(cc->order)) return COMPACT_CONTINUE; - if (cc->finishing_block) { - /* - * We have finished the pageblock, but better check again that - * we really succeeded. - */ - if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages)) - cc->finishing_block = false; - else - return COMPACT_CONTINUE; - } + /* + * Always finish scanning a pageblock to reduce the possibility of + * fallbacks in the future. This is particularly important when + * migration source is unmovable/reclaimable but it's not worth + * special casing. + */ + if (!IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages)) + return COMPACT_CONTINUE; /* Direct compactor: Is a suitable page free? */ for (order = cc->order; order < MAX_ORDER; order++) { @@ -1382,7 +1380,6 @@ static enum compact_result __compact_finished(struct compact_control *cc) return COMPACT_SUCCESS; } - cc->finishing_block = true; return COMPACT_CONTINUE; } } diff --git a/mm/internal.h b/mm/internal.h index f40d06d70683..9b32f4cab0ae 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -203,7 +203,6 @@ struct compact_control { bool direct_compaction; /* False from kcompactd or /proc/... */ bool whole_zone; /* Whole zone should/has been scanned */ bool contended; /* Signal lock or sched contention */ - bool finishing_block; /* Finishing current pageblock */ }; unsigned long From patchwork Fri Jan 18 17:51:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033516 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09B88C07EBF for ; Fri, 18 Jan 2019 17:53:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CF84D20850 for ; Fri, 18 Jan 2019 17:53:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728855AbfARRxK (ORCPT ); Fri, 18 Jan 2019 12:53:10 -0500 Received: from outbound-smtp13.blacknight.com ([46.22.139.230]:55130 "EHLO outbound-smtp13.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728426AbfARRxK (ORCPT ); Fri, 18 Jan 2019 12:53:10 -0500 Received: from mail.blacknight.com (unknown [81.17.254.16]) by outbound-smtp13.blacknight.com (Postfix) with ESMTPS id 7EC9B1C35AC for ; Fri, 18 Jan 2019 17:53:08 +0000 (GMT) Received: (qmail 2023 invoked from network); 18 Jan 2019 17:53:08 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:53:08 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 08/22] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Date: Fri, 18 Jan 2019 17:51:22 +0000 Message-Id: <20190118175136.31341-9-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When pageblocks get fragmented, watermarks are artifically boosted to reclaim pages to avoid further fragmentation events. However, compaction is often either fragmentation-neutral or moving movable pages away from unmovable/reclaimable pages. As the true watermarks are preserved, allow compaction to ignore the boost factor. The expected impact is very slight as the main benefit is that compaction is slightly more likely to succeed when the system has been fragmented very recently. On both 1-socket and 2-socket machines for THP-intensive allocation during fragmentation the success rate was increased by less than 1% which is marginal. However, detailed tracing indicated that failure of migration due to a premature ENOMEM triggered by watermark checks were eliminated. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index fc769ff4fb2c..6607cb7131b0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3071,7 +3071,7 @@ int __isolate_free_page(struct page *page, unsigned int order) * watermark, because we already know our high-order page * exists. */ - watermark = min_wmark_pages(zone) + (1UL << order); + watermark = zone->_watermark[WMARK_MIN] + (1UL << order); if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA)) return 0; From patchwork Fri Jan 18 17:51:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033517 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D59CC07EBF for ; Fri, 18 Jan 2019 17:53:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BF8C720850 for ; Fri, 18 Jan 2019 17:53:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728877AbfARRxY (ORCPT ); Fri, 18 Jan 2019 12:53:24 -0500 Received: from outbound-smtp08.blacknight.com ([46.22.139.13]:47414 "EHLO outbound-smtp08.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728610AbfARRxX (ORCPT ); Fri, 18 Jan 2019 12:53:23 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp08.blacknight.com (Postfix) with ESMTPS id A9E4D1C35AF for ; Fri, 18 Jan 2019 17:53:18 +0000 (GMT) Received: (qmail 2691 invoked from network); 18 Jan 2019 17:53:18 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:53:18 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 09/22] mm, compaction: Use free lists to quickly locate a migration source Date: Fri, 18 Jan 2019 17:51:23 +0000 Message-Id: <20190118175136.31341-10-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The migration scanner is a linear scan of a zone with a potentiall large search space. Furthermore, many pageblocks are unusable such as those filled with reserved pages or partially filled with pages that cannot migrate. These still get scanned in the common case of allocating a THP and the cost accumulates. The patch uses a partial search of the free lists to locate a migration source candidate that is marked as MOVABLE when allocating a THP. It prefers picking a block with a larger number of free pages already on the basis that there are fewer pages to migrate to free the entire block. The lowest PFN found during searches is tracked as the basis of the start for the linear search after the first search of the free list fails. After the search, the free list is shuffled so that the next search will not encounter the same page. If the search fails then the subsequent searches will be shorter and the linear scanner is used. If this search fails, or if the request is for a small or unmovable/reclaimable allocation then the linear scanner is still used. It is somewhat pointless to use the list search in those cases. Small free pages must be used for the search and there is no guarantee that movable pages are located within that block that are contiguous. 5.0.0-rc1 5.0.0-rc1 noboost-v3r10 findmig-v3r15 Amean fault-both-3 3771.41 ( 0.00%) 3390.40 ( 10.10%) Amean fault-both-5 5409.05 ( 0.00%) 5082.28 ( 6.04%) Amean fault-both-7 7040.74 ( 0.00%) 7012.51 ( 0.40%) Amean fault-both-12 11887.35 ( 0.00%) 11346.63 ( 4.55%) Amean fault-both-18 16718.19 ( 0.00%) 15324.19 ( 8.34%) Amean fault-both-24 21157.19 ( 0.00%) 16088.50 * 23.96%* Amean fault-both-30 21175.92 ( 0.00%) 18723.42 * 11.58%* Amean fault-both-32 21339.03 ( 0.00%) 18612.01 * 12.78%* 5.0.0-rc1 5.0.0-rc1 noboost-v3r10 findmig-v3r15 Percentage huge-3 86.50 ( 0.00%) 89.83 ( 3.85%) Percentage huge-5 92.52 ( 0.00%) 91.96 ( -0.61%) Percentage huge-7 92.44 ( 0.00%) 92.85 ( 0.44%) Percentage huge-12 92.98 ( 0.00%) 92.74 ( -0.25%) Percentage huge-18 91.70 ( 0.00%) 91.71 ( 0.02%) Percentage huge-24 91.59 ( 0.00%) 92.13 ( 0.60%) Percentage huge-30 90.14 ( 0.00%) 93.79 ( 4.04%) Percentage huge-32 90.03 ( 0.00%) 91.27 ( 1.37%) This shows an improvement in allocation latencies with similar allocation success rates. While not presented, there was a 31% reduction in migration scanning and a 8% reduction on system CPU usage. A 2-socket machine showed similar benefits. [vbabka@suse.cz: Migrate block that was found-fast, some optimisations] Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++++- mm/internal.h | 2 + 2 files changed, 175 insertions(+), 3 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 3d11c209614a..92d10eb3d1c7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1040,6 +1040,12 @@ static bool suitable_migration_target(struct compact_control *cc, return false; } +static inline unsigned int +freelist_scan_limit(struct compact_control *cc) +{ + return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1; +} + /* * Test whether the free scanner has reached the same or lower pageblock than * the migration scanner, and compaction should thus terminate. @@ -1050,6 +1056,19 @@ static inline bool compact_scanners_met(struct compact_control *cc) <= (cc->migrate_pfn >> pageblock_order); } +/* Reorder the free list to reduce repeated future searches */ +static void +move_freelist_tail(struct list_head *freelist, struct page *freepage) +{ + LIST_HEAD(sublist); + + if (!list_is_last(freelist, &freepage->lru)) { + list_cut_position(&sublist, freelist, &freepage->lru); + if (!list_empty(&sublist)) + list_splice_tail(&sublist, freelist); + } +} + /* * Based on information in the current compact_control, find blocks * suitable for isolating free pages from and then isolate them. @@ -1207,6 +1226,146 @@ typedef enum { */ int sysctl_compact_unevictable_allowed __read_mostly = 1; +static inline void +update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) +{ + if (cc->fast_start_pfn == ULONG_MAX) + return; + + if (!cc->fast_start_pfn) + cc->fast_start_pfn = pfn; + + cc->fast_start_pfn = min(cc->fast_start_pfn, pfn); +} + +static inline void +reinit_migrate_pfn(struct compact_control *cc) +{ + if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX) + return; + + cc->migrate_pfn = cc->fast_start_pfn; + cc->fast_start_pfn = ULONG_MAX; +} + +/* + * Briefly search the free lists for a migration source that already has + * some free pages to reduce the number of pages that need migration + * before a pageblock is free. + */ +static unsigned long fast_find_migrateblock(struct compact_control *cc) +{ + unsigned int limit = freelist_scan_limit(cc); + unsigned int nr_scanned = 0; + unsigned long distance; + unsigned long pfn = cc->migrate_pfn; + unsigned long high_pfn; + int order; + + /* Skip hints are relied on to avoid repeats on the fast search */ + if (cc->ignore_skip_hint) + return pfn; + + /* + * If the migrate_pfn is not at the start of a zone or the start + * of a pageblock then assume this is a continuation of a previous + * scan restarted due to COMPACT_CLUSTER_MAX. + */ + if (pfn != cc->zone->zone_start_pfn && pfn != pageblock_start_pfn(pfn)) + return pfn; + + /* + * For smaller orders, just linearly scan as the number of pages + * to migrate should be relatively small and does not necessarily + * justify freeing up a large block for a small allocation. + */ + if (cc->order <= PAGE_ALLOC_COSTLY_ORDER) + return pfn; + + /* + * Only allow kcompactd and direct requests for movable pages to + * quickly clear out a MOVABLE pageblock for allocation. This + * reduces the risk that a large movable pageblock is freed for + * an unmovable/reclaimable small allocation. + */ + if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE) + return pfn; + + /* + * When starting the migration scanner, pick any pageblock within the + * first half of the search space. Otherwise try and pick a pageblock + * within the first eighth to reduce the chances that a migration + * target later becomes a source. + */ + distance = (cc->free_pfn - cc->migrate_pfn) >> 1; + if (cc->migrate_pfn != cc->zone->zone_start_pfn) + distance >>= 2; + high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance); + + for (order = cc->order - 1; + order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit; + order--) { + struct free_area *area = &cc->zone->free_area[order]; + struct list_head *freelist; + unsigned long flags; + struct page *freepage; + + if (!area->nr_free) + continue; + + spin_lock_irqsave(&cc->zone->lock, flags); + freelist = &area->free_list[MIGRATE_MOVABLE]; + list_for_each_entry(freepage, freelist, lru) { + unsigned long free_pfn; + + nr_scanned++; + free_pfn = page_to_pfn(freepage); + if (free_pfn < high_pfn) { + update_fast_start_pfn(cc, free_pfn); + + /* + * Avoid if skipped recently. Ideally it would + * move to the tail but even safe iteration of + * the list assumes an entry is deleted, not + * reordered. + */ + if (get_pageblock_skip(freepage)) { + if (list_is_last(freelist, &freepage->lru)) + break; + + continue; + } + + /* Reorder to so a future search skips recent pages */ + move_freelist_tail(freelist, freepage); + + pfn = pageblock_start_pfn(free_pfn); + cc->fast_search_fail = 0; + set_pageblock_skip(freepage); + break; + } + + if (nr_scanned >= limit) { + cc->fast_search_fail++; + move_freelist_tail(freelist, freepage); + break; + } + } + spin_unlock_irqrestore(&cc->zone->lock, flags); + } + + cc->total_migrate_scanned += nr_scanned; + + /* + * If fast scanning failed then use a cached entry for a page block + * that had free pages as the basis for starting a linear scan. + */ + if (pfn == cc->migrate_pfn) + reinit_migrate_pfn(cc); + + return pfn; +} + /* * Isolate all pages that can be migrated from the first suitable block, * starting at the block pointed to by the migrate scanner pfn within @@ -1222,16 +1381,25 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, const isolate_mode_t isolate_mode = (sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) | (cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0); + bool fast_find_block; /* * Start at where we last stopped, or beginning of the zone as - * initialized by compact_zone() + * initialized by compact_zone(). The first failure will use + * the lowest PFN as the starting point for linear scanning. */ - low_pfn = cc->migrate_pfn; + low_pfn = fast_find_migrateblock(cc); block_start_pfn = pageblock_start_pfn(low_pfn); if (block_start_pfn < zone->zone_start_pfn) block_start_pfn = zone->zone_start_pfn; + /* + * fast_find_migrateblock marks a pageblock skipped so to avoid + * the isolation_suitable check below, check whether the fast + * search was successful. + */ + fast_find_block = low_pfn != cc->migrate_pfn && !cc->fast_search_fail; + /* Only scan within a pageblock boundary */ block_end_pfn = pageblock_end_pfn(low_pfn); @@ -1240,6 +1408,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, * Do not cross the free scanner. */ for (; block_end_pfn <= cc->free_pfn; + fast_find_block = false, low_pfn = block_end_pfn, block_start_pfn = block_end_pfn, block_end_pfn += pageblock_nr_pages) { @@ -1259,7 +1428,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, continue; /* If isolation recently failed, do not retry */ - if (!isolation_suitable(cc, page)) + if (!isolation_suitable(cc, page) && !fast_find_block) continue; /* @@ -1550,6 +1719,7 @@ static enum compact_result compact_zone(struct compact_control *cc) * want to compact the whole zone), but check that it is initialised * by ensuring the values are within zone boundaries. */ + cc->fast_start_pfn = 0; if (cc->whole_zone) { cc->migrate_pfn = start_pfn; cc->free_pfn = pageblock_start_pfn(end_pfn - 1); diff --git a/mm/internal.h b/mm/internal.h index 9b32f4cab0ae..983cb975545f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -188,9 +188,11 @@ struct compact_control { unsigned int nr_migratepages; /* Number of pages to migrate */ unsigned long free_pfn; /* isolate_freepages search base */ unsigned long migrate_pfn; /* isolate_migratepages search base */ + unsigned long fast_start_pfn; /* a pfn to start linear scan from */ struct zone *zone; unsigned long total_migrate_scanned; unsigned long total_free_scanned; + unsigned int fast_search_fail; /* failures to use free list searches */ const gfp_t gfp_mask; /* gfp mask of a direct compactor */ int order; /* order a direct compactor needs */ int migratetype; /* migratetype of direct compactor */ From patchwork Fri Jan 18 17:51:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033518 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8287FC5AC81 for ; Fri, 18 Jan 2019 17:53:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5629520850 for ; Fri, 18 Jan 2019 17:53:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728891AbfARRxc (ORCPT ); Fri, 18 Jan 2019 12:53:32 -0500 Received: from outbound-smtp10.blacknight.com ([46.22.139.15]:58091 "EHLO outbound-smtp10.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728439AbfARRxc (ORCPT ); Fri, 18 Jan 2019 12:53:32 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp10.blacknight.com (Postfix) with ESMTPS id CBACE1C3145 for ; Fri, 18 Jan 2019 17:53:28 +0000 (GMT) Received: (qmail 3552 invoked from network); 18 Jan 2019 17:53:28 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:53:28 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 10/22] mm, compaction: Keep migration source private to a single compaction instance Date: Fri, 18 Jan 2019 17:51:24 +0000 Message-Id: <20190118175136.31341-11-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Due to either a fast search of the free list or a linear scan, it is possible for multiple compaction instances to pick the same pageblock for migration. This is lucky for one scanner and increased scanning for all the others. It also allows a race between requests on which first allocates the resulting free block. This patch tests and updates the pageblock skip for the migration scanner carefully. When isolating a block, it will check and skip if the block is already in use. Once the zone lock is acquired, it will be rechecked so that only one scanner can set the pageblock skip for exclusive use. Any scanner contending will continue with a linear scan. The skip bit is still set if no pages can be isolated in a range. While this may result in redundant scanning, it avoids unnecessarily acquiring the zone lock when there are no suitable migration sources. 1-socket thpscale Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 3390.40 ( 0.00%) 3024.41 ( 10.80%) Amean fault-both-5 5082.28 ( 0.00%) 4749.30 ( 6.55%) Amean fault-both-7 7012.51 ( 0.00%) 6454.95 ( 7.95%) Amean fault-both-12 11346.63 ( 0.00%) 10324.83 ( 9.01%) Amean fault-both-18 15324.19 ( 0.00%) 12896.82 * 15.84%* Amean fault-both-24 16088.50 ( 0.00%) 13470.60 * 16.27%* Amean fault-both-30 18723.42 ( 0.00%) 17143.99 ( 8.44%) Amean fault-both-32 18612.01 ( 0.00%) 17743.91 ( 4.66%) 5.0.0-rc1 5.0.0-rc1 findmig-v3r15 isolmig-v3r15 Percentage huge-3 89.83 ( 0.00%) 92.96 ( 3.48%) Percentage huge-5 91.96 ( 0.00%) 93.26 ( 1.41%) Percentage huge-7 92.85 ( 0.00%) 93.63 ( 0.84%) Percentage huge-12 92.74 ( 0.00%) 92.80 ( 0.07%) Percentage huge-18 91.71 ( 0.00%) 91.62 ( -0.10%) Percentage huge-24 92.13 ( 0.00%) 91.50 ( -0.69%) Percentage huge-30 93.79 ( 0.00%) 92.73 ( -1.13%) Percentage huge-32 91.27 ( 0.00%) 91.94 ( 0.74%) This shows a reasonable reduction in latency as multiple compaction scanners do not operate on the same blocks with a similar allocation success rate. Compaction migrate scanned 41093126 25646769 Migration scan rates are reduced by 38%. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 124 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 99 insertions(+), 25 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 92d10eb3d1c7..7c4c9cce7907 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -285,13 +285,52 @@ void reset_isolation_suitable(pg_data_t *pgdat) } } +/* + * Sets the pageblock skip bit if it was clear. Note that this is a hint as + * locks are not required for read/writers. Returns true if it was already set. + */ +static bool test_and_set_skip(struct compact_control *cc, struct page *page, + unsigned long pfn) +{ + bool skip; + + /* Do no update if skip hint is being ignored */ + if (cc->ignore_skip_hint) + return false; + + if (!IS_ALIGNED(pfn, pageblock_nr_pages)) + return false; + + skip = get_pageblock_skip(page); + if (!skip && !cc->no_set_skip_hint) + set_pageblock_skip(page); + + return skip; +} + +static void update_cached_migrate(struct compact_control *cc, unsigned long pfn) +{ + struct zone *zone = cc->zone; + + pfn = pageblock_end_pfn(pfn); + + /* Set for isolation rather than compaction */ + if (cc->no_set_skip_hint) + return; + + if (pfn > zone->compact_cached_migrate_pfn[0]) + zone->compact_cached_migrate_pfn[0] = pfn; + if (cc->mode != MIGRATE_ASYNC && + pfn > zone->compact_cached_migrate_pfn[1]) + zone->compact_cached_migrate_pfn[1] = pfn; +} + /* * If no pages were isolated then mark this pageblock to be skipped in the * future. The information is later cleared by __reset_isolation_suitable(). */ static void update_pageblock_skip(struct compact_control *cc, - struct page *page, unsigned long nr_isolated, - bool migrate_scanner) + struct page *page, unsigned long nr_isolated) { struct zone *zone = cc->zone; unsigned long pfn; @@ -310,16 +349,8 @@ static void update_pageblock_skip(struct compact_control *cc, pfn = page_to_pfn(page); /* Update where async and sync compaction should restart */ - if (migrate_scanner) { - if (pfn > zone->compact_cached_migrate_pfn[0]) - zone->compact_cached_migrate_pfn[0] = pfn; - if (cc->mode != MIGRATE_ASYNC && - pfn > zone->compact_cached_migrate_pfn[1]) - zone->compact_cached_migrate_pfn[1] = pfn; - } else { - if (pfn < zone->compact_cached_free_pfn) - zone->compact_cached_free_pfn = pfn; - } + if (pfn < zone->compact_cached_free_pfn) + zone->compact_cached_free_pfn = pfn; } #else static inline bool isolation_suitable(struct compact_control *cc, @@ -334,10 +365,19 @@ static inline bool pageblock_skip_persistent(struct page *page) } static inline void update_pageblock_skip(struct compact_control *cc, - struct page *page, unsigned long nr_isolated, - bool migrate_scanner) + struct page *page, unsigned long nr_isolated) { } + +static void update_cached_migrate(struct compact_control *cc, unsigned long pfn) +{ +} + +static bool test_and_set_skip(struct compact_control *cc, struct page *page, + unsigned long pfn) +{ + return false; +} #endif /* CONFIG_COMPACTION */ /* @@ -567,7 +607,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, /* Update the pageblock-skip if the whole pageblock was scanned */ if (blockpfn == end_pfn) - update_pageblock_skip(cc, valid_page, total_isolated, false); + update_pageblock_skip(cc, valid_page, total_isolated); cc->total_free_scanned += nr_scanned; if (total_isolated) @@ -702,6 +742,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, unsigned long start_pfn = low_pfn; bool skip_on_failure = false; unsigned long next_skip_pfn = 0; + bool skip_updated = false; /* * Ensure that there are not too many pages isolated from the LRU @@ -768,8 +809,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, page = pfn_to_page(low_pfn); - if (!valid_page) + /* + * Check if the pageblock has already been marked skipped. + * Only the aligned PFN is checked as the caller isolates + * COMPACT_CLUSTER_MAX at a time so the second call must + * not falsely conclude that the block should be skipped. + */ + if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) { + if (!cc->ignore_skip_hint && get_pageblock_skip(page)) { + low_pfn = end_pfn; + goto isolate_abort; + } valid_page = page; + } /* * Skip if free. We read page order here without zone lock @@ -850,8 +902,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (!locked) { locked = compact_trylock_irqsave(zone_lru_lock(zone), &flags, cc); - if (!locked) + + /* Allow future scanning if the lock is contended */ + if (!locked) { + clear_pageblock_skip(page); break; + } + + /* Try get exclusive access under lock */ + if (!skip_updated) { + skip_updated = true; + if (test_and_set_skip(cc, page, low_pfn)) + goto isolate_abort; + } /* Recheck PageLRU and PageCompound under lock */ if (!PageLRU(page)) @@ -929,15 +992,20 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (unlikely(low_pfn > end_pfn)) low_pfn = end_pfn; +isolate_abort: if (locked) spin_unlock_irqrestore(zone_lru_lock(zone), flags); /* - * Update the pageblock-skip information and cached scanner pfn, - * if the whole pageblock was scanned without isolating any page. + * Updated the cached scanner pfn if the pageblock was scanned + * without isolating a page. The pageblock may not be marked + * skipped already if there were no LRU pages in the block. */ - if (low_pfn == end_pfn) - update_pageblock_skip(cc, valid_page, nr_isolated, true); + if (low_pfn == end_pfn && !nr_isolated) { + if (valid_page && !skip_updated) + set_pageblock_skip(valid_page); + update_cached_migrate(cc, low_pfn); + } trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn, nr_scanned, nr_isolated); @@ -1321,8 +1389,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) nr_scanned++; free_pfn = page_to_pfn(freepage); if (free_pfn < high_pfn) { - update_fast_start_pfn(cc, free_pfn); - /* * Avoid if skipped recently. Ideally it would * move to the tail but even safe iteration of @@ -1339,6 +1405,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) /* Reorder to so a future search skips recent pages */ move_freelist_tail(freelist, freepage); + update_fast_start_pfn(cc, free_pfn); pfn = pageblock_start_pfn(free_pfn); cc->fast_search_fail = 0; set_pageblock_skip(freepage); @@ -1427,8 +1494,15 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, if (!page) continue; - /* If isolation recently failed, do not retry */ - if (!isolation_suitable(cc, page) && !fast_find_block) + /* + * If isolation recently failed, do not retry. Only check the + * pageblock once. COMPACT_CLUSTER_MAX causes a pageblock + * to be visited multiple times. Assume skip was checked + * before making it "skip" so other compaction instances do + * not scan the same block. + */ + if (IS_ALIGNED(low_pfn, pageblock_nr_pages) && + !fast_find_block && !isolation_suitable(cc, page)) continue; /* From patchwork Fri Jan 18 17:51:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033519 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A9ECC07EBF for ; Fri, 18 Jan 2019 17:53:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5D48D20850 for ; Fri, 18 Jan 2019 17:53:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728924AbfARRxm (ORCPT ); Fri, 18 Jan 2019 12:53:42 -0500 Received: from outbound-smtp11.blacknight.com ([46.22.139.106]:39527 "EHLO outbound-smtp11.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728239AbfARRxm (ORCPT ); Fri, 18 Jan 2019 12:53:42 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp11.blacknight.com (Postfix) with ESMTPS id F3B301C35A3 for ; Fri, 18 Jan 2019 17:53:38 +0000 (GMT) Received: (qmail 3977 invoked from network); 18 Jan 2019 17:53:38 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:53:38 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 11/22] mm, compaction: Use free lists to quickly locate a migration target Date: Fri, 18 Jan 2019 17:51:25 +0000 Message-Id: <20190118175136.31341-12-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Similar to the migration scanner, this patch uses the free lists to quickly locate a migration target. The search is different in that lower orders will be searched for a suitable high PFN if necessary but the search is still bound. This is justified on the grounds that the free scanner typically scans linearly much more than the migration scanner. If a free page is found, it is isolated and compaction continues if enough pages were isolated. For SYNC* scanning, the full pageblock is scanned for any remaining free pages so that is can be marked for skipping in the near future. 1-socket thpfioscale 5.0.0-rc1 5.0.0-rc1 isolmig-v3r15 findfree-v3r16 Amean fault-both-3 3024.41 ( 0.00%) 3200.68 ( -5.83%) Amean fault-both-5 4749.30 ( 0.00%) 4847.75 ( -2.07%) Amean fault-both-7 6454.95 ( 0.00%) 6658.92 ( -3.16%) Amean fault-both-12 10324.83 ( 0.00%) 11077.62 ( -7.29%) Amean fault-both-18 12896.82 ( 0.00%) 12403.97 ( 3.82%) Amean fault-both-24 13470.60 ( 0.00%) 15607.10 * -15.86%* Amean fault-both-30 17143.99 ( 0.00%) 18752.27 ( -9.38%) Amean fault-both-32 17743.91 ( 0.00%) 21207.54 * -19.52%* The impact on latency is variable but the search is optimistic and sensitive to the exact system state. Success rates are similar but the major impact is to the rate of scanning 5.0.0-rc1 5.0.0-rc1 isolmig-v3r15 findfree-v3r16 Compaction migrate scanned 25646769 29507205 Compaction free scanned 201558184 100359571 The free scan rates are reduced by 50%. The 2-socket reductions for the free scanner are more dramatic which is a likely reflection that the machine has more memory. [dan.carpenter@oracle.com: Fix static checker warning] [vbabka@suse.cz: Correct number of pages scanned for lower orders] Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 218 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 213 insertions(+), 5 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 7c4c9cce7907..19fea4a7b3f4 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1124,7 +1124,29 @@ static inline bool compact_scanners_met(struct compact_control *cc) <= (cc->migrate_pfn >> pageblock_order); } -/* Reorder the free list to reduce repeated future searches */ +/* + * Used when scanning for a suitable migration target which scans freelists + * in reverse. Reorders the list such as the unscanned pages are scanned + * first on the next iteration of the free scanner + */ +static void +move_freelist_head(struct list_head *freelist, struct page *freepage) +{ + LIST_HEAD(sublist); + + if (!list_is_last(freelist, &freepage->lru)) { + list_cut_before(&sublist, freelist, &freepage->lru); + if (!list_empty(&sublist)) + list_splice_tail(&sublist, freelist); + } +} + +/* + * Similar to move_freelist_head except used by the migration scanner + * when scanning forward. It's possible for these list operations to + * move against each other if they search the free list exactly in + * lockstep. + */ static void move_freelist_tail(struct list_head *freelist, struct page *freepage) { @@ -1137,6 +1159,186 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage) } } +static void +fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated) +{ + unsigned long start_pfn, end_pfn; + struct page *page = pfn_to_page(pfn); + + /* Do not search around if there are enough pages already */ + if (cc->nr_freepages >= cc->nr_migratepages) + return; + + /* Minimise scanning during async compaction */ + if (cc->direct_compaction && cc->mode == MIGRATE_ASYNC) + return; + + /* Pageblock boundaries */ + start_pfn = pageblock_start_pfn(pfn); + end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone)); + + /* Scan before */ + if (start_pfn != pfn) { + isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false); + if (cc->nr_freepages >= cc->nr_migratepages) + return; + } + + /* Scan after */ + start_pfn = pfn + nr_isolated; + if (start_pfn != end_pfn) + isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false); + + /* Skip this pageblock in the future as it's full or nearly full */ + if (cc->nr_freepages < cc->nr_migratepages) + set_pageblock_skip(page); +} + +static unsigned long +fast_isolate_freepages(struct compact_control *cc) +{ + unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1); + unsigned int nr_scanned = 0; + unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0; + unsigned long nr_isolated = 0; + unsigned long distance; + struct page *page = NULL; + bool scan_start = false; + int order; + + /* Full compaction passes in a negative order */ + if (cc->order <= 0) + return cc->free_pfn; + + /* + * If starting the scan, use a deeper search and use the highest + * PFN found if a suitable one is not found. + */ + if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) { + limit = pageblock_nr_pages >> 1; + scan_start = true; + } + + /* + * Preferred point is in the top quarter of the scan space but take + * a pfn from the top half if the search is problematic. + */ + distance = (cc->free_pfn - cc->migrate_pfn); + low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2)); + min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1)); + + if (WARN_ON_ONCE(min_pfn > low_pfn)) + low_pfn = min_pfn; + + for (order = cc->order - 1; + order >= 0 && !page; + order--) { + struct free_area *area = &cc->zone->free_area[order]; + struct list_head *freelist; + struct page *freepage; + unsigned long flags; + unsigned int order_scanned = 0; + + if (!area->nr_free) + continue; + + spin_lock_irqsave(&cc->zone->lock, flags); + freelist = &area->free_list[MIGRATE_MOVABLE]; + list_for_each_entry_reverse(freepage, freelist, lru) { + unsigned long pfn; + + order_scanned++; + nr_scanned++; + pfn = page_to_pfn(freepage); + + if (pfn >= highest) + highest = pageblock_start_pfn(pfn); + + if (pfn >= low_pfn) { + cc->fast_search_fail = 0; + page = freepage; + break; + } + + if (pfn >= min_pfn && pfn > high_pfn) { + high_pfn = pfn; + + /* Shorten the scan if a candidate is found */ + limit >>= 1; + } + + if (order_scanned >= limit) + break; + } + + /* Use a minimum pfn if a preferred one was not found */ + if (!page && high_pfn) { + page = pfn_to_page(high_pfn); + + /* Update freepage for the list reorder below */ + freepage = page; + } + + /* Reorder to so a future search skips recent pages */ + move_freelist_head(freelist, freepage); + + /* Isolate the page if available */ + if (page) { + if (__isolate_free_page(page, order)) { + set_page_private(page, order); + nr_isolated = 1 << order; + cc->nr_freepages += nr_isolated; + list_add_tail(&page->lru, &cc->freepages); + count_compact_events(COMPACTISOLATED, nr_isolated); + } else { + /* If isolation fails, abort the search */ + order = -1; + page = NULL; + } + } + + spin_unlock_irqrestore(&cc->zone->lock, flags); + + /* + * Smaller scan on next order so the total scan ig related + * to freelist_scan_limit. + */ + if (order_scanned >= limit) + limit = min(1U, limit >> 1); + } + + if (!page) { + cc->fast_search_fail++; + if (scan_start) { + /* + * Use the highest PFN found above min. If one was + * not found, be pessemistic for direct compaction + * and use the min mark. + */ + if (highest) { + page = pfn_to_page(highest); + cc->free_pfn = highest; + } else { + if (cc->direct_compaction) { + page = pfn_to_page(min_pfn); + cc->free_pfn = min_pfn; + } + } + } + } + + if (highest && highest > cc->zone->compact_cached_free_pfn) + cc->zone->compact_cached_free_pfn = highest; + + cc->total_free_scanned += nr_scanned; + if (!page) + return cc->free_pfn; + + low_pfn = page_to_pfn(page); + fast_isolate_around(cc, low_pfn, nr_isolated); + return low_pfn; +} + /* * Based on information in the current compact_control, find blocks * suitable for isolating free pages from and then isolate them. @@ -1151,6 +1353,11 @@ static void isolate_freepages(struct compact_control *cc) unsigned long low_pfn; /* lowest pfn scanner is able to scan */ struct list_head *freelist = &cc->freepages; + /* Try a small search of the free lists for a candidate */ + isolate_start_pfn = fast_isolate_freepages(cc); + if (cc->nr_freepages) + goto splitmap; + /* * Initialise the free scanner. The starting point is where we last * successfully isolated from, zone-cached value, or the end of the @@ -1163,7 +1370,7 @@ static void isolate_freepages(struct compact_control *cc) * is using. */ isolate_start_pfn = cc->free_pfn; - block_start_pfn = pageblock_start_pfn(cc->free_pfn); + block_start_pfn = pageblock_start_pfn(isolate_start_pfn); block_end_pfn = min(block_start_pfn + pageblock_nr_pages, zone_end_pfn(zone)); low_pfn = pageblock_end_pfn(cc->migrate_pfn); @@ -1227,9 +1434,6 @@ static void isolate_freepages(struct compact_control *cc) } } - /* __isolate_free_page() does not map the pages */ - split_map_pages(freelist); - /* * Record where the free scanner will restart next time. Either we * broke from the loop and set isolate_start_pfn based on the last @@ -1237,6 +1441,10 @@ static void isolate_freepages(struct compact_control *cc) * and the loop terminated due to isolate_start_pfn < low_pfn */ cc->free_pfn = isolate_start_pfn; + +splitmap: + /* __isolate_free_page() does not map the pages */ + split_map_pages(freelist); } /* From patchwork Fri Jan 18 17:51:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033520 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAD13C07EBF for ; Fri, 18 Jan 2019 17:53:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9C3512087E for ; Fri, 18 Jan 2019 17:53:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728911AbfARRxw (ORCPT ); Fri, 18 Jan 2019 12:53:52 -0500 Received: from outbound-smtp12.blacknight.com ([46.22.139.17]:35359 "EHLO outbound-smtp12.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728613AbfARRxw (ORCPT ); Fri, 18 Jan 2019 12:53:52 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp12.blacknight.com (Postfix) with ESMTPS id 2A6011C35BC for ; Fri, 18 Jan 2019 17:53:49 +0000 (GMT) Received: (qmail 4431 invoked from network); 18 Jan 2019 17:53:49 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:53:49 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 12/22] mm, compaction: Avoid rescanning the same pageblock multiple times Date: Fri, 18 Jan 2019 17:51:26 +0000 Message-Id: <20190118175136.31341-13-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Pageblocks are marked for skip when no pages are isolated after a scan. However, it's possible to hit corner cases where the migration scanner gets stuck near the boundary between the source and target scanner. Due to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that are migrated can be reallocated before the pageblock is complete. The pageblock is not necessarily skipped so it can be rescanned multiple times. Similarly, a pageblock with some dirty/writeback pages may fail to migrate and be rescanned until writeback completes which is wasteful. This patch tracks if a pageblock is being rescanned. If so, then the entire pageblock will be migrated as one operation. This narrows the race window during which pages can be reallocated during migration. Secondly, if there are pages that cannot be isolated then the pageblock will still be fully scanned and marked for skipping. On the second rescan, the pageblock skip is set and the migration scanner makes progress. 5.0.0-rc1 5.0.0-rc1 findfree-v3r16 norescan-v3r16 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 3200.68 ( 0.00%) 3002.07 ( 6.21%) Amean fault-both-5 4847.75 ( 0.00%) 4684.47 ( 3.37%) Amean fault-both-7 6658.92 ( 0.00%) 6815.54 ( -2.35%) Amean fault-both-12 11077.62 ( 0.00%) 10864.02 ( 1.93%) Amean fault-both-18 12403.97 ( 0.00%) 12247.52 ( 1.26%) Amean fault-both-24 15607.10 ( 0.00%) 15683.99 ( -0.49%) Amean fault-both-30 18752.27 ( 0.00%) 18620.02 ( 0.71%) Amean fault-both-32 21207.54 ( 0.00%) 19250.28 * 9.23%* 5.0.0-rc1 5.0.0-rc1 findfree-v3r16 norescan-v3r16 Percentage huge-3 96.86 ( 0.00%) 95.00 ( -1.91%) Percentage huge-5 93.72 ( 0.00%) 94.22 ( 0.53%) Percentage huge-7 94.31 ( 0.00%) 92.35 ( -2.08%) Percentage huge-12 92.66 ( 0.00%) 91.90 ( -0.82%) Percentage huge-18 91.51 ( 0.00%) 89.58 ( -2.11%) Percentage huge-24 90.50 ( 0.00%) 90.03 ( -0.52%) Percentage huge-30 91.57 ( 0.00%) 89.14 ( -2.65%) Percentage huge-32 91.00 ( 0.00%) 90.58 ( -0.46%) Negligible difference but this was likely a case when the specific corner case was not hit. A previous run of the same patch based on an earlier iteration of the series showed large differences where migration rates could be halved when the corner case was hit. The specific corner case where migration scan rates go through the roof was due to a dirty/writeback pageblock located at the boundary of the migration/free scanner did not happen in this case. When it does happen, the scan rates multipled by massive margins. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 32 ++++++++++++++++++++++++++------ mm/internal.h | 1 + 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 19fea4a7b3f4..a31fea7b3f96 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -949,8 +949,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, cc->nr_migratepages++; nr_isolated++; - /* Avoid isolating too much */ - if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) { + /* + * Avoid isolating too much unless this block is being + * rescanned (e.g. dirty/writeback pages, parallel allocation). + */ + if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) { ++low_pfn; break; } @@ -997,11 +1000,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, spin_unlock_irqrestore(zone_lru_lock(zone), flags); /* - * Updated the cached scanner pfn if the pageblock was scanned - * without isolating a page. The pageblock may not be marked - * skipped already if there were no LRU pages in the block. + * Updated the cached scanner pfn once the pageblock has been scanned + * Pages will either be migrated in which case there is no point + * scanning in the near future or migration failed in which case the + * failure reason may persist. The block is marked for skipping if + * there were no pages isolated in the block or if the block is + * rescanned twice in a row. */ - if (low_pfn == end_pfn && !nr_isolated) { + if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) { if (valid_page && !skip_updated) set_pageblock_skip(valid_page); update_cached_migrate(cc, low_pfn); @@ -2033,6 +2039,20 @@ static enum compact_result compact_zone(struct compact_control *cc) int err; unsigned long start_pfn = cc->migrate_pfn; + /* + * Avoid multiple rescans which can happen if a page cannot be + * isolated (dirty/writeback in async mode) or if the migrated + * pages are being allocated before the pageblock is cleared. + * The first rescan will capture the entire pageblock for + * migration. If it fails, it'll be marked skip and scanning + * will proceed as normal. + */ + cc->rescan = false; + if (pageblock_start_pfn(last_migrated_pfn) == + pageblock_start_pfn(start_pfn)) { + cc->rescan = true; + } + switch (isolate_migratepages(cc->zone, cc)) { case ISOLATE_ABORT: ret = COMPACT_CONTENDED; diff --git a/mm/internal.h b/mm/internal.h index 983cb975545f..d5b999e5eb5f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -205,6 +205,7 @@ struct compact_control { bool direct_compaction; /* False from kcompactd or /proc/... */ bool whole_zone; /* Whole zone should/has been scanned */ bool contended; /* Signal lock or sched contention */ + bool rescan; /* Rescanning the same pageblock */ }; unsigned long From patchwork Fri Jan 18 17:51:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033521 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41CB6C5AC81 for ; Fri, 18 Jan 2019 17:54:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0F28420850 for ; Fri, 18 Jan 2019 17:54:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728945AbfARRyC (ORCPT ); Fri, 18 Jan 2019 12:54:02 -0500 Received: from outbound-smtp04.blacknight.com ([81.17.249.35]:35215 "EHLO outbound-smtp04.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728607AbfARRyB (ORCPT ); Fri, 18 Jan 2019 12:54:01 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp04.blacknight.com (Postfix) with ESMTPS id 4910998C35 for ; Fri, 18 Jan 2019 17:53:59 +0000 (UTC) Received: (qmail 4915 invoked from network); 18 Jan 2019 17:53:59 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:53:59 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 13/22] mm, compaction: Finish pageblock scanning on contention Date: Fri, 18 Jan 2019 17:51:27 +0000 Message-Id: <20190118175136.31341-14-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Async migration aborts on spinlock contention but contention can be high when there are multiple compaction attempts and kswapd is active. The consequence is that the migration scanners move forward uselessly while still contending on locks for longer while leaving suitable migration sources behind. This patch will acquire the lock but track when contention occurs. When it does, the current pageblock will finish as compaction may succeed for that block and then abort. This will have a variable impact on latency as in some cases useless scanning is avoided (reduces latency) but a lock will be contended (increase latency) or a single contended pageblock is scanned that would otherwise have been skipped (increase latency). 5.0.0-rc1 5.0.0-rc1 norescan-v3r16 finishcontend-v3r16 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 3002.07 ( 0.00%) 3153.17 ( -5.03%) Amean fault-both-5 4684.47 ( 0.00%) 4280.52 ( 8.62%) Amean fault-both-7 6815.54 ( 0.00%) 5811.50 * 14.73%* Amean fault-both-12 10864.02 ( 0.00%) 9276.85 ( 14.61%) Amean fault-both-18 12247.52 ( 0.00%) 11032.67 ( 9.92%) Amean fault-both-24 15683.99 ( 0.00%) 14285.70 ( 8.92%) Amean fault-both-30 18620.02 ( 0.00%) 16293.76 * 12.49%* Amean fault-both-32 19250.28 ( 0.00%) 16721.02 * 13.14%* 5.0.0-rc1 5.0.0-rc1 norescan-v3r16 finishcontend-v3r16 Percentage huge-1 0.00 ( 0.00%) 0.00 ( 0.00%) Percentage huge-3 95.00 ( 0.00%) 96.82 ( 1.92%) Percentage huge-5 94.22 ( 0.00%) 95.40 ( 1.26%) Percentage huge-7 92.35 ( 0.00%) 95.92 ( 3.86%) Percentage huge-12 91.90 ( 0.00%) 96.73 ( 5.25%) Percentage huge-18 89.58 ( 0.00%) 96.77 ( 8.03%) Percentage huge-24 90.03 ( 0.00%) 96.05 ( 6.69%) Percentage huge-30 89.14 ( 0.00%) 96.81 ( 8.60%) Percentage huge-32 90.58 ( 0.00%) 97.41 ( 7.54%) There is a variable impact that is mostly good on latency while allocation success rates are slightly higher. System CPU usage is reduced by about 10% but scan rate impact is mixed Compaction migrate scanned 27997659.00 20148867 Compaction free scanned 120782791.00 118324914 Migration scan rates are reduced 28% which is expected as a pageblock is used by the async scanner instead of skipped. The impact on the free scanner is known to be variable. Overall the primary justification for this patch is that completing scanning of a pageblock is very important for later patches. [yuehaibing@huawei.com: Fix unused variable warning] Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 90 ++++++++++++++++++++++----------------------------------- 1 file changed, 34 insertions(+), 56 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index a31fea7b3f96..b261c0bfac24 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -382,24 +382,25 @@ static bool test_and_set_skip(struct compact_control *cc, struct page *page, /* * Compaction requires the taking of some coarse locks that are potentially - * very heavily contended. For async compaction, back out if the lock cannot - * be taken immediately. For sync compaction, spin on the lock if needed. + * very heavily contended. For async compaction, trylock and record if the + * lock is contended. The lock will still be acquired but compaction will + * abort when the current block is finished regardless of success rate. + * Sync compaction acquires the lock. * - * Returns true if the lock is held - * Returns false if the lock is not held and compaction should abort + * Always returns true which makes it easier to track lock state in callers. */ -static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags, +static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags, struct compact_control *cc) { - if (cc->mode == MIGRATE_ASYNC) { - if (!spin_trylock_irqsave(lock, *flags)) { - cc->contended = true; - return false; - } - } else { - spin_lock_irqsave(lock, *flags); + /* Track if the lock is contended in async mode */ + if (cc->mode == MIGRATE_ASYNC && !cc->contended) { + if (spin_trylock_irqsave(lock, *flags)) + return true; + + cc->contended = true; } + spin_lock_irqsave(lock, *flags); return true; } @@ -432,10 +433,8 @@ static bool compact_unlock_should_abort(spinlock_t *lock, } if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) { + if (cc->mode == MIGRATE_ASYNC) cc->contended = true; - return true; - } cond_resched(); } @@ -455,10 +454,8 @@ static inline bool compact_should_abort(struct compact_control *cc) { /* async compaction aborts if contended */ if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) { + if (cc->mode == MIGRATE_ASYNC) cc->contended = true; - return true; - } cond_resched(); } @@ -535,18 +532,8 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, * recheck as well. */ if (!locked) { - /* - * The zone lock must be held to isolate freepages. - * Unfortunately this is a very coarse lock and can be - * heavily contended if there are parallel allocations - * or parallel compactions. For async compaction do not - * spin on the lock and we acquire the lock as late as - * possible. - */ - locked = compact_trylock_irqsave(&cc->zone->lock, + locked = compact_lock_irqsave(&cc->zone->lock, &flags, cc); - if (!locked) - break; /* Recheck this is a buddy page under lock */ if (!PageBuddy(page)) @@ -900,15 +887,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, /* If we already hold the lock, we can skip some rechecking */ if (!locked) { - locked = compact_trylock_irqsave(zone_lru_lock(zone), + locked = compact_lock_irqsave(zone_lru_lock(zone), &flags, cc); - /* Allow future scanning if the lock is contended */ - if (!locked) { - clear_pageblock_skip(page); - break; - } - /* Try get exclusive access under lock */ if (!skip_updated) { skip_updated = true; @@ -951,9 +932,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, /* * Avoid isolating too much unless this block is being - * rescanned (e.g. dirty/writeback pages, parallel allocation). + * rescanned (e.g. dirty/writeback pages, parallel allocation) + * or a lock is contended. For contention, isolate quickly to + * potentially remove one source of contention. */ - if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) { + if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && + !cc->rescan && !cc->contended) { ++low_pfn; break; } @@ -1416,12 +1400,8 @@ static void isolate_freepages(struct compact_control *cc) isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn, freelist, false); - /* - * If we isolated enough freepages, or aborted due to lock - * contention, terminate. - */ - if ((cc->nr_freepages >= cc->nr_migratepages) - || cc->contended) { + /* Are enough freepages isolated? */ + if (cc->nr_freepages >= cc->nr_migratepages) { if (isolate_start_pfn >= block_end_pfn) { /* * Restart at previous pageblock if more @@ -1463,13 +1443,8 @@ static struct page *compaction_alloc(struct page *migratepage, struct compact_control *cc = (struct compact_control *)data; struct page *freepage; - /* - * Isolate free pages if necessary, and if we are not aborting due to - * contention. - */ if (list_empty(&cc->freepages)) { - if (!cc->contended) - isolate_freepages(cc); + isolate_freepages(cc); if (list_empty(&cc->freepages)) return NULL; @@ -1731,7 +1706,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, low_pfn = isolate_migratepages_block(cc, low_pfn, block_end_pfn, isolate_mode); - if (!low_pfn || cc->contended) + if (!low_pfn) return ISOLATE_ABORT; /* @@ -1761,9 +1736,7 @@ static enum compact_result __compact_finished(struct compact_control *cc) { unsigned int order; const int migratetype = cc->migratetype; - - if (cc->contended || fatal_signal_pending(current)) - return COMPACT_CONTENDED; + int ret; /* Compaction run completes if the migrate and free scanner meet */ if (compact_scanners_met(cc)) { @@ -1798,6 +1771,7 @@ static enum compact_result __compact_finished(struct compact_control *cc) return COMPACT_CONTINUE; /* Direct compactor: Is a suitable page free? */ + ret = COMPACT_NO_SUITABLE_PAGE; for (order = cc->order; order < MAX_ORDER; order++) { struct free_area *area = &cc->zone->free_area[order]; bool can_steal; @@ -1837,11 +1811,15 @@ static enum compact_result __compact_finished(struct compact_control *cc) return COMPACT_SUCCESS; } - return COMPACT_CONTINUE; + ret = COMPACT_CONTINUE; + break; } } - return COMPACT_NO_SUITABLE_PAGE; + if (cc->contended || fatal_signal_pending(current)) + ret = COMPACT_CONTENDED; + + return ret; } static enum compact_result compact_finished(struct compact_control *cc) From patchwork Fri Jan 18 17:51:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033522 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5453C07EBF for ; Fri, 18 Jan 2019 17:54:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C780A20850 for ; Fri, 18 Jan 2019 17:54:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728947AbfARRyM (ORCPT ); Fri, 18 Jan 2019 12:54:12 -0500 Received: from outbound-smtp26.blacknight.com ([81.17.249.194]:56042 "EHLO outbound-smtp26.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728576AbfARRyL (ORCPT ); Fri, 18 Jan 2019 12:54:11 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp26.blacknight.com (Postfix) with ESMTPS id 75487B879A for ; Fri, 18 Jan 2019 17:54:09 +0000 (GMT) Received: (qmail 5490 invoked from network); 18 Jan 2019 17:54:09 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:54:09 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 14/22] mm, compaction: Check early for huge pages encountered by the migration scanner Date: Fri, 18 Jan 2019 17:51:28 +0000 Message-Id: <20190118175136.31341-15-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When scanning for sources or targets, PageCompound is checked for huge pages as they can be skipped quickly but it happens relatively late after a lot of setup and checking. This patch short-cuts the check to make it earlier. It might still change when the lock is acquired but this has less overhead overall. The free scanner advances but the migration scanner does not. Typically the free scanner encounters more movable blocks that change state over the lifetime of the system and also tends to scan more aggressively as it's actively filling its portion of the physical address space with data. This could change in the future but for the moment, this worked better in practice and incurred fewer scan restarts. The impact on latency and allocation success rates is marginal but the free scan rates are reduced by 15% and system CPU usage is reduced by 3.3%. The 2-socket results are not materially different. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index b261c0bfac24..14bb66d48392 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1061,6 +1061,9 @@ static bool suitable_migration_source(struct compact_control *cc, { int block_mt; + if (pageblock_skip_persistent(page)) + return false; + if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction) return true; @@ -1695,12 +1698,17 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, continue; /* - * For async compaction, also only scan in MOVABLE blocks. - * Async compaction is optimistic to see if the minimum amount - * of work satisfies the allocation. + * For async compaction, also only scan in MOVABLE blocks + * without huge pages. Async compaction is optimistic to see + * if the minimum amount of work satisfies the allocation. + * The cached PFN is updated as it's possible that all + * remaining blocks between source and target are unsuitable + * and the compaction scanners fail to meet. */ - if (!suitable_migration_source(cc, page)) + if (!suitable_migration_source(cc, page)) { + update_cached_migrate(cc, block_end_pfn); continue; + } /* Perform the isolation */ low_pfn = isolate_migratepages_block(cc, low_pfn, From patchwork Fri Jan 18 17:51:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033523 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41DDBC07EBF for ; Fri, 18 Jan 2019 17:54:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 21AC620850 for ; Fri, 18 Jan 2019 17:54:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728988AbfARRyW (ORCPT ); Fri, 18 Jan 2019 12:54:22 -0500 Received: from outbound-smtp25.blacknight.com ([81.17.249.193]:44223 "EHLO outbound-smtp25.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728949AbfARRyV (ORCPT ); Fri, 18 Jan 2019 12:54:21 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp25.blacknight.com (Postfix) with ESMTPS id 9BA53B879A for ; Fri, 18 Jan 2019 17:54:19 +0000 (GMT) Received: (qmail 5918 invoked from network); 18 Jan 2019 17:54:19 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:54:19 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 15/22] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks Date: Fri, 18 Jan 2019 17:51:29 +0000 Message-Id: <20190118175136.31341-16-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Migrate has separate cached PFNs for ASYNC and SYNC* migration on the basis that some migrations will fail in ASYNC mode. However, if the cached PFNs match at the start of scanning and pageblocks are skipped due to having no isolation candidates, then the sync state does not matter. This patch keeps matching cached PFNs in sync until a pageblock with isolation candidates is found. The actual benefit is marginal given that the sync scanner following the async scanner will often skip a number of pageblocks but it's useless work. Any benefit depends heavily on whether the scanners restarted recently. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c index 14bb66d48392..829540f6f3da 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1969,6 +1969,7 @@ static enum compact_result compact_zone(struct compact_control *cc) unsigned long end_pfn = zone_end_pfn(cc->zone); unsigned long last_migrated_pfn; const bool sync = cc->mode != MIGRATE_ASYNC; + bool update_cached; cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask); ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags, @@ -2016,6 +2017,17 @@ static enum compact_result compact_zone(struct compact_control *cc) last_migrated_pfn = 0; + /* + * Migrate has separate cached PFNs for ASYNC and SYNC* migration on + * the basis that some migrations will fail in ASYNC mode. However, + * if the cached PFNs match and pageblocks are skipped due to having + * no isolation candidates, then the sync state does not matter. + * Until a pageblock with isolation candidates is found, keep the + * cached PFNs in sync to avoid revisiting the same blocks. + */ + update_cached = !sync && + cc->zone->compact_cached_migrate_pfn[0] == cc->zone->compact_cached_migrate_pfn[1]; + trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn, sync); @@ -2047,6 +2059,11 @@ static enum compact_result compact_zone(struct compact_control *cc) last_migrated_pfn = 0; goto out; case ISOLATE_NONE: + if (update_cached) { + cc->zone->compact_cached_migrate_pfn[1] = + cc->zone->compact_cached_migrate_pfn[0]; + } + /* * We haven't isolated and migrated anything, but * there might still be unflushed migrations from @@ -2054,6 +2071,7 @@ static enum compact_result compact_zone(struct compact_control *cc) */ goto check_drain; case ISOLATE_SUCCESS: + update_cached = false; last_migrated_pfn = start_pfn; ; } From patchwork Fri Jan 18 17:51:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033524 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A91FDC5AC81 for ; Fri, 18 Jan 2019 17:54:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 884EB20850 for ; Fri, 18 Jan 2019 17:54:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728996AbfARRyd (ORCPT ); Fri, 18 Jan 2019 12:54:33 -0500 Received: from outbound-smtp02.blacknight.com ([81.17.249.8]:52809 "EHLO outbound-smtp02.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728574AbfARRyc (ORCPT ); Fri, 18 Jan 2019 12:54:32 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp02.blacknight.com (Postfix) with ESMTPS id BFCDD98C6A for ; Fri, 18 Jan 2019 17:54:29 +0000 (UTC) Received: (qmail 6551 invoked from network); 18 Jan 2019 17:54:29 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:54:29 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 16/22] mm, compaction: Rework compact_should_abort as compact_check_resched Date: Fri, 18 Jan 2019 17:51:30 +0000 Message-Id: <20190118175136.31341-17-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org With incremental changes, compact_should_abort no longer makes any documented sense. Rename to compact_check_resched and update the associated comments. There is no benefit other than reducing redundant code and making the intent slightly clearer. It could potentially be merged with earlier patches but it just makes the review slightly harder. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 61 ++++++++++++++++++++++----------------------------------- 1 file changed, 23 insertions(+), 38 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 829540f6f3da..9aa71945255d 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -404,6 +404,21 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags, return true; } +/* + * Aside from avoiding lock contention, compaction also periodically checks + * need_resched() and records async compaction as contended if necessary. + */ +static inline void compact_check_resched(struct compact_control *cc) +{ + /* async compaction aborts if contended */ + if (need_resched()) { + if (cc->mode == MIGRATE_ASYNC) + cc->contended = true; + + cond_resched(); + } +} + /* * Compaction requires the taking of some coarse locks that are potentially * very heavily contended. The lock should be periodically unlocked to avoid @@ -432,33 +447,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock, return true; } - if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) - cc->contended = true; - cond_resched(); - } - - return false; -} - -/* - * Aside from avoiding lock contention, compaction also periodically checks - * need_resched() and either schedules in sync compaction or aborts async - * compaction. This is similar to what compact_unlock_should_abort() does, but - * is used where no lock is concerned. - * - * Returns false when no scheduling was needed, or sync compaction scheduled. - * Returns true when async compaction should abort. - */ -static inline bool compact_should_abort(struct compact_control *cc) -{ - /* async compaction aborts if contended */ - if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) - cc->contended = true; - - cond_resched(); - } + compact_check_resched(cc); return false; } @@ -747,8 +736,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, return 0; } - if (compact_should_abort(cc)) - return 0; + compact_check_resched(cc); if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) { skip_on_failure = true; @@ -1379,12 +1367,10 @@ static void isolate_freepages(struct compact_control *cc) isolate_start_pfn = block_start_pfn) { /* * This can iterate a massively long zone without finding any - * suitable migration targets, so periodically check if we need - * to schedule, or even abort async compaction. + * suitable migration targets, so periodically check resched. */ - if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)) - && compact_should_abort(cc)) - break; + if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))) + compact_check_resched(cc); page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone); @@ -1675,11 +1661,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, /* * This can potentially iterate a massively long zone with * many pageblocks unsuitable, so periodically check if we - * need to schedule, or even abort async compaction. + * need to schedule. */ - if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)) - && compact_should_abort(cc)) - break; + if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))) + compact_check_resched(cc); page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone); From patchwork Fri Jan 18 17:51:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033525 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14201C07EBF for ; Fri, 18 Jan 2019 17:54:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D7FD420850 for ; Fri, 18 Jan 2019 17:54:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729012AbfARRym (ORCPT ); Fri, 18 Jan 2019 12:54:42 -0500 Received: from outbound-smtp08.blacknight.com ([46.22.139.13]:41757 "EHLO outbound-smtp08.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728574AbfARRym (ORCPT ); Fri, 18 Jan 2019 12:54:42 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp08.blacknight.com (Postfix) with ESMTPS id EF0A11C3080 for ; Fri, 18 Jan 2019 17:54:39 +0000 (GMT) Received: (qmail 6988 invoked from network); 18 Jan 2019 17:54:39 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:54:39 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 17/22] mm, compaction: Do not consider a need to reschedule as contention Date: Fri, 18 Jan 2019 17:51:31 +0000 Message-Id: <20190118175136.31341-18-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Scanning on large machines can take a considerable length of time and eventually need to be rescheduled. This is treated as an abort event but that's not appropriate as the attempt is likely to be retried after making numerous checks and taking another cycle through the page allocator. This patch will check the need to reschedule if necessary but continue the scanning. The main benefit is reduced scanning when compaction is taking a long time or the machine is over-saturated. It also avoids an unnecessary exit of compaction that ends up being retried by the page allocator in the outer loop. 5.0.0-rc1 5.0.0-rc1 synccached-v3r16 noresched-v3r17 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 2958.27 ( 0.00%) 2965.68 ( -0.25%) Amean fault-both-5 4091.90 ( 0.00%) 3995.90 ( 2.35%) Amean fault-both-7 5803.05 ( 0.00%) 5842.12 ( -0.67%) Amean fault-both-12 9481.06 ( 0.00%) 9550.87 ( -0.74%) Amean fault-both-18 14141.51 ( 0.00%) 13304.72 ( 5.92%) Amean fault-both-24 16438.00 ( 0.00%) 14618.59 ( 11.07%) Amean fault-both-30 17531.72 ( 0.00%) 16650.96 ( 5.02%) Amean fault-both-32 17101.96 ( 0.00%) 17145.15 ( -0.25%) Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 23 ++++------------------- 1 file changed, 4 insertions(+), 19 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9aa71945255d..293d9a9e6f00 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -404,21 +404,6 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags, return true; } -/* - * Aside from avoiding lock contention, compaction also periodically checks - * need_resched() and records async compaction as contended if necessary. - */ -static inline void compact_check_resched(struct compact_control *cc) -{ - /* async compaction aborts if contended */ - if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) - cc->contended = true; - - cond_resched(); - } -} - /* * Compaction requires the taking of some coarse locks that are potentially * very heavily contended. The lock should be periodically unlocked to avoid @@ -447,7 +432,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock, return true; } - compact_check_resched(cc); + cond_resched(); return false; } @@ -736,7 +721,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, return 0; } - compact_check_resched(cc); + cond_resched(); if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) { skip_on_failure = true; @@ -1370,7 +1355,7 @@ static void isolate_freepages(struct compact_control *cc) * suitable migration targets, so periodically check resched. */ if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))) - compact_check_resched(cc); + cond_resched(); page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone); @@ -1664,7 +1649,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, * need to schedule. */ if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))) - compact_check_resched(cc); + cond_resched(); page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone); From patchwork Fri Jan 18 17:51:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033526 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 593B2C5AC81 for ; Fri, 18 Jan 2019 17:54:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 39C0020850 for ; Fri, 18 Jan 2019 17:54:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729044AbfARRyx (ORCPT ); Fri, 18 Jan 2019 12:54:53 -0500 Received: from outbound-smtp11.blacknight.com ([46.22.139.106]:56603 "EHLO outbound-smtp11.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728574AbfARRyx (ORCPT ); Fri, 18 Jan 2019 12:54:53 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp11.blacknight.com (Postfix) with ESMTPS id 20DA81C2CD2 for ; Fri, 18 Jan 2019 17:54:50 +0000 (GMT) Received: (qmail 7288 invoked from network); 18 Jan 2019 17:54:50 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:54:50 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 18/22] mm, compaction: Reduce premature advancement of the migration target scanner Date: Fri, 18 Jan 2019 17:51:32 +0000 Message-Id: <20190118175136.31341-19-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The fast isolation of free pages allows the cached PFN of the free scanner to advance faster than necessary depending on the contents of the free list. The key is that fast_isolate_freepages() can update zone->compact_cached_free_pfn via isolate_freepages_block(). When the fast search fails, the linear scan can start from a point that has skipped valid migration targets, particularly pageblocks with just low-order free pages. This can cause the migration source/target scanners to meet prematurely causing a reset. This patch starts by avoiding an update of the pageblock skip information and cached PFN from isolate_freepages_block() and puts the responsibility of updating that information in the callers. The fast scanner will update the cached PFN if and only if it finds a block that is higher than the existing cached PFN and sets the skip if the pageblock is full or nearly full. The linear scanner will update skipped information and the cached PFN only when a block is completely scanned. The total impact is that the free scanner advances more slowly as it is primarily driven by the linear scanner instead of the fast search. 5.0.0-rc1 5.0.0-rc1 noresched-v3r17 slowfree-v3r17 Amean fault-both-3 2965.68 ( 0.00%) 3036.75 ( -2.40%) Amean fault-both-5 3995.90 ( 0.00%) 4522.24 * -13.17%* Amean fault-both-7 5842.12 ( 0.00%) 6365.35 ( -8.96%) Amean fault-both-12 9550.87 ( 0.00%) 10340.93 ( -8.27%) Amean fault-both-18 13304.72 ( 0.00%) 14732.46 ( -10.73%) Amean fault-both-24 14618.59 ( 0.00%) 16288.96 ( -11.43%) Amean fault-both-30 16650.96 ( 0.00%) 16346.21 ( 1.83%) Amean fault-both-32 17145.15 ( 0.00%) 19317.49 ( -12.67%) The impact to latency is higher than the last version but it appears to be due to a slight increase in the free scan rates which is a potential side-effect of the patch. However, this is necessary for later patches that are more careful about how pageblocks are treated as earlier iterations of those patches hit corner cases where the restarts were punishing and very visible. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 27 ++++++++++----------------- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 293d9a9e6f00..04ec7d4da719 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -330,10 +330,9 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn) * future. The information is later cleared by __reset_isolation_suitable(). */ static void update_pageblock_skip(struct compact_control *cc, - struct page *page, unsigned long nr_isolated) + struct page *page, unsigned long pfn) { struct zone *zone = cc->zone; - unsigned long pfn; if (cc->no_set_skip_hint) return; @@ -341,13 +340,8 @@ static void update_pageblock_skip(struct compact_control *cc, if (!page) return; - if (nr_isolated) - return; - set_pageblock_skip(page); - pfn = page_to_pfn(page); - /* Update where async and sync compaction should restart */ if (pfn < zone->compact_cached_free_pfn) zone->compact_cached_free_pfn = pfn; @@ -365,7 +359,7 @@ static inline bool pageblock_skip_persistent(struct page *page) } static inline void update_pageblock_skip(struct compact_control *cc, - struct page *page, unsigned long nr_isolated) + struct page *page, unsigned long pfn) { } @@ -449,7 +443,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, bool strict) { int nr_scanned = 0, total_isolated = 0; - struct page *cursor, *valid_page = NULL; + struct page *cursor; unsigned long flags = 0; bool locked = false; unsigned long blockpfn = *start_pfn; @@ -476,9 +470,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (!pfn_valid_within(blockpfn)) goto isolate_fail; - if (!valid_page) - valid_page = page; - /* * For compound pages such as THP and hugetlbfs, we can save * potentially a lot of iterations if we skip them at once. @@ -566,10 +557,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (strict && blockpfn < end_pfn) total_isolated = 0; - /* Update the pageblock-skip if the whole pageblock was scanned */ - if (blockpfn == end_pfn) - update_pageblock_skip(cc, valid_page, total_isolated); - cc->total_free_scanned += nr_scanned; if (total_isolated) count_compact_events(COMPACTISOLATED, total_isolated); @@ -1293,8 +1280,10 @@ fast_isolate_freepages(struct compact_control *cc) } } - if (highest && highest > cc->zone->compact_cached_free_pfn) + if (highest && highest >= cc->zone->compact_cached_free_pfn) { + highest -= pageblock_nr_pages; cc->zone->compact_cached_free_pfn = highest; + } cc->total_free_scanned += nr_scanned; if (!page) @@ -1374,6 +1363,10 @@ static void isolate_freepages(struct compact_control *cc) isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn, freelist, false); + /* Update the skip hint if the full pageblock was scanned */ + if (isolate_start_pfn == block_end_pfn) + update_pageblock_skip(cc, page, block_start_pfn); + /* Are enough freepages isolated? */ if (cc->nr_freepages >= cc->nr_migratepages) { if (isolate_start_pfn >= block_end_pfn) { From patchwork Fri Jan 18 17:51:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033527 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 533D0C07EBF for ; Fri, 18 Jan 2019 17:55:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 355E320850 for ; Fri, 18 Jan 2019 17:55:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729053AbfARRzD (ORCPT ); Fri, 18 Jan 2019 12:55:03 -0500 Received: from outbound-smtp02.blacknight.com ([81.17.249.8]:46636 "EHLO outbound-smtp02.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727673AbfARRzD (ORCPT ); Fri, 18 Jan 2019 12:55:03 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp02.blacknight.com (Postfix) with ESMTPS id 44C7998BD2 for ; Fri, 18 Jan 2019 17:55:00 +0000 (UTC) Received: (qmail 7671 invoked from network); 18 Jan 2019 17:55:00 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:55:00 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 19/22] mm, compaction: Round-robin the order while searching the free lists for a target Date: Fri, 18 Jan 2019 17:51:33 +0000 Message-Id: <20190118175136.31341-20-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org As compaction proceeds and creates high-order blocks, the free list search gets less efficient as the larger blocks are used as compaction targets. Eventually, the larger blocks will be behind the migration scanner for partially migrated pageblocks and the search fails. This patch round-robins what orders are searched so that larger blocks can be ignored and find smaller blocks that can be used as migration targets. The overall impact was small on 1-socket but it avoids corner cases where the migration/free scanners meet prematurely or situations where many of the pageblocks encountered by the free scanner are almost full instead of being properly packed. Previous testing had indicated that without this patch there were occasional large spikes in the free scanner without this patch. [dan.carpenter@oracle.com: Fix static checker warning] Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 33 ++++++++++++++++++++++++++++++--- mm/internal.h | 3 ++- 2 files changed, 32 insertions(+), 4 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 04ec7d4da719..7d39ff08269a 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1147,6 +1147,24 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long set_pageblock_skip(page); } +/* Search orders in round-robin fashion */ +static int next_search_order(struct compact_control *cc, int order) +{ + order--; + if (order < 0) + order = cc->order - 1; + + /* Search wrapped around? */ + if (order == cc->search_order) { + cc->search_order--; + if (cc->search_order < 0) + cc->search_order = cc->order - 1; + return -1; + } + + return order; +} + static unsigned long fast_isolate_freepages(struct compact_control *cc) { @@ -1183,9 +1201,15 @@ fast_isolate_freepages(struct compact_control *cc) if (WARN_ON_ONCE(min_pfn > low_pfn)) low_pfn = min_pfn; - for (order = cc->order - 1; - order >= 0 && !page; - order--) { + /* + * Search starts from the last successful isolation order or the next + * order to search after a previous failure + */ + cc->search_order = min_t(unsigned int, cc->order - 1, cc->search_order); + + for (order = cc->search_order; + !page && order >= 0; + order = next_search_order(cc, order)) { struct free_area *area = &cc->zone->free_area[order]; struct list_head *freelist; struct page *freepage; @@ -1209,6 +1233,7 @@ fast_isolate_freepages(struct compact_control *cc) if (pfn >= low_pfn) { cc->fast_search_fail = 0; + cc->search_order = order; page = freepage; break; } @@ -2136,6 +2161,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, .total_migrate_scanned = 0, .total_free_scanned = 0, .order = order, + .search_order = order, .gfp_mask = gfp_mask, .zone = zone, .mode = (prio == COMPACT_PRIO_ASYNC) ? @@ -2367,6 +2393,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) struct zone *zone; struct compact_control cc = { .order = pgdat->kcompactd_max_order, + .search_order = pgdat->kcompactd_max_order, .total_migrate_scanned = 0, .total_free_scanned = 0, .classzone_idx = pgdat->kcompactd_classzone_idx, diff --git a/mm/internal.h b/mm/internal.h index d5b999e5eb5f..31bb0be6fd52 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -192,7 +192,8 @@ struct compact_control { struct zone *zone; unsigned long total_migrate_scanned; unsigned long total_free_scanned; - unsigned int fast_search_fail; /* failures to use free list searches */ + unsigned short fast_search_fail;/* failures to use free list searches */ + short search_order; /* order to start a fast search at */ const gfp_t gfp_mask; /* gfp mask of a direct compactor */ int order; /* order a direct compactor needs */ int migratetype; /* migratetype of direct compactor */ From patchwork Fri Jan 18 17:51:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033528 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B97CC5AC81 for ; Fri, 18 Jan 2019 17:55:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2C81920850 for ; Fri, 18 Jan 2019 17:55:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729066AbfARRzM (ORCPT ); Fri, 18 Jan 2019 12:55:12 -0500 Received: from outbound-smtp04.blacknight.com ([81.17.249.35]:35915 "EHLO outbound-smtp04.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728514AbfARRzM (ORCPT ); Fri, 18 Jan 2019 12:55:12 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp04.blacknight.com (Postfix) with ESMTPS id 6DD3898C47 for ; Fri, 18 Jan 2019 17:55:10 +0000 (UTC) Received: (qmail 8177 invoked from network); 18 Jan 2019 17:55:10 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:55:10 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 20/22] mm, compaction: Sample pageblocks for free pages Date: Fri, 18 Jan 2019 17:51:34 +0000 Message-Id: <20190118175136.31341-21-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Once fast searching finishes, there is a possibility that the linear scanner is scanning full blocks found by the fast scanner earlier. This patch uses an adaptive stride to sample pageblocks for free pages. The more consecutive full pageblocks encountered, the larger the stride until a pageblock with free pages is found. The scanners might meet slightly sooner but it is an acceptable risk given that the search of the free lists may still encounter the pages and adjust the cached PFN of the free scanner accordingly. 5.0.0-rc1 5.0.0-rc1 roundrobin-v3r17 samplefree-v3r17 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 2752.37 ( 0.00%) 2729.95 ( 0.81%) Amean fault-both-5 4341.69 ( 0.00%) 4397.80 ( -1.29%) Amean fault-both-7 6308.75 ( 0.00%) 6097.61 ( 3.35%) Amean fault-both-12 10241.81 ( 0.00%) 9407.15 ( 8.15%) Amean fault-both-18 13736.09 ( 0.00%) 10857.63 * 20.96%* Amean fault-both-24 16853.95 ( 0.00%) 13323.24 * 20.95%* Amean fault-both-30 15862.61 ( 0.00%) 17345.44 ( -9.35%) Amean fault-both-32 18450.85 ( 0.00%) 16892.00 ( 8.45%) The latency is mildly improved offseting some overhead from earlier patches that are prerequisites for the rest of the series. However, a major impact is on the free scan rate with an 82% reduction. 5.0.0-rc1 5.0.0-rc1 roundrobin-v3r17 samplefree-v3r17 Compaction migrate scanned 21607271 20116887 Compaction free scanned 95336406 16668703 It's also the first time in the series where the number of pages scanned by the migration scanner is greater than the free scanner due to the increased search efficiency. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 7d39ff08269a..74bf620e3dcd 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -440,6 +440,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, unsigned long *start_pfn, unsigned long end_pfn, struct list_head *freelist, + unsigned int stride, bool strict) { int nr_scanned = 0, total_isolated = 0; @@ -449,10 +450,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, unsigned long blockpfn = *start_pfn; unsigned int order; + /* Strict mode is for isolation, speed is secondary */ + if (strict) + stride = 1; + cursor = pfn_to_page(blockpfn); /* Isolate free pages. */ - for (; blockpfn < end_pfn; blockpfn++, cursor++) { + for (; blockpfn < end_pfn; blockpfn += stride, cursor += stride) { int isolated; struct page *page = cursor; @@ -614,7 +619,7 @@ isolate_freepages_range(struct compact_control *cc, break; isolated = isolate_freepages_block(cc, &isolate_start_pfn, - block_end_pfn, &freelist, true); + block_end_pfn, &freelist, 0, true); /* * In strict mode, isolate_freepages_block() returns 0 if @@ -1132,7 +1137,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long /* Scan before */ if (start_pfn != pfn) { - isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false); + isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false); if (cc->nr_freepages >= cc->nr_migratepages) return; } @@ -1140,7 +1145,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long /* Scan after */ start_pfn = pfn + nr_isolated; if (start_pfn != end_pfn) - isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false); + isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false); /* Skip this pageblock in the future as it's full or nearly full */ if (cc->nr_freepages < cc->nr_migratepages) @@ -1332,6 +1337,7 @@ static void isolate_freepages(struct compact_control *cc) unsigned long block_end_pfn; /* end of current pageblock */ unsigned long low_pfn; /* lowest pfn scanner is able to scan */ struct list_head *freelist = &cc->freepages; + unsigned int stride; /* Try a small search of the free lists for a candidate */ isolate_start_pfn = fast_isolate_freepages(cc); @@ -1354,6 +1360,7 @@ static void isolate_freepages(struct compact_control *cc) block_end_pfn = min(block_start_pfn + pageblock_nr_pages, zone_end_pfn(zone)); low_pfn = pageblock_end_pfn(cc->migrate_pfn); + stride = cc->mode == MIGRATE_ASYNC ? COMPACT_CLUSTER_MAX : 1; /* * Isolate free pages until enough are available to migrate the @@ -1364,6 +1371,8 @@ static void isolate_freepages(struct compact_control *cc) block_end_pfn = block_start_pfn, block_start_pfn -= pageblock_nr_pages, isolate_start_pfn = block_start_pfn) { + unsigned long nr_isolated; + /* * This can iterate a massively long zone without finding any * suitable migration targets, so periodically check resched. @@ -1385,8 +1394,8 @@ static void isolate_freepages(struct compact_control *cc) continue; /* Found a block suitable for isolating free pages from. */ - isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn, - freelist, false); + nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn, + block_end_pfn, freelist, stride, false); /* Update the skip hint if the full pageblock was scanned */ if (isolate_start_pfn == block_end_pfn) @@ -1410,6 +1419,13 @@ static void isolate_freepages(struct compact_control *cc) */ break; } + + /* Adjust stride depending on isolation */ + if (nr_isolated) { + stride = 1; + continue; + } + stride = min_t(unsigned int, COMPACT_CLUSTER_MAX, stride << 1); } /* From patchwork Fri Jan 18 17:51:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033529 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E83EEC07EBF for ; Fri, 18 Jan 2019 17:55:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C95FB20850 for ; Fri, 18 Jan 2019 17:55:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728846AbfARRzY (ORCPT ); Fri, 18 Jan 2019 12:55:24 -0500 Received: from outbound-smtp13.blacknight.com ([46.22.139.230]:37265 "EHLO outbound-smtp13.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728514AbfARRzX (ORCPT ); Fri, 18 Jan 2019 12:55:23 -0500 Received: from mail.blacknight.com (unknown [81.17.254.16]) by outbound-smtp13.blacknight.com (Postfix) with ESMTPS id E977E1C3185 for ; Fri, 18 Jan 2019 17:55:20 +0000 (GMT) Received: (qmail 8547 invoked from network); 18 Jan 2019 17:55:20 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:55:20 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 21/22] mm, compaction: Be selective about what pageblocks to clear skip hints Date: Fri, 18 Jan 2019 17:51:35 +0000 Message-Id: <20190118175136.31341-22-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Pageblock hints are cleared when compaction restarts or kswapd makes enough progress that it can sleep but it's over-eager in that the bit is cleared for migration sources with no LRU pages and migration targets with no free pages. As pageblock skip hint flushes are relatively rare and out-of-band with respect to kswapd, this patch makes a few more expensive checks to see if it's appropriate to even clear the bit. Every pageblock that is not cleared will avoid 512 pages being scanned unnecessarily on x86-64. The impact is variable with different workloads showing small differences in latency, success rates and scan rates. This is expected as clearing the hints is not that common but doing a small amount of work out-of-band to avoid a large amount of work in-band later is generally a good thing. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- include/linux/mmzone.h | 2 + mm/compaction.c | 125 ++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 109 insertions(+), 18 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 842f9189537b..90c13cdeefb5 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -480,6 +480,8 @@ struct zone { unsigned long compact_cached_free_pfn; /* pfn where async and sync compaction migration scanner should start */ unsigned long compact_cached_migrate_pfn[2]; + unsigned long compact_init_migrate_pfn; + unsigned long compact_init_free_pfn; #endif #ifdef CONFIG_COMPACTION diff --git a/mm/compaction.c b/mm/compaction.c index 74bf620e3dcd..de558f110319 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -237,6 +237,71 @@ static bool pageblock_skip_persistent(struct page *page) return false; } +static bool +__reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source, + bool check_target) +{ + struct page *page = pfn_to_online_page(pfn); + struct page *end_page; + unsigned long block_pfn; + + if (!page) + return false; + if (zone != page_zone(page)) + return false; + if (pageblock_skip_persistent(page)) + return false; + + /* + * If skip is already cleared do no further checking once the + * restart points have been set. + */ + if (check_source && check_target && !get_pageblock_skip(page)) + return true; + + /* + * If clearing skip for the target scanner, do not select a + * non-movable pageblock as the starting point. + */ + if (!check_source && check_target && + get_pageblock_migratetype(page) != MIGRATE_MOVABLE) + return false; + + /* + * Only clear the hint if a sample indicates there is either a + * free page or an LRU page in the block. One or other condition + * is necessary for the block to be a migration source/target. + */ + block_pfn = pageblock_start_pfn(pfn); + pfn = max(block_pfn, zone->zone_start_pfn); + page = pfn_to_page(pfn); + if (zone != page_zone(page)) + return false; + pfn = block_pfn + pageblock_nr_pages; + pfn = min(pfn, zone_end_pfn(zone)); + end_page = pfn_to_page(pfn); + + do { + if (!pfn_valid_within(pfn)) + continue; + + if (check_source && PageLRU(page)) { + clear_pageblock_skip(page); + return true; + } + + if (check_target && PageBuddy(page)) { + clear_pageblock_skip(page); + return true; + } + + page += (1 << PAGE_ALLOC_COSTLY_ORDER); + pfn += (1 << PAGE_ALLOC_COSTLY_ORDER); + } while (page < end_page); + + return false; +} + /* * This function is called to clear all cached information on pageblocks that * should be skipped for page isolation when the migrate and free page scanner @@ -244,30 +309,54 @@ static bool pageblock_skip_persistent(struct page *page) */ static void __reset_isolation_suitable(struct zone *zone) { - unsigned long start_pfn = zone->zone_start_pfn; - unsigned long end_pfn = zone_end_pfn(zone); - unsigned long pfn; + unsigned long migrate_pfn = zone->zone_start_pfn; + unsigned long free_pfn = zone_end_pfn(zone); + unsigned long reset_migrate = free_pfn; + unsigned long reset_free = migrate_pfn; + bool source_set = false; + bool free_set = false; - zone->compact_blockskip_flush = false; + if (!zone->compact_blockskip_flush) + return; - /* Walk the zone and mark every pageblock as suitable for isolation */ - for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { - struct page *page; + zone->compact_blockskip_flush = false; + /* + * Walk the zone and update pageblock skip information. Source looks + * for PageLRU while target looks for PageBuddy. When the scanner + * is found, both PageBuddy and PageLRU are checked as the pageblock + * is suitable as both source and target. + */ + for (; migrate_pfn < free_pfn; migrate_pfn += pageblock_nr_pages, + free_pfn -= pageblock_nr_pages) { cond_resched(); - page = pfn_to_online_page(pfn); - if (!page) - continue; - if (zone != page_zone(page)) - continue; - if (pageblock_skip_persistent(page)) - continue; + /* Update the migrate PFN */ + if (__reset_isolation_pfn(zone, migrate_pfn, true, source_set) && + migrate_pfn < reset_migrate) { + source_set = true; + reset_migrate = migrate_pfn; + zone->compact_init_migrate_pfn = reset_migrate; + zone->compact_cached_migrate_pfn[0] = reset_migrate; + zone->compact_cached_migrate_pfn[1] = reset_migrate; + } - clear_pageblock_skip(page); + /* Update the free PFN */ + if (__reset_isolation_pfn(zone, free_pfn, free_set, true) && + free_pfn > reset_free) { + free_set = true; + reset_free = free_pfn; + zone->compact_init_free_pfn = reset_free; + zone->compact_cached_free_pfn = reset_free; + } } - reset_cached_positions(zone); + /* Leave no distance if no suitable block was reset */ + if (reset_migrate >= reset_free) { + zone->compact_cached_migrate_pfn[0] = migrate_pfn; + zone->compact_cached_migrate_pfn[1] = migrate_pfn; + zone->compact_cached_free_pfn = free_pfn; + } } void reset_isolation_suitable(pg_data_t *pgdat) @@ -1190,7 +1279,7 @@ fast_isolate_freepages(struct compact_control *cc) * If starting the scan, use a deeper search and use the highest * PFN found if a suitable one is not found. */ - if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) { + if (cc->free_pfn >= cc->zone->compact_init_free_pfn) { limit = pageblock_nr_pages >> 1; scan_start = true; } @@ -2015,7 +2104,7 @@ static enum compact_result compact_zone(struct compact_control *cc) cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn; } - if (cc->migrate_pfn == start_pfn) + if (cc->migrate_pfn <= cc->zone->compact_init_migrate_pfn) cc->whole_zone = true; } From patchwork Fri Jan 18 17:51:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 1033530 Return-Path: Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C63BC5AC81 for ; Fri, 18 Jan 2019 17:55:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D85B320850 for ; Fri, 18 Jan 2019 17:55:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729072AbfARRzg (ORCPT ); Fri, 18 Jan 2019 12:55:36 -0500 Received: from outbound-smtp08.blacknight.com ([46.22.139.13]:56164 "EHLO outbound-smtp08.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728635AbfARRzf (ORCPT ); Fri, 18 Jan 2019 12:55:35 -0500 Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16]) by outbound-smtp08.blacknight.com (Postfix) with ESMTPS id C18411C3110 for ; Fri, 18 Jan 2019 17:55:30 +0000 (GMT) Received: (qmail 9316 invoked from network); 18 Jan 2019 17:55:30 -0000 Received: from unknown (HELO stampy.163woodhaven.lan) (mgorman@techsingularity.net@[37.228.229.96]) by 81.17.254.9 with ESMTPA; 18 Jan 2019 17:55:30 -0000 From: Mel Gorman To: Andrew Morton Cc: David Rientjes , Andrea Arcangeli , Vlastimil Babka , Linux List Kernel Mailing , Linux-MM , Mel Gorman Subject: [PATCH 22/22] mm, compaction: Capture a page under direct compaction Date: Fri, 18 Jan 2019 17:51:36 +0000 Message-Id: <20190118175136.31341-23-mgorman@techsingularity.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190118175136.31341-1-mgorman@techsingularity.net> References: <20190118175136.31341-1-mgorman@techsingularity.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Compaction is inherently race-prone as a suitable page freed during compaction can be allocated by any parallel task. This patch uses a capture_control structure to isolate a page immediately when it is freed by a direct compactor in the slow path of the page allocator. The intent is to avoid redundant scanning. 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 2582.11 ( 0.00%) 2563.68 ( 0.71%) Amean fault-both-5 4500.26 ( 0.00%) 4233.52 ( 5.93%) Amean fault-both-7 5819.53 ( 0.00%) 6333.65 ( -8.83%) Amean fault-both-12 9321.18 ( 0.00%) 9759.38 ( -4.70%) Amean fault-both-18 9782.76 ( 0.00%) 10338.76 ( -5.68%) Amean fault-both-24 15272.81 ( 0.00%) 13379.55 * 12.40%* Amean fault-both-30 15121.34 ( 0.00%) 16158.25 ( -6.86%) Amean fault-both-32 18466.67 ( 0.00%) 18971.21 ( -2.73%) Latency is only moderately affected but the devil is in the details. A closer examination indicates that base page fault latency is reduced but latency of huge pages is increased as it takes creater care to succeed. Part of the "problem" is that allocation success rates are close to 100% even when under pressure and compaction gets harder 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Percentage huge-3 96.70 ( 0.00%) 98.23 ( 1.58%) Percentage huge-5 96.99 ( 0.00%) 95.30 ( -1.75%) Percentage huge-7 94.19 ( 0.00%) 97.24 ( 3.24%) Percentage huge-12 94.95 ( 0.00%) 97.35 ( 2.53%) Percentage huge-18 96.74 ( 0.00%) 97.30 ( 0.58%) Percentage huge-24 97.07 ( 0.00%) 97.55 ( 0.50%) Percentage huge-30 95.69 ( 0.00%) 98.50 ( 2.95%) Percentage huge-32 96.70 ( 0.00%) 99.27 ( 2.65%) And scan rates are reduced as expected by 6% for the migration scanner and 29% for the free scanner indicating that there is less redundant work. Compaction migrate scanned 20815362 19573286 Compaction free scanned 16352612 11510663 Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- include/linux/compaction.h | 3 +- include/linux/sched.h | 4 +++ kernel/sched/core.c | 3 ++ mm/compaction.c | 31 ++++++++++++++----- mm/internal.h | 9 ++++++ mm/page_alloc.c | 74 +++++++++++++++++++++++++++++++++++++++++++--- 6 files changed, 112 insertions(+), 12 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 70d0256edd31..c960923d9ec2 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -93,7 +93,8 @@ extern int sysctl_compact_unevictable_allowed; extern int fragmentation_index(struct zone *zone, unsigned int order); extern enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, - const struct alloc_context *ac, enum compact_priority prio); + const struct alloc_context *ac, enum compact_priority prio, + struct page **page); extern void reset_isolation_suitable(pg_data_t *pgdat); extern enum compact_result compaction_suitable(struct zone *zone, int order, unsigned int alloc_flags, int classzone_idx); diff --git a/include/linux/sched.h b/include/linux/sched.h index 9a46243e6585..5e6690042497 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -47,6 +47,7 @@ struct pid_namespace; struct pipe_inode_info; struct rcu_node; struct reclaim_state; +struct capture_control; struct robust_list_head; struct sched_attr; struct sched_param; @@ -964,6 +965,9 @@ struct task_struct { struct io_context *io_context; +#ifdef CONFIG_COMPACTION + struct capture_control *capture_control; +#endif /* Ptrace state: */ unsigned long ptrace_message; kernel_siginfo_t *last_siginfo; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a674c7db2f29..ae5beb3ed09e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2177,6 +2177,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) INIT_HLIST_HEAD(&p->preempt_notifiers); #endif +#ifdef CONFIG_COMPACTION + p->capture_control = NULL; +#endif init_numa_balancing(clone_flags, p); } diff --git a/mm/compaction.c b/mm/compaction.c index de558f110319..2a6240d940e9 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2055,7 +2055,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order, return false; } -static enum compact_result compact_zone(struct compact_control *cc) +static enum compact_result +compact_zone(struct compact_control *cc, struct capture_control *capc) { enum compact_result ret; unsigned long start_pfn = cc->zone->zone_start_pfn; @@ -2224,6 +2225,11 @@ static enum compact_result compact_zone(struct compact_control *cc) } } + /* Stop if a page has been captured */ + if (capc && capc->page) { + ret = COMPACT_SUCCESS; + break; + } } out: @@ -2257,7 +2263,8 @@ static enum compact_result compact_zone(struct compact_control *cc) static enum compact_result compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, enum compact_priority prio, - unsigned int alloc_flags, int classzone_idx) + unsigned int alloc_flags, int classzone_idx, + struct page **capture) { enum compact_result ret; struct compact_control cc = { @@ -2278,14 +2285,24 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY), .ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY) }; + struct capture_control capc = { + .cc = &cc, + .page = NULL, + }; + + if (capture) + current->capture_control = &capc; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - ret = compact_zone(&cc); + ret = compact_zone(&cc, &capc); VM_BUG_ON(!list_empty(&cc.freepages)); VM_BUG_ON(!list_empty(&cc.migratepages)); + *capture = capc.page; + current->capture_control = NULL; + return ret; } @@ -2303,7 +2320,7 @@ int sysctl_extfrag_threshold = 500; */ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, const struct alloc_context *ac, - enum compact_priority prio) + enum compact_priority prio, struct page **capture) { int may_perform_io = gfp_mask & __GFP_IO; struct zoneref *z; @@ -2331,7 +2348,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, } status = compact_zone_order(zone, order, gfp_mask, prio, - alloc_flags, ac_classzone_idx(ac)); + alloc_flags, ac_classzone_idx(ac), capture); rc = max(status, rc); /* The allocation should succeed, stop compacting */ @@ -2399,7 +2416,7 @@ static void compact_node(int nid) INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - compact_zone(&cc); + compact_zone(&cc, NULL); VM_BUG_ON(!list_empty(&cc.freepages)); VM_BUG_ON(!list_empty(&cc.migratepages)); @@ -2534,7 +2551,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) if (kthread_should_stop()) return; - status = compact_zone(&cc); + status = compact_zone(&cc, NULL); if (status == COMPACT_SUCCESS) { compaction_defer_reset(zone, cc.order, false); diff --git a/mm/internal.h b/mm/internal.h index 31bb0be6fd52..9eeaf2b95166 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -209,6 +209,15 @@ struct compact_control { bool rescan; /* Rescanning the same pageblock */ }; +/* + * Used in direct compaction when a page should be taken from the freelists + * immediately when one is created during the free path. + */ +struct capture_control { + struct compact_control *cc; + struct page *page; +}; + unsigned long isolate_freepages_range(struct compact_control *cc, unsigned long start_pfn, unsigned long end_pfn); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6607cb7131b0..d61174bb0333 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -786,6 +786,57 @@ static inline int page_is_buddy(struct page *page, struct page *buddy, return 0; } +#ifdef CONFIG_COMPACTION +static inline struct capture_control *task_capc(struct zone *zone) +{ + struct capture_control *capc = current->capture_control; + + return capc && + !(current->flags & PF_KTHREAD) && + !capc->page && + capc->cc->zone == zone && + capc->cc->direct_compaction ? capc : NULL; +} + +static inline bool +compaction_capture(struct capture_control *capc, struct page *page, + int order, int migratetype) +{ + if (!capc || order != capc->cc->order) + return false; + + /* Do not accidentally pollute CMA or isolated regions*/ + if (is_migrate_cma(migratetype) || + is_migrate_isolate(migratetype)) + return false; + + /* + * Do not let lower order allocations polluate a movable pageblock. + * This might let an unmovable request use a reclaimable pageblock + * and vice-versa but no more than normal fallback logic which can + * have trouble finding a high-order free page. + */ + if (order < pageblock_order && migratetype == MIGRATE_MOVABLE) + return false; + + capc->page = page; + return true; +} + +#else +static inline struct capture_control *task_capc(struct zone *zone) +{ + return NULL; +} + +static inline bool +compaction_capture(struct capture_control *capc, struct page *page, + int order, int migratetype) +{ + return false; +} +#endif /* CONFIG_COMPACTION */ + /* * Freeing function for a buddy system allocator. * @@ -819,6 +870,7 @@ static inline void __free_one_page(struct page *page, unsigned long uninitialized_var(buddy_pfn); struct page *buddy; unsigned int max_order; + struct capture_control *capc = task_capc(zone); max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1); @@ -834,6 +886,12 @@ static inline void __free_one_page(struct page *page, continue_merging: while (order < max_order - 1) { + if (compaction_capture(capc, page, order, migratetype)) { + if (likely(!is_migrate_isolate(migratetype))) + __mod_zone_freepage_state(zone, -(1 << order), + migratetype); + return; + } buddy_pfn = __find_buddy_pfn(pfn, order); buddy = page + (buddy_pfn - pfn); @@ -3819,7 +3877,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, const struct alloc_context *ac, enum compact_priority prio, enum compact_result *compact_result) { - struct page *page; + struct page *page = NULL; unsigned long pflags; unsigned int noreclaim_flag; @@ -3830,13 +3888,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, noreclaim_flag = memalloc_noreclaim_save(); *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, - prio); + prio, &page); memalloc_noreclaim_restore(noreclaim_flag); psi_memstall_leave(&pflags); - if (*compact_result <= COMPACT_INACTIVE) + if (*compact_result <= COMPACT_INACTIVE) { + WARN_ON_ONCE(page); return NULL; + } /* * At least in one zone compaction wasn't deferred or skipped, so let's @@ -3844,7 +3904,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, */ count_vm_event(COMPACTSTALL); - page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac); + /* Prep a captured page if available */ + if (page) + prep_new_page(page, order, gfp_mask, alloc_flags); + + /* Try get a page from the freelist if available */ + if (!page) + page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac); if (page) { struct zone *zone = page_zone(page);