From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1757090Ab0KRNS4 (ORCPT ); Thu, 18 Nov 2010 08:18:56 -0500
Received: from gir.skynet.ie ([193.1.99.77]:39171 "EHLO gir.skynet.ie"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1755557Ab0KRNSy (ORCPT ); Thu, 18 Nov 2010 08:18:54 -0500
Date: Thu, 18 Nov 2010 13:18:39 +0000
From: Mel Gorman
To: Andrea Arcangeli
Cc: linux-mm@kvack.org, Linus Torvalds, Andrew Morton,
 linux-kernel@vger.kernel.org, Marcelo Tosatti, Adam Litke, Avi Kivity,
 Hugh Dickins, Rik van Riel, Dave Hansen, Benjamin Herrenschmidt,
 Ingo Molnar, Mike Travis, KAMEZAWA Hiroyuki, Christoph Lameter,
 Chris Wright, bpicco@redhat.com, KOSAKI Motohiro, Balbir Singh,
 "Michael S. Tsirkin", Peter Zijlstra, Johannes Weiner,
 Daisuke Nishimura, Chris Mason, Borislav Petkov
Subject: Re: [PATCH 28 of 66] _GFP_NO_KSWAPD
Message-ID: <20101118131839.GR8135@csn.ul.ie>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.17+20080114 (2008-01-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 03, 2010 at 04:28:03PM +0100, Andrea Arcangeli wrote:
> From: Andrea Arcangeli
>
> Transparent hugepage allocations must be allowed not to invoke kswapd or any
> other kind of indirect reclaim (especially when the defrag sysfs control is
> disabled). It's unacceptable to swap out anonymous pages (potentially
> anonymous transparent hugepages) in order to create new transparent hugepages.
> This is true for the MADV_HUGEPAGE areas too (swapping out a kvm virtual
> machine and so having it suffer an unbearable slowdown, so another one with
> guest physical memory marked MADV_HUGEPAGE can run 30% faster if it is running
> memory intensive workloads, makes no sense).
> If a transparent hugepage allocation fails the slowdown is minor and there is
> total fallback, so kswapd should never be asked to swap out memory to allow
> the high order allocation to succeed.
>
> Signed-off-by: Andrea Arcangeli
> Acked-by: Rik van Riel
> ---
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -81,13 +81,15 @@ struct vm_area_struct;
>  #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */
>  #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */
>
> +#define __GFP_NO_KSWAPD ((__force gfp_t)0x400000u)
> +

This is not an exact merge with what's currently in mm. Look at the top of
gfp.h and see "Plain integer GFP bitmasks. Do not use this directly.". The
0x400000u definition needs to go there (as ___GFP_NO_KSWAPD) and this becomes

#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD)

What you have just generates sparse warnings (I believe) so it's harmless.

>  /*
>   * This may seem redundant, but it's a way of annotating false positives vs.
>   * allocations that simply cannot be supported (e.g. page tables).
>   */
>  #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
>
> -#define __GFP_BITS_SHIFT 22 /* Room for 22 __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 23 /* Room for 23 __GFP_FOO bits */
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
>  /* This equals 0, but use constants in case they ever change */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1996,7 +1996,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, u
>          goto nopage;
>
>  restart:
> -    wake_all_kswapd(order, zonelist, high_zoneidx);
> +    if (!(gfp_mask & __GFP_NO_KSWAPD))
> +        wake_all_kswapd(order, zonelist, high_zoneidx);
>

Other than needing to define ___GFP_NO_KSWAPD

Acked-by: Mel Gorman

>      /*
>       * OK, we're below the kswapd watermark and have kicked background
>

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
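
For illustration only, here is a minimal userspace sketch of the layout
requested in the review: the raw bit lives with the plain integer bitmasks,
and the typed flag wraps it. This is not kernel code. The empty __force
macro, the unsigned int gfp_t typedef, the kswapd_wakeups counter and
wake_all_kswapd_stub() are all assumptions made so the fragment compiles
outside the kernel; the real wake_all_kswapd() takes order, zonelist and
high_zoneidx as shown in the patch.

```c
#include <assert.h>

/* Assumption: stand-ins for kernel/sparse annotations, not the real ones. */
#define __force                  /* only meaningful to sparse */
typedef unsigned int gfp_t;      /* the kernel annotates this for sparse */

/* Plain integer GFP bitmasks. Do not use this directly. */
#define ___GFP_NO_KSWAPD 0x400000u

/* Typed flag built from the plain integer, like the other __GFP_FOO flags. */
#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD)

#define __GFP_BITS_SHIFT 23      /* Room for 23 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

/* 0x400000 is bit 22, so the mask must cover 23 bits or the flag is lost. */
_Static_assert((___GFP_NO_KSWAPD & ((1u << __GFP_BITS_SHIFT) - 1)) ==
               ___GFP_NO_KSWAPD, "flag must fit inside __GFP_BITS_MASK");

/* Hypothetical counter and stub standing in for wake_all_kswapd(). */
int kswapd_wakeups;

void wake_all_kswapd_stub(void)
{
    kswapd_wakeups++;
}

/* Mirrors the check the patch adds at the restart: label. */
void slowpath_restart(gfp_t gfp_mask)
{
    if (!(gfp_mask & __GFP_NO_KSWAPD))
        wake_all_kswapd_stub();
}
```

Calling slowpath_restart(0) bumps the counter, while
slowpath_restart(__GFP_NO_KSWAPD) leaves it alone. With the old
__GFP_BITS_SHIFT of 22 the static assertion would fail, which is why the
patch raises the shift to 23.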