From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Nazarewicz
To: Vlastimil Babka, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Leizhen, Sasha Levin,
 qiuxishi, Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
 chenjie6@huawei.com, Lucas Stach, "Aneesh Kumar K.V", Hanjun Guo,
 Johannes Weiner, Joonsoo Kim, "Kirill A. Shutemov", Laura Abbott,
 Mel Gorman, Minchan Kim, Naoya Horiguchi, stable@vger.kernel.org,
 Yasuaki Ishimatsu, Zhang Yanfei
Subject: Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
In-Reply-To: <1458726023-27005-1-git-send-email-vbabka@suse.cz>
References: <1458726023-27005-1-git-send-email-vbabka@suse.cz>
Organization: http://mina86.com/
Date: Tue, 29 Mar 2016 16:13:33 +0200
MIME-Version: 1.0
Content-Type: text/plain;
 charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 23 2016, Vlastimil Babka wrote:
> Hanjun Guo has reported that a CMA stress test causes broken accounting of
> CMA and free pages:
>
>> Before the test, I got:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal: 204800 kB
>> CmaFree: 195044 kB
>>
>> After running the test:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal: 204800 kB
>> CmaFree: 6602584 kB
>>
>> So the freed CMA memory is more than the total..
>>
>> Also the MemFree is more than the mem total:
>>
>> -bash-4.3# cat /proc/meminfo
>> MemTotal: 16342016 kB
>> MemFree: 22367268 kB
>> MemAvailable: 22370528 kB
>
> Laura Abbott has confirmed the issue and suspected the freepage accounting
> rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
> caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
> pageblocks:
>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process, a
>> partially isolated block exists. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up a MAX_ORDER
>> aligned block, and I can think of the following scenario because pageblock
>> (un)isolation would be done one by one.
>>
>> (Each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>> MIGRATE_ISOLATE, respectively.)
>>
>> CC -> IC -> II (Isolation)
>> II -> CI -> CC (Un-isolation)
>>
>> If some pages are freed at an intermediate state such as IC or CI,
>> a page could be merged with another page that is resident on a
>> different type of pageblock, and it will cause a wrong freepage count.
>
> This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
> since it doesn't hold the zone->lock between pageblocks, a race window does
> exist.
>
> It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
> and non-CMA pageblocks.
> This should be prevented in __free_one_page() since
> commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
> pageblock"). However, we only check the migratetype of the pageblock where
> buddy merging has been initiated, not the migratetype of the buddy pageblock
> (or group of pageblocks), which can be MIGRATE_ISOLATE.
>
> Joonsoo has suggested checking for the buddy migratetype as part of
> page_is_buddy(), but that would add extra checks to the allocator hotpath, and
> bloat-o-meter has shown significant code bloat (the function is inline).
>
> This patch reduces the bloat at the expense of somewhat more complicated code.
> The buddy-merging while-loop in __free_one_page() is initially bounded to
> pageblock_order and has no migratetype checks. The checks are placed
> outside, bumping the max_order if merging is allowed, and returning to the
> while-loop with a statement which can't possibly be considered harmful.
>
> This fixes the accounting bug and also removes the arguably weird state in the
> original commit 3c605096d315 where buddies could be left unmerged.
>
> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> Link: https://lkml.org/lkml/2016/3/2/280
> Reported-by: Hanjun Guo
> Debugged-by: Laura Abbott
> Debugged-by: Joonsoo Kim
> Cc: Mel Gorman
> Cc: "Kirill A. Shutemov"
> Cc: Johannes Weiner
> Cc: Minchan Kim
> Cc: Yasuaki Ishimatsu
> Cc: Zhang Yanfei
> Cc: Michal Nazarewicz

Acked-by: Michal Nazarewicz

> Cc: Naoya Horiguchi
> Cc: "Aneesh Kumar K.V"
> Cc: # v3.18+
> Signed-off-by: Vlastimil Babka

I wonder if, with this change,

	ret = start_isolate_page_range(pfn_max_align_down(start),
				       pfn_max_align_up(end), migratetype,
				       false);

in alloc_contig_range could be loosened to align to pageblocks instead
of having to use max(pageblock, max_page).
It feels like it should be possible, but on the other hand, I'm not
certain how the buddy allocator will behave if a max_order page spans
MIGRATE_CMA and MIGRATE_ISOLATE pageblocks. I guess
start_isolate_page_range would have to split such pages?

> ---
>  mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c46b75d14b6f..b9785af4fae2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -683,34 +683,28 @@ static inline void __free_one_page(struct page *page,
>  	unsigned long combined_idx;
>  	unsigned long uninitialized_var(buddy_idx);
>  	struct page *buddy;
> -	unsigned int max_order = MAX_ORDER;
> +	unsigned int max_order;
> +
> +	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
>
>  	VM_BUG_ON(!zone_is_initialized(zone));
>  	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>
>  	VM_BUG_ON(migratetype == -1);
> -	if (is_migrate_isolate(migratetype)) {
> -		/*
> -		 * We restrict max order of merging to prevent merge
> -		 * between freepages on isolate pageblock and normal
> -		 * pageblock. Without this, pageblock isolation
> -		 * could cause incorrect freepage accounting.
> -		 */
> -		max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
> -	} else {
> +	if (likely(!is_migrate_isolate(migratetype)))
>  		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -	}
>
> -	page_idx = pfn & ((1 << max_order) - 1);
> +	page_idx = pfn & ((1 << MAX_ORDER) - 1);
>
>  	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
>  	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>
> +continue_merging:
>  	while (order < max_order - 1) {
>  		buddy_idx = __find_buddy_index(page_idx, order);
>  		buddy = page + (buddy_idx - page_idx);
>  		if (!page_is_buddy(page, buddy, order))
> -			break;
> +			goto done_merging;
>  		/*
>  		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
>  		 * merge with it and move up one order.
> @@ -727,6 +721,32 @@ static inline void __free_one_page(struct page *page,
>  		page_idx = combined_idx;
>  		order++;
>  	}
> +	if (max_order < MAX_ORDER) {
> +		/* If we are here, it means order is >= pageblock_order.
> +		 * We want to prevent merge between freepages on isolate
> +		 * pageblock and normal pageblock. Without this, pageblock
> +		 * isolation could cause incorrect freepage or CMA accounting.
> +		 *
> +		 * We don't want to hit this code for the more frequent
> +		 * low-order merging.
> +		 */
> +		if (unlikely(has_isolate_pageblock(zone))) {
> +			int buddy_mt;
> +
> +			buddy_idx = __find_buddy_index(page_idx, order);
> +			buddy = page + (buddy_idx - page_idx);
> +			buddy_mt = get_pageblock_migratetype(buddy);
> +
> +			if (migratetype != buddy_mt
> +					&& (is_migrate_isolate(migratetype) ||
> +						is_migrate_isolate(buddy_mt)))
> +				goto done_merging;
> +		}
> +		max_order++;
> +		goto continue_merging;
> +	}
> +
> +done_merging:
>  	set_page_order(page, order);
>
>  	/*
> --
> 2.7.3
>

--
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»