From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965493AbeEXIo0 (ORCPT ); Thu, 24 May 2018 04:44:26 -0400
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:38312 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965042AbeEXIoU (ORCPT ); Thu, 24 May 2018 04:44:20 -0400
Subject: Re: [PATCH v2] mm/ksm: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm
From: Suzuki K Poulose
To: Andrew Morton, Jia He
Cc: Andrea Arcangeli, Minchan Kim, Claudio Imbrenda, Arvind Yadav, Mike Rapoport, linux-mm@kvack.org, linux-kernel@vger.kernel.org, jia.he@hxt-semitech.com, Hugh Dickins
References: <20180503124415.3f9d38aa@p-imbrenda.boeblingen.de.ibm.com> <1525403506-6750-1-git-send-email-hejianet@gmail.com> <20180509163101.02f23de1842a822c61fc68ff@linux-foundation.org> <2cd6b39b-1496-bbd5-9e31-5e3dcb31feda@arm.com>
Message-ID: <6c417ab1-a808-72ea-9618-3d76ec203684@arm.com>
Date: Thu, 24 May 2018 09:44:16 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To: <2cd6b39b-1496-bbd5-9e31-5e3dcb31feda@arm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 14/05/18 10:45, Suzuki K Poulose wrote:
> On 10/05/18 00:31, Andrew Morton wrote:
>> On Fri, 4 May 2018 11:11:46 +0800 Jia He wrote:
>>
>>> In our armv8a server (QDF2400), I noticed lots of WARN_ON caused by PAGE_SIZE
>>> unaligned for rmap_item->address under memory pressure tests (start 20 guests
>>> and run memhog in the host).
>>>
>>> ...
>>>
>>> In rmap_walk_ksm, the rmap_item->address might still have the STABLE_FLAG,
>>> then the start and end in handle_hva_to_gpa might not be PAGE_SIZE aligned.
>>> Thus it will cause exceptions in handle_hva_to_gpa on arm64.
>>>
>>> This patch fixes it by ignoring (not removing) the low bits of address when
>>> doing rmap_walk_ksm.
>>>
>>> Signed-off-by: jia.he@hxt-semitech.com
>>
>> I assumed you wanted this patch to be committed as
>> From:jia.he@hxt-semitech.com rather than From:hejianet@gmail.com, so I
>> made that change.  Please let me know if this was inappropriate.
>>
>> You can do this yourself by adding an explicit From: line to the very
>> start of the patch's email text.
>>
>> Also, a storm of WARN_ONs is pretty poor behaviour.  Is that the only
>> misbehaviour which this bug causes?  Do you think the fix should be
>> backported into earlier kernels?
>>

Jia, Andrew,

What is the status of this patch?

Suzuki

>
> I think it's not just the WARN_ON(). We do more than what is probably
> intended with an unaligned address, i.e., we could be modifying the
> flags for other pages that were not affected.
>
> e.g.:
>
> In the original report [0], the trace looked like:
>
>
> [  800.511498] [] kvm_age_hva_handler+0xcc/0xd4
> [  800.517324] [] handle_hva_to_gpa+0xec/0x15c
> [  800.523063] [] kvm_age_hva+0x5c/0xcc
> [  800.528194] [] kvm_mmu_notifier_clear_flush_young+0x54/0x90
> [  800.535324] [] __mmu_notifier_clear_flush_young+0x6c/0xa8
> [  800.542279] [] page_referenced_one+0x1e0/0x1fc
> [  800.548279] [] rmap_walk_ksm+0x124/0x1a0
> [  800.553759] [] rmap_walk+0x94/0x98
> [  800.558717] [] page_referenced+0x120/0x180
> [  800.564369] [] shrink_active_list+0x218/0x4a4
> [  800.570281] [] shrink_node_memcg+0x58c/0x6fc
> [  800.576107] [] shrink_node+0xe4/0x328
> [  800.581325] [] do_try_to_free_pages+0xe4/0x3b8
> [  800.587324] [] try_to_free_pages+0x124/0x234
> [  800.593150] [] __alloc_pages_nodemask+0x564/0xf7c
> [  800.599412] [] khugepaged_alloc_page+0x38/0xb8
> [  800.605411] [] collapse_huge_page+0x74/0xd70
> [  800.611238] [] khugepaged_scan_mm_slot+0x654/0xa98
> [  800.617585] [] khugepaged+0x2bc/0x49c
> [  800.622803] [] kthread+0x124/0x150
> [  800.627762] [] ret_from_fork+0x10/0x1c
> [  800.633066] ---[ end trace 944c130b5252fb01 ]---
>
> Now, KSM wants to mark *a page* as referenced via page_referenced_one(),
> passing it an unaligned address. This could eventually turn out to be
> one of:
>
> ptep_clear_flush_young_notify(address, address + PAGE_SIZE)
>
> or
>
> pmdp_clear_flush_young_notify(address, address + PMD_SIZE)
>
> which now spans two pages/pmds, and the notifier consumer might
> take an action on the second page as well, which is not
> intended. So, I do think that the old behavior is wrong and has other
> side effects, as mentioned above.
>
> [0] https://lkml.kernel.org/r/1525244911-5519-1-git-send-email-hejianet@gmail.com
>
> Suzuki