From: Anshuman Khandual
Subject: Re: [PATCH V2 2/2] arm64/mm: Enable memory hot remove
To: Mark Rutland
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, will.deacon@arm.com,
 catalin.marinas@arm.com, mhocko@suse.com, mgorman@techsingularity.net,
 james.morse@arm.com, robin.murphy@arm.com, cpandya@codeaurora.org,
 arunks@codeaurora.org, dan.j.williams@intel.com, osalvador@suse.de,
 david@redhat.com, cai@lca.pw, logang@deltatee.com, ira.weiny@intel.com
References: <1555221553-18845-1-git-send-email-anshuman.khandual@arm.com>
 <1555221553-18845-3-git-send-email-anshuman.khandual@arm.com>
 <20190415134841.GC13990@lakrids.cambridge.arm.com>
 <2faba38b-ab79-2dda-1b3c-ada5054d91fa@arm.com>
 <20190417142154.GA393@lakrids.cambridge.arm.com>
 <20190417173948.GB15589@lakrids.cambridge.arm.com>
Message-ID: <1bdae67b-fcd6-7868-8a92-c8a306c04ec6@arm.com>
Date: Thu, 18 Apr 2019 10:58:29 +0530
In-Reply-To: <20190417173948.GB15589@lakrids.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/17/2019 11:09 PM, Mark Rutland wrote:
> On Wed, Apr 17, 2019 at 10:15:35PM +0530, Anshuman Khandual wrote:
>> On 04/17/2019 07:51 PM, Mark Rutland wrote:
>>> On Wed, Apr 17, 2019 at 03:28:18PM +0530, Anshuman Khandual wrote:
>>>> On 04/15/2019 07:18 PM, Mark Rutland wrote:
>>>>> On Sun, Apr 14, 2019 at 11:29:13AM +0530, Anshuman Khandual wrote:
>
>>>>>> +	spin_unlock(&init_mm.page_table_lock);
>>>>>
>>>>> What precisely is the page_table_lock intended to protect?
>>>>
>>>> Concurrent modification to kernel page table (init_mm) while clearing entries.
>>>
>>> Concurrent modification by what code?
>>>
>>> If something else can *modify* the portion of the table that we're
>>> manipulating, then I don't see how we can safely walk the table up to
>>> this point without holding the lock, nor how we can safely add memory.
>>>
>>> Even if this is to protect something else which *reads* the tables,
>>> other code in arm64 which modifies the kernel page tables doesn't take
>>> the lock.
>>>
>>> Usually, if you can do a lockless walk you have to verify that things
>>> didn't change once you've taken the lock, but we don't follow that
>>> pattern here.
>>>
>>> As things stand it's not clear to me whether this is necessary or
>>> sufficient.
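The lockless-walk-then-recheck pattern described just above can be sketched in userspace C. This is a toy model under stated assumptions, not arm64 or kernel code: struct leaf_table, top, table_lock and try_free_leaf() are hypothetical names, and a C11 atomic_flag spinlock stands in for init_mm.page_table_lock. The slot is peeked at without the lock, then re-validated under the lock before the non-leaf entry is cleared. The sketch assumes every writer takes the same lock; it does nothing for walkers that never take it, which is exactly the ptdump concern raised in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOP_ENTRIES  4
#define LEAF_ENTRIES 8

/* Hypothetical toy leaf table; a zero entry means "not mapped". */
struct leaf_table {
	unsigned long pte[LEAF_ENTRIES];
};

/* Hypothetical top level: each slot points to a leaf table or is NULL. */
static _Atomic(struct leaf_table *) top[TOP_ENTRIES];

/* Spinlock standing in for init_mm.page_table_lock in this sketch. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Caller must hold table_lock. */
static bool leaf_is_empty(const struct leaf_table *t)
{
	for (size_t i = 0; i < LEAF_ENTRIES; i++)
		if (t->pte[i])
			return false;
	return true;
}

/*
 * Free the leaf table under top[idx] if it is empty.
 * Returns true if the table was unlinked and freed.
 */
static bool try_free_leaf(size_t idx)
{
	assert(idx < TOP_ENTRIES);

	/* Lockless peek: read the pointer only, never dereference it. */
	struct leaf_table *seen = atomic_load(&top[idx]);
	if (!seen)
		return false;

	table_lock_acquire();
	/* Re-validate: a writer holding the lock may have changed the slot. */
	if (atomic_load(&top[idx]) != seen || !leaf_is_empty(seen)) {
		table_lock_release();
		return false;
	}
	atomic_store(&top[idx], NULL);	/* clear the non-leaf entry */
	table_lock_release();

	/*
	 * Safe only if every walker dereferences entries under table_lock.
	 * A lockless walker could still be using "seen" at this point.
	 */
	free(seen);
	return true;
}
```

For example, while the leaf table still holds a non-zero entry, try_free_leaf() returns false and leaves the slot alone; once the table is empty it unlinks and frees it.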
>>
>> Hence let's take a more conservative approach and wrap the entire process of
>> remove_pagetable() under init_mm.page_table_lock which looks safe unless
>> in the worst case when free_pages() gets stuck for some reason in which
>> case we have bigger memory problem to deal with than a soft lockup.
>
> Sorry, but I'm not happy with _any_ solution until we understand where
> and why we need to take the init_mm ptl, and have made some effort to
> ensure that the kernel correctly does so elsewhere. It is not sufficient
> to consider this code in isolation.

We will have to take the kernel page table lock so that we do not rely on
assumptions about the present or any future kernel VA space layout. Wrapping
the entire remove_pagetable() in it is coarse-grained, but I don't see why it
should not be sufficient, at least for this particular teardown operation,
regardless of how it might affect other kernel page table walkers.

IIUC your concern is about other parts of the kernel (arm64 and generic)
which assume that the kernel page table won't change, and hence normally
walk it without holding the page table lock. Those existing page table
walkers would be affected by this change.

>
> IIUC, before this patch we never clear non-leaf entries in the kernel
> page tables, so readers don't presently need to take the ptl in order to
> safely walk down to a leaf entry.

Got it. Will look into this.

>
> For example, the arm64 ptdump code never takes the ptl, and as of this
> patch it will blow up if it races with a hot-remove, regardless of
> whether the hot-remove code itself holds the ptl.

Got it. Are there more such examples where this can be problematic? I will
be happy to investigate all such places and change or add locking there so
that they work with memory hot remove.

>
> Note that the same applies to the x86 ptdump code; we cannot assume that
> just because x86 does something that it happens to be correct.

I understand.
Will also look into how other non-x86 platforms deal with this.

>
> I strongly suspect there are other cases that would fall afoul of this,
> in both arm64 and generic code.

Will start looking into all such possible cases, in both arm64 and generic
code. Meanwhile, more such pointers would be really helpful.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel