From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Sep 2020 14:14:20 -0300
From: Jason Gunthorpe
To: Vasily Gorbik
Cc: John Hubbard, Linus Torvalds, Gerald Schaefer, Alexander Gordeev,
    Peter Zijlstra, Dave Hansen, LKML, linux-mm, linux-arch,
    Andrew Morton, Russell King, Mike Rapoport, Catalin Marinas,
    Will Deacon, Michael Ellerman, Benjamin Herrenschmidt,
    Paul Mackerras, Jeff Dike, Richard Weinberger, Dave Hansen,
    Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Arnd Bergmann, Andrey Ryabinin, linux-x86, linux-arm, linux-power,
    linux-sparc, linux-um, linux-s390, Heiko Carstens,
    Christian Borntraeger, Claudio Imbrenda
Subject: Re: [PATCH v2] mm/gup: fix gup_fast with dynamic page table folding
Message-ID: <20200915171420.GK1221970@ziepe.ca>
References: <20200911200511.GC1221970@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 11, 2020 at 10:36:43PM +0200, Vasily Gorbik wrote:
> Currently, to make sure that every page table entry is read just once,
> gup_fast walks perform READ_ONCE and pass the pXd value down to the next
> gup_pXd_range function by value, e.g.:
>
> static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
>                          unsigned int flags, struct page **pages, int *nr)
> ...
>         pudp = pud_offset(&p4d, addr);
>
> This function passes a reference to that local copy to pXd_offset,
> and might get the very same pointer in return. This happens when the
> level is folded (on most arches), and that pointer should not be iterated.
>
> On s390, because each task might use 5-, 4- or 3-level address
> translation, and hence have different levels folded, the logic is more
> complex, and a non-iterable pointer to a local copy leads to severe
> problems.
>
> Here is an example of what happens with gup_fast on s390, for a task
> with 3-level paging, crossing a 2 GB pud boundary:
>
> // addr = 0x1007ffff000, end = 0x10080001000
> static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
>                          unsigned int flags, struct page **pages, int *nr)
> {
>         unsigned long next;
>         pud_t *pudp;
>
>         // pud_offset returns &p4d itself (a pointer to a value on stack)
>         pudp = pud_offset(&p4d, addr);
>         do {
>                 // on second iteration reading "random" stack value
>                 pud_t pud = READ_ONCE(*pudp);
>
>                 // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
>                 next = pud_addr_end(addr, end);
>                 ...
>         } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack
>
>         return 1;
> }
>
> This has happened since s390 moved to the common gup code with
> commit d1874a0c2805 ("s390/mm: make the pxd_offset functions more robust")
> and commit 1a42010cdc26 ("s390/mm: convert to the generic
> get_user_pages_fast code"). s390 tried to mimic static level folding by
> changing the pXd_offset primitives to always calculate the top-level page
> table offset in pgd_offset and just return the value passed when
> pXd_offset has to act as folded.
>
> What is crucial for gup_fast, and what has been overlooked, is
> that PxD_SIZE/MASK, and thus pXd_addr_end, should also change
> correspondingly. The latter is not possible with dynamic folding.
>
> To fix the issue, pass the original pXdp pointers down to the
> gup_pXd_range functions in addition to the pXd values, and introduce
> pXd_offset_lockless helpers, which take an additional pXd entry value
> parameter. This has already been discussed in
> https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
>
> Cc: # 5.2+
> Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
> Reviewed-by: Gerald Schaefer
> Reviewed-by: Alexander Gordeev
> Signed-off-by: Vasily Gorbik
> ---
> v2: added brackets &pgd -> &(pgd)

Reviewed-by: Jason Gunthorpe

Regards,
Jason