From: Alistair Popple
To: Jason Gunthorpe
Cc: linux-mm@kvack.org, nouveau@lists.freedesktop.org, bskeggs@redhat.com, akpm@linux-foundation.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org, dri-devel@lists.freedesktop.org, jhubbard@nvidia.com, rcampbell@nvidia.com, jglisse@redhat.com
Subject: Re: [PATCH 1/9] mm/migrate.c: Always allow device private pages to migrate
Date: Wed, 10 Feb 2021 14:40:10 +1100
Message-ID: <1780857.6Ip0F2Sa4d@nvdebian>
In-Reply-To: <20210209133932.GD4718@ziepe.ca>
References: <20210209010722.13839-1-apopple@nvidia.com> <20210209010722.13839-2-apopple@nvidia.com> <20210209133932.GD4718@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wednesday, 10 February 2021 12:39:32 AM AEDT Jason Gunthorpe wrote:
> On Tue, Feb 09, 2021 at 12:07:14PM +1100, Alistair Popple wrote:
> > Device private pages are used to represent device memory that is not
> > directly accessible from the CPU. Extra references to a device private
> > page are only used to ensure the struct page itself remains valid whilst
> > waiting for migration entries. Therefore extra references should not
> > prevent device private page migration as this can lead to failures to
> > migrate pages back to the CPU which are fatal to the user process.
>
> This should identify the extra references in expected_count, just
> disabling this protection seems unsafe, ZONE_DEVICE is not so special
> that the refcount means nothing

This is similar to what migrate_vma_check_page() does now. The issue is that a migration wait takes a reference on the device private page, so you can end up with one thread stuck waiting for migration whilst the other can't migrate because of that extra refcount.
Given that device private pages can't undergo GUP, and that it's not possible to distinguish the migration wait refcount from any other refcount, we assume any extra reference must be from a migration wait.

> Is this a side effect of the extra refcounts that Ralph was trying to
> get rid of? I'd rather see that work finished :)

I'd like to see that finished too, but I don't think it would help here, as this is not a side effect of that work.

- Alistair

> Jason