Date: Tue, 11 Jul 2017 14:29:22 -0400
From: Jerome Glisse <jglisse@redhat.com>
To: Evgeny Baskakov
Cc: "akpm@linux-foundation.org", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
    John Hubbard, David Nellans, Mark Hairgrove, Sherry Cheung, Subhash Gutti
Subject: Re: [HMM 12/15] mm/migrate: new memory migration helper for use with device memory v4
Message-ID: <20170711182922.GC5347@redhat.com>
References: <20170522165206.6284-1-jglisse@redhat.com>
 <20170522165206.6284-13-jglisse@redhat.com>
 <5f476e8c-8256-13a8-2228-a2b9e5650586@nvidia.com>
 <20170701005749.GA7232@redhat.com>
User-Agent: Mutt/1.8.3 (2017-05-23)

On Mon, Jul 10, 2017 at 04:44:38PM -0700, Evgeny Baskakov wrote:
> On 6/30/17 5:57 PM, Jerome Glisse wrote:
>
> ...
>
> Hi Jerome,
>
> I am working on a sporadic data corruption seen in highly contended use
> cases. So far, I've been able to re-create a sporadic hang that happens when
> multiple threads compete to migrate the same page to and from device memory.
> The reproducer uses only the dummy driver from hmm-next.
>
> Please find attached. This is how it hangs on my 12-core Intel i7-5930K SMT
> system:
>

Can you test whether the attached patch helps? I am having trouble
reproducing this from inside a VM.

My theory is that two concurrent CPU page faults happen. The first one
manages to start the migration back to system memory, but the second one
sees the migration special entry and calls migration_entry_wait(), which
increases the page refcount, and this happens before the first one checks
that the page refcount is OK for migration. For regular migration such a
scenario is fine: the migration bails out, and because the page is CPU
accessible there is no need for the other faulting thread to kick the
migration again. For a device private page, however, the faulting thread
cannot make progress until the page is migrated back, hence the hang. I am
looking into how I can change migration_entry_wait() so that it does not
take a reference on the page.

Let me know if the attached patch helps.
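
To make the theory a bit more concrete, here is a quick userspace-only sketch
of the refcount accounting that migrate_vma_check_page() relies on. This is
purely illustrative, not the kernel code: struct fake_page, its fields and
can_migrate() are made-up names standing in for struct page, page_count() and
page_mapcount().

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for struct page (made up for this sketch). */
struct fake_page {
	int refcount;		/* what page_count() would return */
	int mapcount;		/* what page_mapcount() would return */
	bool device_private;	/* ZONE_DEVICE private page? */
};

/*
 * Same idea as the check in migrate_vma_check_page(): any reference that
 * is not explained by the mappings plus the expected extra references
 * means someone else holds the page, so the migration must bail out.
 */
static bool can_migrate(const struct fake_page *page)
{
	int extra = 1;		/* reference taken by the migration code */

	if (page->device_private)
		extra++;	/* device private pages carry one more */

	return (page->refcount - extra) <= page->mapcount;
}

int main(void)
{
	/* device private page, mapped once: mapping + migration + device refs */
	struct fake_page page = { .refcount = 3, .mapcount = 1,
				  .device_private = true };

	printf("no racer   -> %d\n", can_migrate(&page));	/* 1: migration proceeds */

	/* a second CPU fault did get_page() inside migration_entry_wait() */
	page.refcount++;
	printf("with racer -> %d\n", can_migrate(&page));	/* 0: migration bails, fault hangs */

	return 0;
}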
Thank you

Jérôme

[-- Attachment: 0001-TEST-THEORY-ABOUT-MIGRATION-AND-DEVICE.patch --]

From 97ca95e004883a1223a84844e985f45222593734 Mon Sep 17 00:00:00 2001
From: Jérôme Glisse <jglisse@redhat.com>
Date: Tue, 11 Jul 2017 14:24:59 -0400
Subject: [PATCH] TEST THEORY ABOUT MIGRATION AND DEVICE

---
 mm/migrate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 643ea61ca9bb..10e99770da91 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2235,7 +2235,9 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			 */
 			page_remove_rmap(page, false);
 			put_page(page);
-			unmapped++;
+
+			if (pte_present(pte))
+				unmapped++;
 		}
 
 next:
@@ -2313,6 +2315,7 @@ static bool migrate_vma_check_page(struct page *page)
 	if (is_zone_device_page(page)) {
 		if (is_device_private_page(page)) {
 			extra++;
+			return true;
 		} else
 			/* Other ZONE_DEVICE memory type are not supported */
 			return false;
-- 
2.13.0