From: Jerome Glisse <jglisse@redhat.com>
To: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	John Hubbard <jhubbard@nvidia.com>,
	David Nellans <dnellans@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>
Subject: Re: [HMM 12/15] mm/migrate: new memory migration helper for use with device memory v4
Date: Tue, 11 Jul 2017 14:49:19 -0400
Message-ID: <20170711184919.GD5347@redhat.com>
In-Reply-To: <7a4478cb-7eb6-2546-e707-1b0f18e3acd4@nvidia.com>

On Tue, Jul 11, 2017 at 11:42:20AM -0700, Evgeny Baskakov wrote:
> On 7/11/17 11:29 AM, Jerome Glisse wrote:
> > Can you test whether the attached patch helps? I am having trouble
> > reproducing this from inside a VM.
> > 
> > My theory is that two concurrent CPU page faults happen. The first one
> > manages to start the migration back to system memory, but the second one
> > sees the special migration entry and calls migration_entry_wait(), which
> > increases the page refcount, and this happens before the first one checks
> > that the page refcount is OK for migration.
> > 
> > For regular migration such a scenario is OK: the migration bails out and,
> > because the page is CPU accessible, there is no need to kick the migration
> > again for the other thread that took a CPU fault to migrate.
> > 
> > I am looking into how I can change migration_entry_wait() so that it does
> > not take a page reference. Let me know if the attached patch helps.
> > 
> > Thank you
> > Jérôme
> 
> Hi Jerome,
> 
> Thanks for the update.
> 
> Unfortunately, the patch does not help. I just applied it and recompiled the
> kernel. Please find attached a new kernel log and an app log.
> 

What are the symptoms? Does the program just stop making any progress, and
you then trigger a SysRq to dump the current state of each thread? In this
log I no longer see migration_entry_wait(), but it seems to be waiting on
the page lock, so there might be two issues here.
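
The refcount race from the theory quoted above can be sketched as a toy
user-space model. This is only an illustration under stated assumptions, not
kernel code: struct toy_page, toy_entry_wait_begin(), toy_entry_wait_end(),
toy_migrate_check() and toy_replay() are hypothetical names, and the single
expected reference is an assumed value. It replays one interleaving
deterministically instead of using real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page: only the reference count matters here. */
struct toy_page {
	int refcount;
};

/*
 * Hypothetical model of the waiter side: the second faulting thread
 * takes a temporary reference while it waits on the migration entry.
 */
static void toy_entry_wait_begin(struct toy_page *page)
{
	page->refcount++;
}

/* The waiter drops its temporary reference once it stops waiting. */
static void toy_entry_wait_end(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Hypothetical model of the migrating side: migration only proceeds
 * if nobody holds a reference beyond the expected ones.
 */
static bool toy_migrate_check(const struct toy_page *page, int expected)
{
	return page->refcount == expected;
}

/* Replays one interleaving; returns true if migration would proceed. */
bool toy_replay(bool waiter_arrives_before_check)
{
	struct toy_page page = { .refcount = 1 };
	const int expected = 1;	/* assumed value for this sketch */
	bool ok;

	/* Thread A has already installed the migration entry (not modelled). */
	if (waiter_arrives_before_check)
		toy_entry_wait_begin(&page);	/* thread B pins the page */

	ok = toy_migrate_check(&page, expected);	/* thread A checks */

	if (waiter_arrives_before_check)
		toy_entry_wait_end(&page);	/* thread B unpins, too late */

	return ok;
}
```

Under these assumptions, toy_replay(false) reports that the migration check
passes, while toy_replay(true) reports that it fails because the waiter's
temporary reference is still held when the check runs.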

Jérôme
