linux-mm.kvack.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Matthew Wilcox" <willy@infradead.org>,
	"Alex Sierra" <alex.sierra@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"Linux MM" <linux-mm@kvack.org>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Maling list - DRI developers" <dri-devel@lists.freedesktop.org>,
	"Christoph Hellwig" <hch@lst.de>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Vishal Verma" <vishal.l.verma@intel.com>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Linux NVDIMM" <nvdimm@lists.linux.dev>
Subject: Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount
Date: Thu, 14 Oct 2021 20:06:06 -0300
Message-ID: <20211014230606.GZ2744544@nvidia.com>
In-Reply-To: <CAPcyv4hBdSwdtG6Hnx9mDsRXiPMyhNH=4hDuv8JZ+U+Jj4RUWg@mail.gmail.com>

On Thu, Oct 14, 2021 at 12:01:14PM -0700, Dan Williams wrote:
> > > Does anyone know why devmap is pte_special anyhow?
> 
> It does not need to be special as mentioned here:
> 
> https://lore.kernel.org/all/CAPcyv4iFeVDVPn6uc=aKsyUvkiu3-fK-N16iJVZQ3N8oT00hWA@mail.gmail.com/

I added a remark there.

Not special means more to me: it means devmap should do the refcounts
properly, like normal memory pages.

It means vm_normal_page should return !NULL, and it means insert_page,
not insert_pfn, should be used to install them in the PTE. VMAs should
not be VM_MIXEDMAP, but normal struct page maps.
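
To make the distinction concrete, here is a small userspace sketch. It is
not kernel code: every toy_* name is made up for illustration, and the only
behaviour it models is that an insert_pfn-style mapping marks the PTE
special and takes no reference, while an insert_page-style mapping takes a
reference on the page, so a vm_normal_page-style lookup can hand the page
back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (assumptions for illustration, not kernel types). */
struct toy_page { int refcount; };
struct toy_pte  { struct toy_page *page; bool special; };

/* insert_pfn-style: the PTE is marked special and no reference is taken. */
static void toy_insert_pfn(struct toy_pte *pte, struct toy_page *page)
{
	pte->page = page;
	pte->special = true;
}

/* insert_page-style: the PTE itself holds a reference on the page. */
static void toy_insert_page(struct toy_pte *pte, struct toy_page *page)
{
	page->refcount++;
	pte->page = page;
	pte->special = false;
}

/* vm_normal_page-style lookup: special PTEs yield no struct page. */
static struct toy_page *toy_vm_normal_page(const struct toy_pte *pte)
{
	return pte->special ? NULL : pte->page;
}
```

In this model a page mapped through toy_insert_pfn() stays at refcount 0
and is invisible to the lookup, which is the situation described above for
devmap today.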

I think this change alone would fix all the refcount problems
everywhere in DAX and devmap.

> The refcount dependencies also go away after this...
> 
> https://lore.kernel.org/all/161604050866.1463742.7759521510383551055.stgit@dwillia2-desk3.amr.corp.intel.com/
>
> ...but you can see that patches 1 and 2 in that series depend on being
> able to guarantee that all mappings are invalidated when the underlying
> device that owns the pgmap goes away.

If I have put everything together right, this is because of what I
pointed to here. FS-DAX is installing 0 refcount pages into PTEs and
expecting that to work sanely.

This means the page map cannot be removed until all the PTEs are fully
flushed, which buggily doesn't happen because of the missing unplug.

However, this is all because nobody incremented a refcount to represent
the reference in the PTE, and since this meant that 0 refcount pages were
wrongly stuffed into PTEs, devmap then used the refcount == 1 hack to
unbreak GUP?

So.. Is there some reason why devmap pages are trying so hard to avoid
sane refcounting???

If the PTE itself holds the refcount (by not being special) then there
is no need for the pagemap stuff in GUP. pagemap already waits for
refs to go to 0 so the missing shootdown during nvdimm unplug will
cause pagemap to block until the address spaces are invalidated. IMHO
this is already better than the current buggy situation of allowing
continued PTE references to memory that is now removed from the system.
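
As a rough userspace model of that argument (again not kernel code; the
sketch_* names and the single counter are assumptions made for
illustration), each installed PTE holds one reference, and teardown is
only allowed once every PTE has been zapped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one counter of outstanding page references. */
struct sketch_pgmap { int refs; };
struct sketch_pte   { struct sketch_pgmap *pgmap; };

/* Installing a PTE takes a reference, as a non-special PTE would. */
static void sketch_map(struct sketch_pte *pte, struct sketch_pgmap *pgmap)
{
	pgmap->refs++;
	pte->pgmap = pgmap;
}

/* Zapping the PTE (the address space being invalidated) drops it again. */
static void sketch_zap(struct sketch_pte *pte)
{
	assert(pte->pgmap != NULL);
	pte->pgmap->refs--;
	pte->pgmap = NULL;
}

/* Teardown may proceed only when no references remain. */
static bool sketch_pgmap_can_remove(const struct sketch_pgmap *pgmap)
{
	return pgmap->refs == 0;
}
```

With a missing shootdown, sketch_pgmap_can_remove() simply stays false
until the mappings go away, rather than letting removal proceed underneath
live PTEs.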

> For that to happen there needs to be communication back to the FS for
> device-gone / failure events. That work is in progress via this
> series:
> 
> https://lore.kernel.org/all/20210924130959.2695749-1-ruansy.fnst@fujitsu.com/

This is fine, but I don't think it should block fixing the mm side -
the end result here still cannot be 0 ref count pages installed in
PTEs.

Fixing that does not depend on shootdown during device removal, right?

It requires holding refcounts while pages are installed into address
spaces - and the lack of that is a direct cause of making the PTEs all
special and using insert_pfn and VM_MIXEDMAP.

Thanks,
Jason