linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave@sr71.net>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Stephen Bates <Stephen.Bates@pmcs.com>
Subject: Re: [PATCH 14/15] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup
Date: Fri, 2 Oct 2015 14:53:24 -0700	[thread overview]
Message-ID: <CAPcyv4iwJJX-rSgC0ramLrvccdzDXgnUAUMQbTMpoODo2f7kOw@mail.gmail.com> (raw)
In-Reply-To: <20151002212137.GB30448@deltatee.com>

On Fri, Oct 2, 2015 at 2:21 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hi Dan,
>
> We've been doing some experimenting and testing with this patchset.
> Specifically, we are trying to use you're ZONE_DEVICE work to enable
> peer to peer PCIe transfers. This is actually working pretty well
> (though we're still testing and working through some things).

Hmm, I didn't have peer-to-peer PCI-E in mind for this mechanism, but
the test report is welcome nonetheless.  The definition of dma_addr_t
is the device view of host memory, not necessarily the device view of
a peer device's memory range, so I expect you'll run into issues with
IOMMUs and other parts of the kernel that assume this definition.

>
> However, we've found a couple of issues:
>
> On Wed, Sep 23, 2015 at 12:42:27AM -0400, Dan Williams wrote:
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 3d6baa7d4534..20097e7b679a 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -49,12 +49,16 @@ struct page {
>>                                        * updated asynchronously */
>>       union {
>>               struct address_space *mapping;  /* If low bit clear, points to
>> -                                              * inode address_space, or NULL.
>> +                                              * inode address_space, unless
>> +                                              * the page is in ZONE_DEVICE
>> +                                              * then it points to its parent
>> +                                              * dev_pagemap, otherwise NULL.
>>                                                * If page mapped as anonymous
>>                                                * memory, low bit is set, and
>>                                                * it points to anon_vma object:
>>                                                * see PAGE_MAPPING_ANON below.
>>                                                */
>> +             struct dev_pagemap *pgmap;
>>               void *s_mem;                    /* slab first object */
>>       };
>
>
> When you add to this union and overide the mapping value, we see bugs
> in calls to set_page_dirty when it tries to dereference mapping. I believe
> a change to page_mapping is required such as the patch that's at the end of
> this email.

Yes, this location for dev_pagemap will not work.  I've since moved it
to a union with the lru list_head since ZONE_DEVICE pages memory
should always have an elevated page count and never land on a slab
allocator lru.

>> diff --git a/mm/gup.c b/mm/gup.c
>> index a798293fc648..1064e9a489a4 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -98,7 +98,16 @@ retry:
>>       }
>>
>>       page = vm_normal_page(vma, address, pte);
>> -     if (unlikely(!page)) {
>> +     if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
>> +             /*
>> +              * Only return device mapping pages in the FOLL_GET case since
>> +              * they are only valid while holding the pgmap reference.
>> +              */
>> +             if (get_dev_pagemap(pte_pfn(pte), NULL))
>> +                     page = pte_page(pte);
>> +             else
>> +                     goto no_page;
>> +     } else if (unlikely(!page)) {
>
> I've found that if a driver creates a ZONE_DEVICE mapping but doesn't
> create the pagemap (using devm_register_pagemap) then the get_user_pages code
> will go into an infinite loop. I'm not really sure if this as an issue or
> not but it seems a bit undesirable for a buggy driver to be able to cause this.
>
> My thoughts are that either devm_register_pagemap needs to be done by
> devm_memremap_pages so a driver cannot use one without the other,
> or the GUP code needs to return EFAULT if no pagemap was registered so
> it doesn't loop forever.

Exactly, we should fail (-EFAULT)  get_user_pages() in that case since
we don't have a mechanism to pin down the mapping.  I'll track down
what's causing the loop.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-10-02 21:53 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-23  4:41 [PATCH 00/15] get_user_pages() for dax mappings Dan Williams
2015-09-23  4:41 ` [PATCH 01/15] avr32: convert to asm-generic/memory_model.h Dan Williams
2015-09-24 15:10   ` Christoph Hellwig
2015-09-26  0:36     ` Dan Williams
2015-09-26 20:10       ` Christoph Hellwig
2015-09-28 18:44         ` Luck, Tony
2015-09-23  4:41 ` [PATCH 02/15] hugetlb: fix compile error on tile Dan Williams
2015-09-23  4:41 ` [PATCH 03/15] frv: fix compiler warning from definition of __pmd() Dan Williams
2015-09-23  4:41 ` [PATCH 04/15] x86, mm: quiet arch_add_memory() Dan Williams
2015-09-24 15:10   ` Christoph Hellwig
2015-09-23  4:41 ` [PATCH 05/15] pmem: kill memremap_pmem() Dan Williams
2015-09-24 15:11   ` Christoph Hellwig
2015-09-23  4:41 ` [PATCH 06/15] devm_memunmap: use devres_release() Dan Williams
2015-09-24 15:13   ` Christoph Hellwig
2015-09-23  4:41 ` [PATCH 07/15] devm_memremap: convert to return ERR_PTR Dan Williams
2015-09-24 15:13   ` Christoph Hellwig
2015-09-23  4:41 ` [PATCH 08/15] block, dax, pmem: reference counting infrastructure Dan Williams
2015-09-24 15:15   ` Christoph Hellwig
2015-09-25  0:03     ` Dan Williams
2015-09-25 11:32       ` Christoph Hellwig
2015-09-25 21:08         ` Williams, Dan J
2015-09-23  4:42 ` [PATCH 09/15] block, pmem: fix null pointer de-reference on shutdown, check for queue death Dan Williams
2015-09-23  4:42 ` [PATCH 10/15] block, dax: fix lifetime of in-kernel dax mappings Dan Williams
2015-10-07 22:56   ` Logan Gunthorpe
2015-10-09 21:12     ` Dan Williams
2015-09-23  4:42 ` [PATCH 11/15] mm, dax, pmem: introduce __pfn_t Dan Williams
2015-09-23 16:02   ` Dave Hansen
2015-09-23 23:36     ` Williams, Dan J
2015-09-23  4:42 ` [PATCH 12/15] mm, dax, gpu: convert vm_insert_mixed to __pfn_t, introduce _PAGE_DEVMAP Dan Williams
2015-09-23 13:47   ` Geert Uytterhoeven
2015-09-23 16:59     ` Dan Williams
2015-09-23  4:42 ` [PATCH 13/15] mm, dax: convert vmf_insert_pfn_pmd() to __pfn_t Dan Williams
2015-09-23  4:42 ` [PATCH 14/15] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup Dan Williams
2015-10-02 21:21   ` Logan Gunthorpe
2015-10-02 21:53     ` Dan Williams [this message]
2015-10-02 22:14       ` Logan Gunthorpe
2015-10-02 22:42       ` Logan Gunthorpe
2015-10-02 22:55         ` Dan Williams
2015-09-23  4:42 ` [PATCH 15/15] mm, x86: get_user_pages() for dax mappings Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4iwJJX-rSgC0ramLrvccdzDXgnUAUMQbTMpoODo2f7kOw@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=Stephen.Bates@pmcs.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@sr71.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=logang@deltatee.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).