linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "lizhijian@fujitsu.com" <lizhijian@fujitsu.com>
To: Jason Gunthorpe <jgg@ziepe.ca>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>
Cc: Yishai Hadas <yishaih@nvidia.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"yangx.jy@fujitsu.com" <yangx.jy@fujitsu.com>
Subject: Re: RDMA/rpma + fsdax(ext4) was broken since 36f30e486d
Date: Fri, 27 Aug 2021 08:15:40 +0000	[thread overview]
Message-ID: <10c4bead-c778-8794-f916-80bf7ba3a56b@fujitsu.com> (raw)
In-Reply-To: <b5e6c4cd-8842-59ef-c089-2802057f3202@cn.fujitsu.com>

Hi all

I have verified that below changes can solve this problem

diff --git a/mm/hmm.c b/mm/hmm.c
index 24f9ff95f3ae..2c9a3e3eefce 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -294,7 +294,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
          * Since each architecture defines a struct page for the zero page, just
          * fall through and treat it like a normal page.
          */
-       if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+       if (!pte_devmap(pte) && pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
                 if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
                         pte_unmap(ptep);
                         return -EFAULT;


i looked over the change-log of hmm_vma_handle_pte(), and found that before
4055062 ("mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling")

hmm_vma_handle_pte() will not check pte_special(pte) if pte_devmap(pte) is true.

when we reached
"if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {"
the pte have already presented and its pte's flag already fulfilled the request flags.


My question is that
Per https://01.org/blogs/dave/2020/linux-consumption-x86-page-table-bits,
pte_devmap(pte) and pte_special(pte) could be both true in fsdax user case, right ?

if so, what's the expected code path for fsdax case ?


commit 4055062749229101365e5f9e87cb1c5a93e292f8
Author: Jason Gunthorpe <jgg@mellanox.com>
Date:   Thu Mar 5 14:27:20 2020 -0400

     mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling

     Currently if a special PTE is encountered hmm_range_fault() immediately
     returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
     uses).

     EFAULT should only be returned after testing with hmm_pte_need_fault().

     Also pte_devmap() and pte_special() are exclusive, and there is no need to
     check IS_ENABLED, pte_special() is stubbed out to return false on
     unsupported architectures.

     Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
     Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
     Reviewed-by: Christoph Hellwig <hch@lst.de>
     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

diff --git a/mm/hmm.c b/mm/hmm.c
index 3a03fcf..9c82ea9 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -339,16 +339,21 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
                         pte_unmap(ptep);
                         return -EBUSY;
                 }
-       } else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
-               if (!is_zero_pfn(pte_pfn(pte))) {
+       }
+
+       /*
+        * Since each architecture defines a struct page for the zero page, just
+        * fall through and treat it like a normal page.
+        */
+       if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+               hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
+                                  &write_fault);
+               if (fault || write_fault) {
                         pte_unmap(ptep);
-                       *pfn = range->values[HMM_PFN_SPECIAL];
                         return -EFAULT;
                 }
-               /*
-                * Since each architecture defines a struct page for the zero
-                * page, just fall through and treat it like a normal page.
-                */
+               *pfn = range->values[HMM_PFN_SPECIAL];
+               return 0;
         }

         *pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;


Thanks
Zhijian



On 06/08/2021 11:20, Li, Zhijian wrote:
> Hi Jason
>
> thank you for your advice.
>
> CCing nvdimm
>
> both ext4 and xfs are impacted.
>
> Thanks
>
>
> at 2021/8/6 9:45, Jason Gunthorpe wrote:
>> On Wed, Aug 04, 2021 at 04:06:53PM +0800, Li, Zhijian/李 智坚 wrote:
>>> convert to text and send again
>>>
>>> 2021/8/4 15:55, Li, Zhijian wrote:
>>>> Hey all:
>>>>
>>>> Recently, i reported a issue to rpmahttps://github.com/pmem/rpma/issues/1142
>>>> where we found that the native rpma + fsdax example failed in recent kernel.
>>>>
>>>> Below is the bisect log
>>>>
>>>> [lizhijian@yl linux]$ git bisect log
>>>> git bisect start
>>>> # good: [bbf5c979011a099af5dc76498918ed7df445635b] Linux 5.9
>>>> git bisect good bbf5c979011a099af5dc76498918ed7df445635b
>>>> # bad: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
>>>> git bisect bad 2c85ebc57b3e1817b6ce1a6b703928e113a90442
>>>> # good: [4d0e9df5e43dba52d38b251e3b909df8fa1110be] lib, uaccess: add failure injection to usercopy functions
>>>> git bisect good 4d0e9df5e43dba52d38b251e3b909df8fa1110be
>>>> # bad: [6694875ef8045cdb1e6712ee9b68fe08763507d8] ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
>>>> git bisect bad 6694875ef8045cdb1e6712ee9b68fe08763507d8
>>>> # good: [14c914fcb515c424177bb6848cc2858ebfe717a8] Merge tag 'wireless-drivers-next-2020-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
>>>> git bisect good 14c914fcb515c424177bb6848cc2858ebfe717a8
>>>> # good: [6f78b9acf04fbf9ede7f4265e7282f9fb39d2c8c] Merge tag 'mtd/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
>>>> git bisect good 6f78b9acf04fbf9ede7f4265e7282f9fb39d2c8c
>>>> # bad: [bbe85027ce8019c73ab99ad1c2603e2dcd1afa49] Merge tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
>>>> git bisect bad bbe85027ce8019c73ab99ad1c2603e2dcd1afa49
>>>> # bad: [9d9af1007bc08971953ae915d88dc9bb21344b53] Merge tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
>>>> git bisect bad 9d9af1007bc08971953ae915d88dc9bb21344b53
>>>> # good: [21c2fe94abb2abe894e6aabe6b4e84a255c8d339] RDMA/mthca: Combine special QP struct with mthca QP
>>>> git bisect good 21c2fe94abb2abe894e6aabe6b4e84a255c8d339
>>>> # good: [dbaa1b3d9afba3c050d365245a36616ae3f425a7] Merge branch 'perf/urgent' into perf/core
>>>> git bisect good dbaa1b3d9afba3c050d365245a36616ae3f425a7
>>>> # bad: [c7a198c700763ac89abbb166378f546aeb9afb33] RDMA/ucma: Fix use after free in destroy id flow
>>>> git bisect bad c7a198c700763ac89abbb166378f546aeb9afb33
>>>> # bad: [5ce2dced8e95e76ff7439863a118a053a7fc6f91] RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
>>>> git bisect bad 5ce2dced8e95e76ff7439863a118a053a7fc6f91
>>>> # bad: [a03bfc37d59de316436c46f5691c5a972ed57c82] RDMA/mlx5: Sync device with CPU pages upon ODP MR registration
>>>> git bisect bad a03bfc37d59de316436c46f5691c5a972ed57c82
>>>> # good: [a6f0b08dbaf289c3c57284e16ac8043140f2139b] RDMA/core: Remove ucontext->closing
>>>> git bisect good a6f0b08dbaf289c3c57284e16ac8043140f2139b
>>>> # bad: [36f30e486dce22345c2dd3a3ba439c12cd67f6ba] IB/core: Improve ODP to use hmm_range_fault()
>>>> git bisect bad 36f30e486dce22345c2dd3a3ba439c12cd67f6ba
>>>> # good: [2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e] RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()
>>>> git bisect good 2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e
>>>> # first bad commit: [36f30e486dce22345c2dd3a3ba439c12cd67f6ba] IB/core: Improve ODP to use hmm_range_fault()
>> This is perhaps not so surprising, but I think you should report it to
>> the dax people that hmm_range_fault and dax don't work together..
>>
>> Though I think it is supposed to, and I'm surprised it doesn't.
>>
>> Jason
>>
>>

  reply	other threads:[~2021-08-27  8:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <8b2514bb-1d4b-48bb-a666-85e6804fbac0@cn.fujitsu.com>
2021-08-04  8:06 ` RDMA/rpma + fsdax(ext4) was broken since 36f30e486d Li, Zhijian/李 智坚
2021-08-06  1:45   ` Jason Gunthorpe
2021-08-06  3:20     ` Li, Zhijian/李 智坚
2021-08-27  8:15       ` lizhijian [this message]
2021-08-27 12:10         ` Jason Gunthorpe
2021-08-27 13:05           ` Li, Zhijian
2021-08-27 13:16             ` Jason Gunthorpe
2021-08-27 13:38               ` Li, Zhijian
2021-08-27 16:42             ` Dan Williams
2021-08-27 16:52               ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=10c4bead-c778-8794-f916-80bf7ba3a56b@fujitsu.com \
    --to=lizhijian@fujitsu.com \
    --cc=jgg@ziepe.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=yangx.jy@fujitsu.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).