linux-kernel.vger.kernel.org archive mirror
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
To: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org,
	brauner@kernel.org, surenb@google.com,
	michael.christie@oracle.com, mjguzik@gmail.com,
	mathieu.desnoyers@efficios.com, npiggin@gmail.com,
	peterz@infradead.org, oliver.sang@intel.com, mst@redhat.com,
	maple-tree@lists.infradead.org, linux-mm@kvack.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v4 10/10] fork: Use __mt_dup() to duplicate maple tree in dup_mmap()
Date: Wed, 11 Oct 2023 10:59:50 -0400
Message-ID: <20231011145950.6ypjrfgkngukbjyr@revolver>
In-Reply-To: <9eb93423-a2ee-4b9c-be8c-108915eb7e0f@bytedance.com>

* Peng Zhang <zhangpeng.00@bytedance.com> [231011 03:00]:
> 
> 
> On 2023/10/11 09:28, Liam R. Howlett wrote:
...
> > 
> > > +	unmap_region(mm, &vmi.mas, vma, NULL, NULL, 0, tree_end, tree_end, true);
> > > +
> > 
> > I really don't like having to modify unmap_region() and free_pgtables()
> > for a rare error case.  Looking into the issue, you are correct in the
> > rounding that is happening in free_pgd_range() and this alignment to
> > avoid "unnecessary work" is causing us issues.  However, if we open code
> > it a lot like what exit_mmap() does, we can avoid changing these
> > functions:
> > 
> > +       lru_add_drain();
> > +       tlb_gather_mmu(&tlb, mm);
> > +       update_hiwater_rss(mm);
> > +       unmap_vmas(&tlb, &vmi.mas, vma, 0, tree_end, tree_end, true);
> > +       vma_iter_set(&vmi, vma->vm_end);
> > +       free_pgtables(&tlb, &vmi.mas, vma, FIRST_USER_ADDRESS, vma_end->vm_start,
> > +                     true);
> > +       free_pgd_range(&tlb, vma->vm_start, vma_end->vm_start,
> > +                      FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> I think both approaches are valid. If you feel that this method is better,
> I can make the necessary changes accordingly. However, take a look at the
> following code:
> 
> if (is_vm_hugetlb_page(vma)) {
> 	hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
> 		floor, next ? next->vm_start : ceiling);
> }
> 
> In free_pgtables(), there is also a possibility of using
> hugetlb_free_pgd_range() to free the page tables. By adding an
> additional call to free_pgd_range() instead of hugetlb_free_pgd_range(),
> I'm not sure if it would cause any potential issues.

Okay.  It is safe for the general case, but I've no idea about powerpc
and other variants.  After looking at the ppc stuff, I don't think it's
safe (for our sanity) to proceed with my plan.

I think we go back to your v2 attempt at this and store XA_ZERO, then
modify unmap_vmas(), free_pgtables(), and the (already done in v2) exit
path loop.  Then we just let the normal failure path be taken in
exit_mmap().  Sorry for going back on this, but there's no tidy way to
proceed.


From your v2 [1]:
+			if (unlikely(mas_is_err(&vmi.mas))) {
+				retval = xa_err(vmi.mas.node);
+				mas_reset(&vmi.mas);
+				if (mas_find(&vmi.mas, ULONG_MAX))
+					mas_store(&vmi.mas, XA_ZERO_ENTRY);
+				goto loop_out;
+			}

You can do this instead:
+			if (unlikely(mas_is_err(&vmi.mas))) {
+				retval = xa_err(vmi.mas.node);
+				mas_set_range(&vmi.mas, mpnt->vm_start,
+					      mpnt->vm_end - 1);
+				mas_store(&vmi.mas, XA_ZERO_ENTRY);
+				goto loop_out;
+			}

We'll have to be careful that the first VMA isn't XA_ZERO in those two
functions (unmap_vmas() and free_pgtables()) as well, but I think it will
be better than having 7 arguments to free_pgtables() with the last two
being the same for all but one case, and/or our own cleanup for exit.
Even with a wrapping function, this is too messy.
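
To make the marker idea concrete, here is a minimal userspace sketch. It is
not kernel code: the toy_* names, the fixed-size slot array and ZERO_ENTRY
are all hypothetical stand-ins (for the maple tree, the VMAs and
XA_ZERO_ENTRY), chosen only for illustration. It assumes the duplicated tree
starts out pointing at the parent's entries, that entries are replaced one
by one, and that a failure overwrites the current slot with the marker so
teardown knows where the child's own entries end. Note the teardown check
also covers the marker sitting in the very first slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end). */
struct range { unsigned long start, end; };

/* Hypothetical stand-in for XA_ZERO_ENTRY: a unique marker address. */
static struct range zero_marker;
#define ZERO_ENTRY (&zero_marker)

#define NSLOTS 8

/* Toy "tree": slots ordered by address; NULL means empty. */
struct toy_tree { struct range *slot[NSLOTS]; };

/*
 * Bulk-copy src into dst, then replace each entry with dst's own copy.
 * If the replacement of entry fail_at fails, store the marker there and
 * return -1. Slots after the marker still point at src's ranges, which
 * dst does not own. Pass fail_at = -1 for a run with no failure.
 */
int toy_dup(struct toy_tree *dst, const struct toy_tree *src,
	    struct range *copies, int fail_at)
{
	int i;

	assert(fail_at < NSLOTS);
	*dst = *src;	/* dst slots still point at src's ranges */
	for (i = 0; i < NSLOTS && dst->slot[i]; i++) {
		if (i == fail_at) {
			dst->slot[i] = ZERO_ENTRY;	/* mark the failure point */
			return -1;
		}
		copies[i] = *src->slot[i];
		dst->slot[i] = &copies[i];
	}
	return 0;
}

/*
 * Tear down dst, releasing only the entries it owns. Stops at the marker,
 * including when the marker is the first entry (nothing to release).
 * Returns the number of entries released.
 */
int toy_teardown(struct toy_tree *t)
{
	int i, freed = 0;

	for (i = 0; i < NSLOTS && t->slot[i]; i++) {
		if (t->slot[i] == ZERO_ENTRY)
			break;
		t->slot[i] = NULL;
		freed++;
	}
	return freed;
}

/* Duplicate a three-entry tree, failing at fail_at, then tear it down. */
int toy_run(int fail_at)
{
	struct range parent[3] = {
		{ 0x1000, 0x2000 }, { 0x4000, 0x6000 }, { 0x8000, 0x9000 },
	};
	struct range copies[NSLOTS];
	struct toy_tree src = { { &parent[0], &parent[1], &parent[2] } };
	struct toy_tree dst;

	toy_dup(&dst, &src, copies, fail_at);
	return toy_teardown(&dst);
}
```

In this model, toy_run(1) releases only the first entry, toy_run(0) releases
nothing because the marker is the first slot, and toy_run(-1) releases all
three. The point of the design is that the normal teardown path needs only
one extra comparison rather than a separate error-only cleanup routine.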

[1]. https://lore.kernel.org/lkml/20230830125654.21257-7-zhangpeng.00@bytedance.com/

Thanks,
Liam


Thread overview: 17+ messages
2023-10-09  9:03 [PATCH v4 00/10] Introduce __mt_dup() to improve the performance of fork() Peng Zhang
2023-10-09  9:03 ` [PATCH v4 01/10] maple_tree: Add mt_free_one() and mt_attr() helpers Peng Zhang
2023-10-09  9:03 ` [PATCH v4 02/10] maple_tree: Introduce {mtree,mas}_lock_nested() Peng Zhang
2023-10-09  9:03 ` [PATCH v4 03/10] maple_tree: Introduce interfaces __mt_dup() and mtree_dup() Peng Zhang
2023-10-09  9:03 ` [PATCH v4 04/10] radix tree test suite: Align kmem_cache_alloc_bulk() with kernel behavior Peng Zhang
2023-10-09  9:03 ` [PATCH v4 05/10] maple_tree: Add test for mtree_dup() Peng Zhang
2023-10-09  9:03 ` [PATCH v4 06/10] maple_tree: Update the documentation of maple tree Peng Zhang
2023-10-09  9:03 ` [PATCH v4 07/10] maple_tree: Skip other tests when BENCH is enabled Peng Zhang
2023-10-09  9:03 ` [PATCH v4 08/10] maple_tree: Update check_forking() and bench_forking() Peng Zhang
2023-10-09  9:03 ` [PATCH v4 09/10] maple_tree: Preserve the tree attributes when destroying maple tree Peng Zhang
2023-10-11 15:42   ` Peng Zhang
2023-10-11 15:48     ` Liam R. Howlett
2023-10-09  9:03 ` [PATCH v4 10/10] fork: Use __mt_dup() to duplicate maple tree in dup_mmap() Peng Zhang
2023-10-09 11:06   ` Peng Zhang
2023-10-11  1:28   ` Liam R. Howlett
2023-10-11  7:00     ` Peng Zhang
2023-10-11 14:59       ` Liam R. Howlett [this message]