linux-kernel.vger.kernel.org archive mirror
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
To: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: willy@infradead.org, michael.christie@oracle.com,
	surenb@google.com, npiggin@gmail.com, corbet@lwn.net,
	mathieu.desnoyers@efficios.com, avagin@gmail.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, brauner@kernel.org,
	peterz@infradead.org
Subject: Re: [PATCH 04/11] maple_tree: Introduce interfaces __mt_dup() and mt_dup()
Date: Wed, 16 Aug 2023 14:30:29 -0400
Message-ID: <20230816183029.5rpkbgp2umebrjh5@revolver>
In-Reply-To: <3f4e73cc-1a98-95a8-9ab2-47797d236585@bytedance.com>

* Peng Zhang <zhangpeng.00@bytedance.com> [230816 09:42]:
> 
> 
...

> > > > > +/**
> > > > > + * __mt_dup(): Duplicate a maple tree
> > > > > + * @mt: The source maple tree
> > > > > + * @new: The new maple tree
> > > > > + * @gfp: The GFP_FLAGS to use for allocations
> > > > > + *
> > > > > + * This function duplicates a maple tree using a faster method than traversing
> > > > > + * the source tree and inserting entries into the new tree one by one. The user
> > > > > + * needs to lock the source tree manually. Before calling this function, @new
> > > > > + * must be an empty tree or an uninitialized tree. If @mt uses an external lock,
> > > > > + * we may also need to manually set @new's external lock using
> > > > > + * mt_set_external_lock().
> > > > > + *
> > > > > + * Return: 0 on success, -ENOMEM if memory could not be allocated.
> > > > > + */
> > > > > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
> > > > 
> > > > We use mas_ for things that won't handle the locking and pass in a maple
> > > > state.  Considering the leaves need to be altered once this is returned,
> > > > I would expect passing in a maple state should be feasible?
> > > But we don't really need mas here. What do you think the state of mas
> > > should be when this function returns? Make it point to the first entry,
> > > or the last entry?
> > 
> > I would write it to point to the first element so that the call to
> > replace the first element can just do that without an extra walk and
> > document the maple state end point.
> Unfortunately, this does not seem to be convenient. Users usually use
> mas_for_each() to replace elements. If we set mas to the first element,
> the first call to mas_find() in mas_for_each() will get the next
> element.

This sounds like the need for another iterator specifically for
duplicating.

> 
> There may also be other scenarios where the user does not necessarily
> have to replace every element.

Do you mean a limit or elements that need to be skipped?  We could have
a limit on the iteration.

> 
> Finally, getting the first element in __mt_dup() requires an additional
> check to check whether the first element has already been recorded. Such
> a check will be performed at each leaf node, which is unnecessary
> overhead.
> 
> Of course, the first reason is the main reason, which prevents us from
> using mas_for_each(). So I don't want to record the first element.


I don't like the interface because it can easily be misunderstood and
used incorrectly.  I don't know how to make a cleaner interface, but
I've gone through a few thoughts:

The first was to hide _all of it_ in a new iterator:
mas_dup_each(old, new, old_entry) {
	if (don't_dup(old_entry)) {
		mas_erase(new);
		continue;
	}

	mas_dup_insert(new, new_entry);
}

This iterator would check if mas_is_start(old) and dup the tree in that
event.  It would leave both maple states pointing to the first element and
set old_entry.  I don't know how to handle a failure to duplicate the
tree in this case - I guess we could return old_entry = NULL and check
if mas_is_err(old) after the loop.  Do you see a problem with this?


The second idea was an init of the old tree.  This is closest to what you
have:

if (mas_dup_init(old, new))
	return -ENOMEM;

mas_dup_each(old, new, old_entry) {
	if (don't_dup(old_entry)) {
		mas_erase(new);
		continue;
	}

	mas_dup_insert(new, new_entry);
}

This would duplicate the tree at the start and leave both pointing at
the first element so that mas_dup_each() could start on that element.
Each subsequent call would go to the next element in both maple states.
It sounds like you don't want this for performance reasons?  Although
looking at mas_find() today, I think this could still work since we
already check the maple state for a lot of conditions there.
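
As a rough illustration of the second idea, here is a toy userspace model
of "duplicate up front, then walk both states in lockstep".  It is not the
maple tree API: every toy_* name is invented for this sketch, the "tree" is
a flat array, a NULL entry ends the walk, and erase simply clears the slot
so the two positions stay aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

/* Toy stand-in for a tree: a flat array of entries (NOT struct maple_tree). */
struct toy_tree {
	const void *slot[TOY_MAX];
	size_t nr;
};

/* Toy stand-in for a maple state: a tree plus a position in it. */
struct toy_state {
	struct toy_tree *tree;
	size_t pos;
};

/* Copy the whole source once and leave both states on the first element. */
static int toy_dup_init(struct toy_state *old, struct toy_state *new)
{
	assert(old != new);
	if (!old->tree || !new->tree)
		return -1;
	memcpy(new->tree->slot, old->tree->slot, sizeof(old->tree->slot));
	new->tree->nr = old->tree->nr;
	old->pos = 0;
	new->pos = 0;
	return 0;
}

/* Entry under the old state, or NULL once the walk is finished. */
static const void *toy_dup_cur(const struct toy_state *old)
{
	return old->pos < old->tree->nr ? old->tree->slot[old->pos] : NULL;
}

/* Advance both states together so they always cover the same slot. */
static void toy_dup_next(struct toy_state *old, struct toy_state *new)
{
	old->pos++;
	new->pos++;
}

/* Starts ON the current element, unlike a find-next style iterator. */
#define toy_dup_each(old, new, entry)				\
	for ((entry) = toy_dup_cur(old); (entry);		\
	     toy_dup_next(old, new), (entry) = toy_dup_cur(old))

static void toy_dup_erase(struct toy_state *new)
{
	new->tree->slot[new->pos] = NULL;
}

static void toy_dup_insert(struct toy_state *new, const void *entry)
{
	new->tree->slot[new->pos] = entry;
}

/* Duplicate src into dst, dropping the entry equal to skip. */
static int toy_dup_filtered(struct toy_tree *src, struct toy_tree *dst,
			    const void *skip)
{
	struct toy_state old = { .tree = src, .pos = 0 };
	struct toy_state new = { .tree = dst, .pos = 0 };
	const void *entry;
	int kept = 0;

	if (toy_dup_init(&old, &new))
		return -1;

	toy_dup_each(&old, &new, entry) {
		if (entry == skip) {
			toy_dup_erase(&new);
			continue;
		}
		toy_dup_insert(&new, entry);
		kept++;
	}
	return kept;
}
```

Because the loop's step expression runs on "continue" as well, the skipped
element still advances both states, which is the property the sketches
above rely on.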

Both ideas could be even faster than what you have if we handle the
special cases of mas_is_none()/mas_is_ptr() in a smarter way, because we
don't need to worry about the entry point of the maple state as much as
we do with mas_find()/mas_for_each().  I mean, is it possible to
get to a mas_is_none() or mas_is_ptr() on duplicating a tree?  How do we
handle these users?

Both ideas still suffer from someone saying "Gee, that {insert function
name here} is used in the forking code, so I can totally use that in my
code because that's how it works!" and finding out that it works for the
limited testing they do.  Then it fails later and the emails start flying.


I almost think we should do something like this on insert:

void mas_dup_insert(old, new, new_entry) {
	WARN_ON_ONCE(old == new);
	WARN_ON_ONCE(old->index != new->index);
	WARN_ON_ONCE(old->last != new->last);
	...
}

This would at least _require_ someone to have two maple states and
hopefully think twice on using it where it should not be used.
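
To make the intent of those checks concrete, here is a small userspace
sketch.  It is only a model: struct toy_range_state and the toy_* helpers
are made up for illustration, and unlike WARN_ON_ONCE(), which warns and
carries on, this version refuses the store so the effect is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a maple state: the range it covers and its entry. */
struct toy_range_state {
	unsigned long index;
	unsigned long last;
	const void *entry;
};

/* Report a violated expectation and hand the condition back to the caller. */
static bool toy_warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "toy warning: %s\n", what);
	return cond;
}

/*
 * Replace the entry in the duplicate only when the caller really has two
 * distinct states that cover the same range.  Returns 0 on success, -1 if
 * any expectation is violated (an assumption of this sketch).
 */
static int toy_dup_insert(const struct toy_range_state *old,
			  struct toy_range_state *new, const void *new_entry)
{
	assert(old && new);

	if (toy_warn_on(old == new, "old and new are the same state"))
		return -1;
	if (toy_warn_on(old->index != new->index, "start of range differs"))
		return -1;
	if (toy_warn_on(old->last != new->last, "end of range differs"))
		return -1;

	new->entry = new_entry;
	return 0;
}
```

A caller that only has a single state cannot satisfy the first check at
all, which is the point: the signature itself pushes users toward the
duplicate-then-replace pattern.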

The bottom line is that this code is close to what we need to make
forking better, but I fear the misuse of the interface.

Something else to think about:
In the work items for the Maple Tree, there is a plan to have an enum to
specify the type of write that is going to happen.  The idea was for
mas_preallocate() to set this type of write so we can just go right to
the correct function.  We could use that here and set the maple state
write type to a direct replacement.
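
A hypothetical sketch of that idea, in plain userspace C: classify the
write once, record the result in the state, and let the store dispatch on
it.  The enum values, struct layout, and toy_* functions below are all
assumptions made for illustration and do not reflect any existing or
planned kernel definition.

```c
#include <assert.h>

/* Hypothetical write classifications; the names are invented here. */
enum toy_write_type {
	TOY_WR_UNSET,	/* not classified yet */
	TOY_WR_REPLACE,	/* new range matches the existing slot exactly */
	TOY_WR_GENERIC,	/* anything else: take the general path */
};

/* Toy slot: one range and the entry stored for it. */
struct toy_slot {
	unsigned long index;
	unsigned long last;
	const void *entry;
};

/* Toy write state: the range being written plus the recorded write type. */
struct toy_write_state {
	unsigned long index;
	unsigned long last;
	enum toy_write_type type;
};

/* Decide the kind of write once, before the store happens. */
static void toy_preallocate(struct toy_write_state *ws,
			    const struct toy_slot *slot)
{
	if (ws->index == slot->index && ws->last == slot->last)
		ws->type = TOY_WR_REPLACE;
	else
		ws->type = TOY_WR_GENERIC;
}

/* Store using the recorded type; returns the path that was taken. */
static enum toy_write_type toy_store(struct toy_write_state *ws,
				     struct toy_slot *slot, const void *entry)
{
	assert(ws->type != TOY_WR_UNSET);

	switch (ws->type) {
	case TOY_WR_REPLACE:
		/* Direct replacement: the range is untouched. */
		slot->entry = entry;
		break;
	default:
		/* General path: the range changes along with the entry. */
		slot->index = ws->index;
		slot->last = ws->last;
		slot->entry = entry;
		break;
	}
	return ws->type;
}
```

In a duplicated tree every replacement hits an existing range exactly, so
under this model the replace path is the only one the fork case would take.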

Thanks,
Liam
