Re: [Lsf-pc] [LSF/MM/BPF TOPIC] FS Maintainers Don't Scale

From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Allison Collins <allison.henderson@oracle.com>,
	Amir Goldstein <amir73il@gmail.com>,
	lsf-pc@lists.linux-foundation.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	xfs <linux-xfs@vger.kernel.org>, Eryu Guan <guaneryu@gmail.com>,
	Eric Sandeen <sandeen@redhat.com>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] FS Maintainers Don't Scale
Date: Wed, 12 Feb 2020 13:36:01 -0800	[thread overview]
Message-ID: <20200212213601.GR6870@magnolia> (raw)
In-Reply-To: <20200202214620.GA20628@dread.disaster.area>

On Mon, Feb 03, 2020 at 08:46:20AM +1100, Dave Chinner wrote:
> On Fri, Jan 31, 2020 at 08:20:37PM -0700, Allison Collins wrote:
> > 
> > 
> > On 1/31/20 12:30 AM, Amir Goldstein wrote:
> > > On Fri, Jan 31, 2020 at 7:25 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > > 
> > > > Hi everyone,
> > > > 
> > > > I would like to discuss how to improve the process of shepherding code
> > > > into the kernel to make it more enjoyable for maintainers, reviewers,
> > > > and code authors.  Here is a brief summary of how we got here:
> > > > 
> > > > Years ago, XFS had one maintainer tending to all four key git repos
> > > > (kernel, userspace, documentation, testing).  Like most subsystems, the
> > > > maintainer did a lot of review and porting code between the kernel and
> > > > userspace, though with help from others.
> > > > 
> > > > It turns out that this didn't scale very well, so we split the
> > > > responsibilities into three maintainers.  Like most subsystems, the
> > > > maintainers still did a lot of review and porting work, though with help
> > > > from others.
> > > > 
> > > > It turns out that this system doesn't scale very well either.  Even with
> > > > three maintainers sharing access to the git trees and working together
> > > > to get reviews done, mailing list traffic has been trending upwards for
> > > > years, and we still can't keep up.  I fear that many maintainers are
> > > > burning out.  For XFS, the biggest pain point (AFAICT) is not assembly and
> > > > testing of the git trees, but keeping up with the mail and the reviews.
> > > > 
> > > > So what do we do about this?  I think we (the XFS project, anyway)
> > > > should increase the amount of organizing in our review process.  For
> > > > large patchsets, I would like to improve informal communication about
> > > > who the author might like to have conduct a review, who might be
> > > > interested in conducting a review, estimates of how much time a reviewer
> > > > has to spend on a patchset, and of course, feedback about how it went.
> > > > This of course is to lay the groundwork for making a case to our bosses
> > > > for growing our community, allocating time for reviews and for growing
> > > > our skills as reviewers.
> > > > 
> > > 
> > > Interesting.
> > > 
> > > Eryu usually posts a weekly status of xfstests review queue, often with
> > > a call for reviewers, sometimes with specific patch series mentioned.
> > > That helps me as a developer to monitor the status of my own work
> > > and it helps me as a reviewer to put the efforts where the maintainer
> > > needs me the most.

FWIW, I wasn't aware that he did that.

> > > For xfs kernel patches, I can represent the voice of "new blood".
> > > Getting new people to join the review effort is quite a hard barrier.
> > > I have taken a few stabs at doing review for xfs patch series over the
> > > year, but it mostly ends up feeling like it helped me (get to know xfs code
> > > better) more than it helped the maintainer, because the chances of a
> > > new reviewer to catch meaningful bugs are very low and if another reviewer
> > > is going to go over the same patch series, the chances of new reviewer to
> > > catch bugs that novice reviewer will not catch are extremely low.
> > That sounds like a familiar experience.  Lots of times I'll start a review,
> > but then someone else will finish it before I do, and catch more things
> > along the way.  So I sort of feel like if it's not something I can get
> > through quickly, then it's not a very good distribution of work effort and I
> > should shift to something else. Most of the time, I'll study it until I feel
> > like I understand what the person is trying to do, and I might catch stuff
> > that appears like it may not align with that pursuit, but I don't
> > necessarily feel I can deem it void of all unforeseen bugs.
> 
> I think you are both underselling yourselves. Imposter syndrome and
> all that jazz.
> 
> The reality is that we don't need more people doing the sorts of
> "how does this work with the rest of XFS" reviews that people like
> Darricki or Christoph do.

I 90% agree with that, except for the complexity that is the logging
system and how the cil & ail interact with each other.  I /really/
appreciate Brian digging into those kinds of things. :)

> What we really need is more people looking
> at whether loops are correctly terminated, the right variable types
> are used, we don't have signed vs unsigned issues, 32 bit overflows,
> use the right 32/64 bit division functions, the error handling logic
> is correct, etc.

Yeah, someone running sparse/smatch regularly would also help.  Granted,
Dan Carpenter does a good job of staying on top of that, but still...

> It's those sorts of little details that lead to most bugs, and
> that's precisely the sort of thing that is typically missed by an
> experienced developer doing a "is this the best possible
> implemenation of this functionality" review.

I would also add (to state this explicitly) that it was easier for me
earlier in my xfs career to spot certain kinds of bugs because I had to
figure out what the code was doing on my own and did not have to
compensate for the mental shortcut of "fmeh, it worked last time I went
here".

> A recent personal example: look at the review of Matthew Wilcox's
> ->readahead() series that I recently did. I noticed problems in the
> core change and the erofs and btfrs implementations not because I
> knew anything about those filesystems, but because I was checking
> whether the new loops iterated the pages in the page cache
> correctly. i.e. all I was really looking at was variable counting
> and loop initialisation and termination conditions. Experience tells
> me this stuff is notoriously difficult to get right, so that's what
> I looked at....
> 
> IOWs, you don't need to know anything about the subsystem to
> perform such a useful review, and a lot of the time you won't find a
> problem.

The other thing that would be helpful is reviewing code submission
w.r.t. test cases.  Did the author submit a testcase to go with the
patch?  Does the code patch instead reference finding the bug by
studying failures of specific fstests?  (Or LTP, or blktests, or...)

> But it's still a very useful review to perform, and in
> doing so you've validated, to the best of your ability, that the
> change is sound. Put simply:
> 
> 	"I've checked <all these things> and it looks good to me.
> 
> 	Reviewed-by: Joe Bloggs <joe@blogg.com>"
> 
> This is a very useful, valid review, regardless of whether you find
> anything. It's also a method of review that you can use when you
> have limited time - rather than trying to check everything and
> spending hours on a pathset, pick one thing and get the entire
> review done in 15 minutes. Then do the same thing for the next patch
> set. You'll be surprised how many things you notice that aren't what
> you are looking for when you do this.

> Hence the fact that other people find (different) issues is
> irrelevant - they'll be looking at different things to you, and
> there may not even be any overlap in the focus/scope of the reviews
> that have been performed. You may find the same things, but that is
> also not a bad thing - I intentionally don't read other reviews
> before I review a patch series, so that I don't taint my view of the
> code before I look at it (e.g., darrick found a bug in this code, so
> I don't need to look at it...).
> 
> IOWs, if you are starting from the premise that "I don't know this
> code well enough to perform a useful review" then you are setting

TBH I've wondered if this goes hand in hand with "I fear/figure/guess
the maintainer will probably review this and prove better at spotting
the bugs than me" and "Oh gosh what if the maintainer pushes this into
for-next before anyone even realizes I was reading this"?

This is why I want to push for /slightly/ more explicit coordination of
authors and reviewers for patch sets.  Let's say that Pavel sends a
patchset to the list.  If (for example) Allison immediately emails a
reply to the cover letter saying that she'll look at test coverage and
Bill sends his own email saying that he'll check the error paths, then I
will know expect further discussion (or RVBs) between the three of them,
and will wait for the reviews to wrap up before pushing to -next.

Obviously if this is some nasty bug that needs fixing quickly then cc me
to get my attention and I'll flag myself as conducting the review.

> yourself up for failure right at the start. Read the series
> description, think about the change being made, use your experience
> to answer the question "what's a mistake I could make performing
> this change". Then go looking for that mistake through the
> patch(es). In the process of performing this review, more than
> likely, you'll notice bugs other than what you are actually looking
> for...
> 
> This does not require any deep subsystem specific knowledge, but in
> doing this sort of review you're going to notice things and learn
> about the code and slowly build your knowledge and experience about
> that subsystem.

(Agreed, most of my familiarity from xfs comes either from reading
patches coming in or examining previously submitted patchsets, even if
it's been a while since the last submission.)

> > > However, there are quite a few cleanup and refactoring patch series,
> > > especially on the xfs list, where a review from an "outsider" could still
> > > be of value to the xfs community. OTOH, for xfs maintainer, those are
> > > the easy patches to review, so is there a gain in offloading those reviews?
> > > 
> > > Bottom line - a report of the subsystem review queue status, call for
> > > reviewers and highlighting specific areas in need of review is a good idea.
> > > Developers responding to that report publicly with availability for review,
> > > intention and expected time frame for taking on a review would be helpful
> > > for both maintainers and potential reviewers.
> > I definitely think that would help delegate review efforts a little more.
> > That way it's clear what people are working on, and what still needs
> > attention.
> 
> It is not the maintainer's repsonsibility to gather reviewers. That

No, but I'm conflating the two here because I'm engaged in all three
modes. :)

> is entirely the responsibility of the patch submitter. That is, if
> the code has gone unreviewed, it is up to the submitter to find
> people to review the code, not the maintainer. If you, as a
> developer, are unable to find people willing to review your code
> then it's a sign you haven't been reviewing enough code yourself.

I don't entirely agree with this statement.  I have loads of patches
waiting for review, but I also don't feel like I'm not reviewing enough
code.  I swear I'm not trying to guilt trip anyone by that statement. :)

Recently I was talking to Eric and he pointed out that certain things
like the NYE mass patchbombings aren't so helpful because 400 patches is
a lot to absorb.  I dump those patchbombs (both to linux-xfs and to
kernel.org) to try to keep my own bus factor as high as possible.  I
think I should shift to pushing the git trees to kernel.org and /only/
sending the cover letters with links, at least when I'm in "distributed
backup mode".

For everything else, maybe I should be more explicit about identifying
that which I'd like to get reviewed for the next cycle, e.g. "I would
like to try to get the btree bulk load code pushed for 5.7."  I think
some of the kernel subprojects actually do something like that, but it
doesn't seem like this is universally acknowledged as a strategy.

> Good reviewers are a valuable resource - as a developer I rely on
> reviewers to get my code merged, so if I don't review code and
> everyone else behaves the same way, how can I possibly get my code
> merged? IOWs, review is something every developer should be spending
> a significant chunk of their time on. IMO, if you are not spending
> *at least* a whole day a week reviewing code, you're not actually
> doing enough code review to allow other developers to be as
> productive as you are.

This kinda gets to my next topic, which is how do we make the business
case for more people to jump in and review code?  AFAICT people who work
for the longtime Usual Suspects (RH, IBM, SUSE, Oracle, etc.) are /very/
fortunate that their free software offices have enough sway that they
don't need to push very hard for paid review time, but I have little
clue if that's true for the more recent players.  If you work for a
company that doesn't let you review time as part of your workday, I
would like to know that.

> The more you review other people's code, the more you learn about
> the code and the more likely other people will be to review your
> code because they know you'll review their code in turn.  It's a
> positive reinforcement cycle that benefits both the individual
> developers personally and the wider community.
> 
> But this positive reinforcemnt cycle just doesn't happen if people
> avoid reviewing code because they think "I don't know anything so my
> review is not going to be worth anything".  Constructive review, not
> matter whether it's performed at a simple or complex level, is
> always valuable.

Also sometimes /this/ reviewer is running in stupid/harried/burnout mode
and misses obvious things. :(

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com