All of lore.kernel.org
 help / color / mirror / Atom feed
* [5.19 cycle] Planning and goals
@ 2022-04-05  2:03 Dave Chinner
  2022-04-07  3:11 ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-04-05  2:03 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

I'd really like to try getting the merge bottlenecks we've had
recently unstuck, so there are a few patchsets I want to try to get
reviewed, tested and merged for 5.19. Hopefully not too many
surprises will get in the way and so some planning to try to
minimises surprised might be a good thing.  Hence I want to have a
rough plan for the work I'd like to acheive during this 5.19 cycle,
and so that everyone has an idea of what needs to be done to (maybe)
achieve those goals over the next few weeks.

First of all, a rough timeline that I'm working with:

5.18-rc1:	where we are now
5.18-rc2:	Update linux-xfs master branch to 5.19-rc2
5.18-rc4:	At least 2 of the major pending works merged
5.18-rc6:	Last point for new work to be merged
5.18-rc6+:	Bug fixes only will be merged 

I'm assuming a -rc7 kernel will be released, hence this rough
timeline gives us 2 weeks of testing/stabilisation time before 5.19
merge window opens. 

Patchsets for review should be based on either 5.17.0 or the
linux-xfs master branch once it has been updated to 5.19-rc2. If
there are important bug fixes for the 5.18 cycle, I may move the
master branch forwards to a more recent release.

In terms of merge process, I plan to keep each major set of work in
a separate topic branch so that once it has been merged the commit
IDs remain stable. I will then merge the topic branches into the
for-next branch. Hence the for-next tree may still rebase (e.g. if I
need to send fixes for 5.18-rcX), but I hope to keep the individual
commits that make up the for-next branch as stable as possible. Bug
fixes for patchsets will get appended to the topic branches and the
for-next branch rebuilt via a new set of merges.

The major patchsets that I'm hoping to get reviewed and merged this
cycle:

- large extent counts V8 (Chandan)
	- Merge criteria and status:
		- review complete: 95%
		- no regressions when not enabled: 70%
		- no major regressions when enabled: 0%
	- Open questions:
		- Experimental tag for the first couple of cycles?
		  (Darrick says "YES" on #xfs)
	- Needs more QA, but signs are good so far.
	- Almost ready to merge.

- Logged attributes V28 (Allison)
	- I haven't looked at this since V24, so I'm not sure what
	  the current status is. I will do that discovery later in
	  the week.
	- Merge criteria and status:
		- review complete: Not sure
		- no regressions when not enabled: v24 was OK
		- no major regressions when enabled: v24 had issues
	- Open questions:
		- not sure what review will uncover
		- don't know what problems testing will show
		- what other log fixes does it depend on?
		- is there a performance impact when not enabled?

- DAX + reflink V11 (Ruan)
	- Merge criteria and status:
		- review complete: 75%
		- no regressions when not enabled: unknown
		- no major regressions when enabled: unknown
	- Open questions:
		- still a little bit of change around change
		  notification?
		- Not sure when V12 will arrive, hence can't really
		  plan for this right now.
		- external dependencies?

- xlog_write() rework V8
	- Merge criteria and status:
		- review complete: 100%
		- No regressions in testing: 100%
	- Open questions:
		- unchanged since last review/merge attempt,
		  reverted because of problems with other code that
		  was merged with it that isn't in this patchset
		  now. Does it need re-reviewing?
	- Ready to merge.

- Intent Whiteouts V3
	- Merge criteria and status:
		- review complete: 0%
		- No regressions in testing: 100%
	- Open questions:
		- will it get reviewed in time?
		- what bits of the patchset does LARP depend on?
		- Is LARP perf without intent whiteouts acceptible
		  (Experimental tag tends to suggest yes).
	- Functionally complete and tested, just needs review.

Have I missed any of the major outstanding things that are nearly
ready to go?

Do the patchset authors have the time available in the next 2-3
weeks to make enough progress to get their work merged? I'd kinda
like to have the xlog_write() rework and the large extent counts
merged ASAP so we have plenty of time to focus on the more
complex/difficult pieces.  If you don't have time in the next few
weeks, then let me know so I can adjust the plan appropriately for
the cycle.

What does everyone think of the plan?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-05  2:03 [5.19 cycle] Planning and goals Dave Chinner
@ 2022-04-07  3:11 ` Darrick J. Wong
  2022-04-07  5:49   ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-04-07  3:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> I'd really like to try getting the merge bottlenecks we've had
> recently unstuck, so there are a few patchsets I want to try to get
> reviewed, tested and merged for 5.19. Hopefully not too many
> surprises will get in the way and so some planning to try to
> minimises surprised might be a good thing.  Hence I want to have a
> rough plan for the work I'd like to acheive during this 5.19 cycle,
> and so that everyone has an idea of what needs to be done to (maybe)
> achieve those goals over the next few weeks.
> 
> First of all, a rough timeline that I'm working with:
> 
> 5.18-rc1:	where we are now
> 5.18-rc2:	Update linux-xfs master branch to 5.19-rc2

Presumably you meant 5.18-rc2 here?

> 5.18-rc4:	At least 2 of the major pending works merged
> 5.18-rc6:	Last point for new work to be merged
> 5.18-rc6+:	Bug fixes only will be merged 
> 
> I'm assuming a -rc7 kernel will be released, hence this rough
> timeline gives us 2 weeks of testing/stabilisation time before 5.19
> merge window opens. 
> 
> Patchsets for review should be based on either 5.17.0 or the
> linux-xfs master branch once it has been updated to 5.19-rc2. If

...and here?

> there are important bug fixes for the 5.18 cycle, I may move the
> master branch forwards to a more recent release.
>
> In terms of merge process, I plan to keep each major set of work in
> a separate topic branch so that once it has been merged the commit
> IDs remain stable. I will then merge the topic branches into the
> for-next branch. Hence the for-next tree may still rebase (e.g. if I
> need to send fixes for 5.18-rcX), but I hope to keep the individual
> commits that make up the for-next branch as stable as possible. Bug
> fixes for patchsets will get appended to the topic branches and the
> for-next branch rebuilt via a new set of merges.

Hmm, that's a way to do it that I hadn't previously considered.

> The major patchsets that I'm hoping to get reviewed and merged this
> cycle:
> 
> - large extent counts V8 (Chandan)
> 	- Merge criteria and status:
> 		- review complete: 95%
> 		- no regressions when not enabled: 70%
> 		- no major regressions when enabled: 0%
> 	- Open questions:
> 		- Experimental tag for the first couple of cycles?
> 		  (Darrick says "YES" on #xfs)
> 	- Needs more QA, but signs are good so far.
> 	- Almost ready to merge.

I think this one is mostly ready to go, with the few nits fixed that you
and I have already posted about.

> - Logged attributes V28 (Allison)
> 	- I haven't looked at this since V24, so I'm not sure what
> 	  the current status is. I will do that discovery later in
> 	  the week.
> 	- Merge criteria and status:
> 		- review complete: Not sure
> 		- no regressions when not enabled: v24 was OK
> 		- no major regressions when enabled: v24 had issues
> 	- Open questions:
> 		- not sure what review will uncover
> 		- don't know what problems testing will show
> 		- what other log fixes does it depend on?
> 		- is there a performance impact when not enabled?

Hm.  Last time I went through this I was mostly satisfied except for (a)
all of the subtle rules about who owns and frees the attr name/value
buffers, and (b) all that stuff with the alignment/sizing asserts
tripping on fsstress loop tests.

I /think/ Allison's fixed (a), and I think Dave had a patch or two for
(b)?

Oh one more thing:

ISTR one of the problems is that the VFS allocates an onstack
buffer for the xattr name.  The buffer is char[], so the start of it
isn't necessarily aligned to what the logging code wants; and the end of
it (since it's 255 bytes long) almost assuredly isn't.

> - DAX + reflink V11 (Ruan)
> 	- Merge criteria and status:
> 		- review complete: 75%
> 		- no regressions when not enabled: unknown
> 		- no major regressions when enabled: unknown
> 	- Open questions:
> 		- still a little bit of change around change
> 		  notification?
> 		- Not sure when V12 will arrive, hence can't really
> 		  plan for this right now.
> 		- external dependencies?

I thought the XFS part of this patchset looked like it was in good
enough shape to merge, but the actual infrastructure stuff (AKA messing
with mm/ and dax code) hasn't gotten a review.  I don't really have the
depth to know if the changes proposed are good or bad.

> - xlog_write() rework V8
> 	- Merge criteria and status:
> 		- review complete: 100%
> 		- No regressions in testing: 100%
> 	- Open questions:
> 		- unchanged since last review/merge attempt,
> 		  reverted because of problems with other code that
> 		  was merged with it that isn't in this patchset
> 		  now. Does it need re-reviewing?

I suggest you rebase to something recent (5.17.0 + xfs-5.18-merge-4?)
and send it to the list for a quick once-over before merging that.
IIRC I understood it well enough to have been ok with putting it in.

That said, if you push a branch somewhere I'll give it a spin on my
testfrastructure to see if anything else falls off.

> 	- Ready to merge.
> 
> - Intent Whiteouts V3
> 	- Merge criteria and status:
> 		- review complete: 0%
> 		- No regressions in testing: 100%
> 	- Open questions:
> 		- will it get reviewed in time?
> 		- what bits of the patchset does LARP depend on?
> 		- Is LARP perf without intent whiteouts acceptible
> 		  (Experimental tag tends to suggest yes).
> 	- Functionally complete and tested, just needs review.

<shrug> No opinions, having never seen this before(?)

> Have I missed any of the major outstanding things that are nearly
> ready to go?

At this point my rmap/reflink performance speedups series are ready for
review, but I think the xlog and nrext64 are more than enough for a
single cycle.

> Do the patchset authors have the time available in the next 2-3
> weeks to make enough progress to get their work merged? I'd kinda
> like to have the xlog_write() rework and the large extent counts
> merged ASAP so we have plenty of time to focus on the more
> complex/difficult pieces.  If you don't have time in the next few
> weeks, then let me know so I can adjust the plan appropriately for
> the cycle.
> 
> What does everyone think of the plan?

I like that you're making the plan explicit.  I'd wanted to talk about
doing this back at LPC 2021, but nobody from RH registered... :(

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-07  3:11 ` Darrick J. Wong
@ 2022-04-07  5:49   ` Dave Chinner
  2022-04-07 22:40     ` Alli
  2022-04-10 18:21     ` Darrick J. Wong
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2022-04-07  5:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > I'd really like to try getting the merge bottlenecks we've had
> > recently unstuck, so there are a few patchsets I want to try to get
> > reviewed, tested and merged for 5.19. Hopefully not too many
> > surprises will get in the way and so some planning to try to
> > minimises surprised might be a good thing.  Hence I want to have a
> > rough plan for the work I'd like to acheive during this 5.19 cycle,
> > and so that everyone has an idea of what needs to be done to (maybe)
> > achieve those goals over the next few weeks.
> > 
> > First of all, a rough timeline that I'm working with:
> > 
> > 5.18-rc1:	where we are now
> > 5.18-rc2:	Update linux-xfs master branch to 5.19-rc2
> 
> Presumably you meant 5.18-rc2 here?
> 
> > 5.18-rc4:	At least 2 of the major pending works merged
> > 5.18-rc6:	Last point for new work to be merged
> > 5.18-rc6+:	Bug fixes only will be merged 
> > 
> > I'm assuming a -rc7 kernel will be released, hence this rough
> > timeline gives us 2 weeks of testing/stabilisation time before 5.19
> > merge window opens. 
> > 
> > Patchsets for review should be based on either 5.17.0 or the
> > linux-xfs master branch once it has been updated to 5.19-rc2. If
> 
> ...and here?

Yes, I meant 5.18-rc2.

> > - Logged attributes V28 (Allison)
> > 	- I haven't looked at this since V24, so I'm not sure what
> > 	  the current status is. I will do that discovery later in
> > 	  the week.
> > 	- Merge criteria and status:
> > 		- review complete: Not sure
> > 		- no regressions when not enabled: v24 was OK
> > 		- no major regressions when enabled: v24 had issues
> > 	- Open questions:
> > 		- not sure what review will uncover
> > 		- don't know what problems testing will show
> > 		- what other log fixes does it depend on?
> > 		- is there a performance impact when not enabled?
> 
> Hm.  Last time I went through this I was mostly satisfied except for (a)
> all of the subtle rules about who owns and frees the attr name/value
> buffers, and (b) all that stuff with the alignment/sizing asserts
> tripping on fsstress loop tests.
> 
> I /think/ Allison's fixed (a), and I think Dave had a patch or two for
> (b)?

Yup, I think the patches in the intent whiteout series fix the
issues with the log iovecs that came up.

> Oh one more thing:
> 
> ISTR one of the problems is that the VFS allocates an onstack
> buffer for the xattr name.  The buffer is char[], so the start of it
> isn't necessarily aligned to what the logging code wants; and the end of
> it (since it's 255 bytes long) almost assuredly isn't.

Not sure that is a problem - we're copying them into log iovecs in
the shadow buffer - the iovecs in the shadow buffer have alignment
constraints because xlog_write() needing 4 byte alignment of ophdrs,
but the source buffer they get memcpy()d from has no alignment
restrictions.

I still need check that the code hasn't changed since v24 when I
looked at this in detail, but I think the VFS buffer is fine.

> > - DAX + reflink V11 (Ruan)
> > 	- Merge criteria and status:
> > 		- review complete: 75%
> > 		- no regressions when not enabled: unknown
> > 		- no major regressions when enabled: unknown
> > 	- Open questions:
> > 		- still a little bit of change around change
> > 		  notification?
> > 		- Not sure when V12 will arrive, hence can't really
> > 		  plan for this right now.
> > 		- external dependencies?
> 
> I thought the XFS part of this patchset looked like it was in good
> enough shape to merge, but the actual infrastructure stuff (AKA messing
> with mm/ and dax code) hasn't gotten a review.  I don't really have the
> depth to know if the changes proposed are good or bad.

Most of the patches have RVBs when I checked a couple of days ago.
There's a couple that still need work. I'm mostly relying on Dan and
Christoph to finish the reviews of this, hopefully it won't take
more than one more round...

> > - xlog_write() rework V8
> > 	- Merge criteria and status:
> > 		- review complete: 100%
> > 		- No regressions in testing: 100%
> > 	- Open questions:
> > 		- unchanged since last review/merge attempt,
> > 		  reverted because of problems with other code that
> > 		  was merged with it that isn't in this patchset
> > 		  now. Does it need re-reviewing?
> 
> I suggest you rebase to something recent (5.17.0 + xfs-5.18-merge-4?)
> and send it to the list for a quick once-over before merging that.
> IIRC I understood it well enough to have been ok with putting it in.

When I last posted it (March 9) it was rebased against 5.17-rc4:

https://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git/log/?h=xlog-write-rework-3
https://lore.kernel.org/linux-xfs/20220309052937.2696447-1-david@fromorbit.com/

And it I think there's only been a line or two change for rebasing
to the current for-next branch. 

I have a current base on 5.17+for-next, so if you need a newer
version to check over I can send that out easily enough...

> > 	- Ready to merge.
> > 
> > - Intent Whiteouts V3
> > 	- Merge criteria and status:
> > 		- review complete: 0%
> > 		- No regressions in testing: 100%
> > 	- Open questions:
> > 		- will it get reviewed in time?
> > 		- what bits of the patchset does LARP depend on?
> > 		- Is LARP perf without intent whiteouts acceptible
> > 		  (Experimental tag tends to suggest yes).
> > 	- Functionally complete and tested, just needs review.
> 
> <shrug> No opinions, having never seen this before(?)

First RFC was 7 months ago:

https://lore.kernel.org/linux-xfs/20210902095927.911100-1-david@fromorbit.com/

I mentioned it here in the 5.16 cycle planning discussion:

https://lore.kernel.org/linux-xfs/20210922053047.GS1756565@dread.disaster.area/

I posted v3 on 5.17-rc4 and xlog-write-rewrite back on about 3
weeks ago now:

https://lore.kernel.org/linux-xfs/20220314220631.3093283-1-david@fromorbit.com/

and like the xlog-write rework it is largely unchanged by a rebase
to 5.17.0+for-next.

> > Have I missed any of the major outstanding things that are nearly
> > ready to go?
> 
> At this point my rmap/reflink performance speedups series are ready for
> review,

OK, what's the timeframe for you getting them out for review? Today,
next week, -rc4?

> but I think the xlog and nrext64 are more than enough for a
> single cycle.

Except we've already done most of the work needed to merge them and
we aren't even at -rc2. That leaves another 4 weeks of time to
review, test and merge other work before we hit the -rc6 cutoff.
The plan I've outlined is based on what I think *I* can acheive in
the cycle, but I have no doubt that some of it will not get done
because that's the way these things always go. SO I've aimed high,
knowing that we're more likely to hit the middle of the target
range...

That said, if the code is reviewed, ready to merge and passes initial
regression tests, then I'll merge it regardless of how much else
I've already got queued up.

> > Do the patchset authors have the time available in the next 2-3
> > weeks to make enough progress to get their work merged? I'd kinda
> > like to have the xlog_write() rework and the large extent counts
> > merged ASAP so we have plenty of time to focus on the more
> > complex/difficult pieces.  If you don't have time in the next few
> > weeks, then let me know so I can adjust the plan appropriately for
> > the cycle.
> > 
> > What does everyone think of the plan?
> 
> I like that you're making the plan explicit.  I'd wanted to talk about
> doing this back at LPC 2021, but nobody from RH registered... :(

Lead by example - you need to don't ask for permission or build
consensus for doing something you think needs to be done. We don't
need to discuss whether we should have a planning discussion, just
publish the plan and that will naturally lead to a discussion of
the plan.

Speaking as the "merge shepherd" for this release, what I want from
this discussion is feedback that points out things I've missed, for
the authors of patchsets that I've flagged as merge candidates to
tell me if they are able to do the work needed in the next 4-6 weeks
to get their work merged, for people to voice their concerns about
aspects of the plan, etc.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-07  5:49   ` Dave Chinner
@ 2022-04-07 22:40     ` Alli
  2022-04-11  1:50       ` Dave Chinner
  2022-04-10 18:21     ` Darrick J. Wong
  1 sibling, 1 reply; 12+ messages in thread
From: Alli @ 2022-04-07 22:40 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong; +Cc: linux-xfs

On Thu, 2022-04-07 at 15:49 +1000, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > I'd really like to try getting the merge bottlenecks we've had
> > > recently unstuck, so there are a few patchsets I want to try to
> > > get
> > > reviewed, tested and merged for 5.19. Hopefully not too many
> > > surprises will get in the way and so some planning to try to
> > > minimises surprised might be a good thing.  Hence I want to have
> > > a
> > > rough plan for the work I'd like to acheive during this 5.19
> > > cycle,
> > > and so that everyone has an idea of what needs to be done to
> > > (maybe)
> > > achieve those goals over the next few weeks.
> > > 
> > > First of all, a rough timeline that I'm working with:
> > > 
> > > 5.18-rc1:	where we are now
> > > 5.18-rc2:	Update linux-xfs master branch to 5.19-rc2
> > 
> > Presumably you meant 5.18-rc2 here?
> > 
> > > 5.18-rc4:	At least 2 of the major pending works merged
> > > 5.18-rc6:	Last point for new work to be merged
> > > 5.18-rc6+:	Bug fixes only will be merged 
> > > 
> > > I'm assuming a -rc7 kernel will be released, hence this rough
> > > timeline gives us 2 weeks of testing/stabilisation time before
> > > 5.19
> > > merge window opens. 
> > > 
> > > Patchsets for review should be based on either 5.17.0 or the
> > > linux-xfs master branch once it has been updated to 5.19-rc2. If
> > 
> > ...and here?
> 
> Yes, I meant 5.18-rc2.
> 
> > > - Logged attributes V28 (Allison)
> > > 	- I haven't looked at this since V24, so I'm not sure what
> > > 	  the current status is. I will do that discovery later in
> > > 	  the week.
> > > 	- Merge criteria and status:
> > > 		- review complete: Not sure
So far each patch in v29 has at least 2 rvbs I think

> > > 		- no regressions when not enabled: v24 was OK
> > > 		- no major regressions when enabled: v24 had issues
> > > 	- Open questions:
> > > 		- not sure what review will uncover
> > > 		- don't know what problems testing will show
> > > 		- what other log fixes does it depend on?
If it goes on top of whiteouts, it will need some modifications to
follow the new log item changes that the whiteout set makes.

Alternately, if the white out set goes in after the larp set, then it
will need to apply the new log item changes to xfs_attr_item.c as well

Looking forward, once we get the kernel patches worked out, we should
probably port the corresponding patches to xfsprogs before enabling the
feature.  I have a patch to print the new log item in a dump.  It's not
very complicated though, I don't think it will take too many reviews to
get through though.

> > > 		- is there a performance impact when not enabled?
It shouldn't.  When the feature is not enabled, it should not create
any intents.

> > 
> > Hm.  Last time I went through this I was mostly satisfied except
> > for (a)
> > all of the subtle rules about who owns and frees the attr
> > name/value
> > buffers, and (b) all that stuff with the alignment/sizing asserts
> > tripping on fsstress loop tests.
> > 
> > I /think/ Allison's fixed (a), and I think Dave had a patch or two
> > for
> > (b)?
> 
> Yup, I think the patches in the intent whiteout series fix the
> issues with the log iovecs that came up.
> 
> > Oh one more thing:
> > 
> > ISTR one of the problems is that the VFS allocates an onstack
> > buffer for the xattr name.  The buffer is char[], so the start of
> > it
> > isn't necessarily aligned to what the logging code wants; and the
> > end of
> > it (since it's 255 bytes long) almost assuredly isn't.
> 
> Not sure that is a problem - we're copying them into log iovecs in
> the shadow buffer - the iovecs in the shadow buffer have alignment
> constraints because xlog_write() needing 4 byte alignment of ophdrs,
> but the source buffer they get memcpy()d from has no alignment
> restrictions.
> 
> I still need check that the code hasn't changed since v24 when I
> looked at this in detail, but I think the VFS buffer is fine.
> 
> > > - DAX + reflink V11 (Ruan)
> > > 	- Merge criteria and status:
> > > 		- review complete: 75%
> > > 		- no regressions when not enabled: unknown
> > > 		- no major regressions when enabled: unknown
> > > 	- Open questions:
> > > 		- still a little bit of change around change
> > > 		  notification?
> > > 		- Not sure when V12 will arrive, hence can't really
> > > 		  plan for this right now.
> > > 		- external dependencies?
> > 
> > I thought the XFS part of this patchset looked like it was in good
> > enough shape to merge, but the actual infrastructure stuff (AKA
> > messing
> > with mm/ and dax code) hasn't gotten a review.  I don't really have
> > the
> > depth to know if the changes proposed are good or bad.
> 
> Most of the patches have RVBs when I checked a couple of days ago.
> There's a couple that still need work. I'm mostly relying on Dan and
> Christoph to finish the reviews of this, hopefully it won't take
> more than one more round...
> 
> > > - xlog_write() rework V8
> > > 	- Merge criteria and status:
> > > 		- review complete: 100%
> > > 		- No regressions in testing: 100%
> > > 	- Open questions:
> > > 		- unchanged since last review/merge attempt,
> > > 		  reverted because of problems with other code that
> > > 		  was merged with it that isn't in this patchset
> > > 		  now. Does it need re-reviewing?
> > 
> > I suggest you rebase to something recent (5.17.0 + xfs-5.18-merge-
> > 4?)
> > and send it to the list for a quick once-over before merging that.
> > IIRC I understood it well enough to have been ok with putting it
> > in.
> 
> When I last posted it (March 9) it was rebased against 5.17-rc4:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git/log/?h=xlog-write-rework-3
> https://lore.kernel.org/linux-xfs/20220309052937.2696447-1-david@fromorbit.com/
> 
> And it I think there's only been a line or two change for rebasing
> to the current for-next branch. 
> 
> I have a current base on 5.17+for-next, so if you need a newer
> version to check over I can send that out easily enough...
> 
> > > 	- Ready to merge.
> > > 
> > > - Intent Whiteouts V3
> > > 	- Merge criteria and status:
> > > 		- review complete: 0%
I think patch 2 of this set is the same as patch 2 of the larp set.  If
you agree with the review results, you can just take patch 2 from the
larp series, and have 2 rvbs for this one

xfs: don't commit the first deferred transaction
v25
https://lore.kernel.org/all/20211117041343.3050202-3-allison.henderson@oracle.com/
v26
https://lore.kernel.org/all/20220124052708.580016-3-allison.henderson@oracle.com/
v27
https://lore.kernel.org/all/20220216013713.1191082-3-allison.henderson@oracle.com/
v28
https://lore.kernel.org/all/20220228195147.1913281-3-allison.henderson@oracle.com/

I'll see if I can stack the two sets together and get some reviews done
on the remainder of the set

> > > 		- No regressions in testing: 100%
> > > 	- Open questions:
> > > 		- will it get reviewed in time?
> > > 		- what bits of the patchset does LARP depend on?
Just from glancing at the sets, I don't think they have merge conflicts
other than patch 2, which can simply be dropped from one of the sets.

However, patches 3,4,6,7 of the whiteout set make a series of changes
to the xfs_*_item.c files.  So similar changes need to be applied to
the new fs/xfs/xfs_attr_item.c that the larp set introduces 
 
> > > 		- Is LARP perf without intent whiteouts acceptible
> > > 		  (Experimental tag tends to suggest yes).
I'll see if I can get a few perf captures on it too

Allison
> > > 	- Functionally complete and tested, just needs review.
> > 
> > <shrug> No opinions, having never seen this before(?)
> 
> First RFC was 7 months ago:
> 
> https://lore.kernel.org/linux-xfs/20210902095927.911100-1-david@fromorbit.com/
> 
> I mentioned it here in the 5.16 cycle planning discussion:
> 
> https://lore.kernel.org/linux-xfs/20210922053047.GS1756565@dread.disaster.area/
> 
> I posted v3 on 5.17-rc4 and xlog-write-rewrite back on about 3
> weeks ago now:
> 
> https://lore.kernel.org/linux-xfs/20220314220631.3093283-1-david@fromorbit.com/
> 
> and like the xlog-write rework it is largely unchanged by a rebase
> to 5.17.0+for-next.
> 
> > > Have I missed any of the major outstanding things that are nearly
> > > ready to go?
> > 
> > At this point my rmap/reflink performance speedups series are ready
> > for
> > review,
> 
> OK, what's the timeframe for you getting them out for review? Today,
> next week, -rc4?
> 
> > but I think the xlog and nrext64 are more than enough for a
> > single cycle.
> 
> Except we've already done most of the work needed to merge them and
> we aren't even at -rc2. That leaves another 4 weeks of time to
> review, test and merge other work before we hit the -rc6 cutoff.
> The plan I've outlined is based on what I think *I* can acheive in
> the cycle, but I have no doubt that some of it will not get done
> because that's the way these things always go. SO I've aimed high,
> knowing that we're more likely to hit the middle of the target
> range...
> 
> That said, if the code is reviewed, ready to merge and passes initial
> regression tests, then I'll merge it regardless of how much else
> I've already got queued up.
> 
> > > Do the patchset authors have the time available in the next 2-3
> > > weeks to make enough progress to get their work merged? I'd kinda
> > > like to have the xlog_write() rework and the large extent counts
> > > merged ASAP so we have plenty of time to focus on the more
> > > complex/difficult pieces.  If you don't have time in the next few
> > > weeks, then let me know so I can adjust the plan appropriately
> > > for
> > > the cycle.
> > > 
> > > What does everyone think of the plan?
> > 
> > I like that you're making the plan explicit.  I'd wanted to talk
> > about
> > doing this back at LPC 2021, but nobody from RH registered... :(
> 
> Lead by example - you need to don't ask for permission or build
> consensus for doing something you think needs to be done. We don't
> need to discuss whether we should have a planning discussion, just
> publish the plan and that will naturally lead to a discussion of
> the plan.
> 
> Speaking as the "merge shepherd" for this release, what I want from
> this discussion is feedback that points out things I've missed, for
> the authors of patchsets that I've flagged as merge candidates to
> tell me if they are able to do the work needed in the next 4-6 weeks
> to get their work merged, for people to voice their concerns about
> aspects of the plan, etc.
> 
> Cheers,
> 
> Dave.
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-07  5:49   ` Dave Chinner
  2022-04-07 22:40     ` Alli
@ 2022-04-10 18:21     ` Darrick J. Wong
  2022-04-11  1:51       ` Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2022-04-10 18:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Apr 07, 2022 at 03:49:39PM +1000, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > I'd really like to try getting the merge bottlenecks we've had
> > > recently unstuck, so there are a few patchsets I want to try to get
> > > reviewed, tested and merged for 5.19. Hopefully not too many
> > > surprises will get in the way and so some planning to try to
> > > minimises surprised might be a good thing.  Hence I want to have a
> > > rough plan for the work I'd like to acheive during this 5.19 cycle,
> > > and so that everyone has an idea of what needs to be done to (maybe)
> > > achieve those goals over the next few weeks.
> > > 
> > > First of all, a rough timeline that I'm working with:
> > > 
> > > 5.18-rc1:	where we are now
> > > 5.18-rc2:	Update linux-xfs master branch to 5.19-rc2
> > 
> > Presumably you meant 5.18-rc2 here?
> > 
> > > 5.18-rc4:	At least 2 of the major pending works merged
> > > 5.18-rc6:	Last point for new work to be merged
> > > 5.18-rc6+:	Bug fixes only will be merged 
> > > 
> > > I'm assuming a -rc7 kernel will be released, hence this rough
> > > timeline gives us 2 weeks of testing/stabilisation time before 5.19
> > > merge window opens. 
> > > 
> > > Patchsets for review should be based on either 5.17.0 or the
> > > linux-xfs master branch once it has been updated to 5.19-rc2. If
> > 
> > ...and here?
> 
> Yes, I meant 5.18-rc2.
> 
> > > - Logged attributes V28 (Allison)
> > > 	- I haven't looked at this since V24, so I'm not sure what
> > > 	  the current status is. I will do that discovery later in
> > > 	  the week.
> > > 	- Merge criteria and status:
> > > 		- review complete: Not sure
> > > 		- no regressions when not enabled: v24 was OK
> > > 		- no major regressions when enabled: v24 had issues
> > > 	- Open questions:
> > > 		- not sure what review will uncover
> > > 		- don't know what problems testing will show
> > > 		- what other log fixes does it depend on?
> > > 		- is there a performance impact when not enabled?
> > 
> > Hm.  Last time I went through this I was mostly satisfied except for (a)
> > all of the subtle rules about who owns and frees the attr name/value
> > buffers, and (b) all that stuff with the alignment/sizing asserts
> > tripping on fsstress loop tests.
> > 
> > I /think/ Allison's fixed (a), and I think Dave had a patch or two for
> > (b)?
> 
> Yup, I think the patches in the intent whiteout series fix the
> issues with the log iovecs that came up.
> 
> > Oh one more thing:
> > 
> > ISTR one of the problems is that the VFS allocates an onstack
> > buffer for the xattr name.  The buffer is char[], so the start of it
> > isn't necessarily aligned to what the logging code wants; and the end of
> > it (since it's 255 bytes long) almost assuredly isn't.
> 
> Not sure that is a problem - we're copying them into log iovecs in
> the shadow buffer - the iovecs in the shadow buffer have alignment
> constraints because xlog_write() needing 4 byte alignment of ophdrs,
> but the source buffer they get memcpy()d from has no alignment
> restrictions.
> 
> I still need check that the code hasn't changed since v24 when I
> looked at this in detail, but I think the VFS buffer is fine.
> 
> > > - DAX + reflink V11 (Ruan)
> > > 	- Merge criteria and status:
> > > 		- review complete: 75%
> > > 		- no regressions when not enabled: unknown
> > > 		- no major regressions when enabled: unknown
> > > 	- Open questions:
> > > 		- still a little bit of change around change
> > > 		  notification?
> > > 		- Not sure when V12 will arrive, hence can't really
> > > 		  plan for this right now.
> > > 		- external dependencies?
> > 
> > I thought the XFS part of this patchset looked like it was in good
> > enough shape to merge, but the actual infrastructure stuff (AKA messing
> > with mm/ and dax code) hasn't gotten a review.  I don't really have the
> > depth to know if the changes proposed are good or bad.
> 
> Most of the patches have RVBs when I checked a couple of days ago.
> There's a couple that still need work. I'm mostly relying on Dan and
> Christoph to finish the reviews of this, hopefully it won't take
> more than one more round...
> 
> > > - xlog_write() rework V8
> > > 	- Merge criteria and status:
> > > 		- review complete: 100%
> > > 		- No regressions in testing: 100%
> > > 	- Open questions:
> > > 		- unchanged since last review/merge attempt,
> > > 		  reverted because of problems with other code that
> > > 		  was merged with it that isn't in this patchset
> > > 		  now. Does it need re-reviewing?
> > 
> > I suggest you rebase to something recent (5.17.0 + xfs-5.18-merge-4?)
> > and send it to the list for a quick once-over before merging that.
> > IIRC I understood it well enough to have been ok with putting it in.
> 
> When I last posted it (March 9) it was rebased against 5.17-rc4:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git/log/?h=xlog-write-rework-3
> https://lore.kernel.org/linux-xfs/20220309052937.2696447-1-david@fromorbit.com/
> 
> And it I think there's only been a line or two change for rebasing
> to the current for-next branch. 
> 
> I have a current base on 5.17+for-next, so if you need a newer
> version to check over I can send that out easily enough...

Ah, ok, that rework-3 branch looks fine to me.

> > > 	- Ready to merge.
> > > 
> > > - Intent Whiteouts V3
> > > 	- Merge criteria and status:
> > > 		- review complete: 0%
> > > 		- No regressions in testing: 100%
> > > 	- Open questions:
> > > 		- will it get reviewed in time?
> > > 		- what bits of the patchset does LARP depend on?
> > > 		- Is LARP perf without intent whiteouts acceptible
> > > 		  (Experimental tag tends to suggest yes).
> > > 	- Functionally complete and tested, just needs review.
> > 
> > <shrug> No opinions, having never seen this before(?)
> 
> First RFC was 7 months ago:
> 
> https://lore.kernel.org/linux-xfs/20210902095927.911100-1-david@fromorbit.com/
> 
> I mentioned it here in the 5.16 cycle planning discussion:
> 
> https://lore.kernel.org/linux-xfs/20210922053047.GS1756565@dread.disaster.area/
> 
> I posted v3 on 5.17-rc4 and xlog-write-rewrite back on about 3
> weeks ago now:
> 
> https://lore.kernel.org/linux-xfs/20220314220631.3093283-1-david@fromorbit.com/
> 
> and like the xlog-write rework it is largely unchanged by a rebase
> to 5.17.0+for-next.

Ok, I'll have a second look tomorrow (Monday).

> > > Have I missed any of the major outstanding things that are nearly
> > > ready to go?
> > 
> > At this point my rmap/reflink performance speedups series are ready for
> > review,
> 
> OK, what's the timeframe for you getting them out for review? Today,
> next week, -rc4?

I'll send them as soon as the frextents bugfix series clears review.

> > but I think the xlog and nrext64 are more than enough for a
> > single cycle.
> 
> Except we've already done most of the work needed to merge them and
> we aren't even at -rc2. That leaves another 4 weeks of time to
> review, test and merge other work before we hit the -rc6 cutoff.
> The plan I've outlined is based on what I think *I* can acheive in
> the cycle, but I have no doubt that some of it will not get done
> because that's the way these things always go. SO I've aimed high,
> knowing that we're more likely to hit the middle of the target
> range...
> 
> That said, if the code is reviewed, ready to merge and passes initial
> regression tests, then I'll merge it regardless of how much else
> I've already got queued up.

Ok.

> > > Do the patchset authors have the time available in the next 2-3
> > > weeks to make enough progress to get their work merged? I'd kinda
> > > like to have the xlog_write() rework and the large extent counts
> > > merged ASAP so we have plenty of time to focus on the more
> > > complex/difficult pieces.  If you don't have time in the next few
> > > weeks, then let me know so I can adjust the plan appropriately for
> > > the cycle.
> > > 
> > > What does everyone think of the plan?
> > 
> > I like that you're making the plan explicit.  I'd wanted to talk about
> > doing this back at LPC 2021, but nobody from RH registered... :(
> 
> Lead by example - you need to don't ask for permission or build
> consensus for doing something you think needs to be done. We don't
> need to discuss whether we should have a planning discussion, just
> publish the plan and that will naturally lead to a discussion of
> the plan.

Yeah, it's funny -- 20 years ago me would have done exactly this.

> Speaking as the "merge shepherd" for this release, what I want from
> this discussion is feedback that points out things I've missed, for
> the authors of patchsets that I've flagged as merge candidates to
> tell me if they are able to do the work needed in the next 4-6 weeks
> to get their work merged, for people to voice their concerns about
> aspects of the plan, etc.

<nod> I hope you've gotten enough info to proceed, then?

--D

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-07 22:40     ` Alli
@ 2022-04-11  1:50       ` Dave Chinner
  2022-04-11  3:59         ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-04-11  1:50 UTC (permalink / raw)
  To: Alli; +Cc: Darrick J. Wong, linux-xfs

On Thu, Apr 07, 2022 at 03:40:08PM -0700, Alli wrote:
> On Thu, 2022-04-07 at 15:49 +1000, Dave Chinner wrote:
> > On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > > Hi folks,
> > > > 
> > > > I'd really like to try getting the merge bottlenecks we've had
> > > > recently unstuck, so there are a few patchsets I want to try to
> > > > get
> > > > reviewed, tested and merged for 5.19. Hopefully not too many
> > > > surprises will get in the way and so some planning to try to
> > > > minimises surprised might be a good thing.  Hence I want to have
> > > > a
> > > > rough plan for the work I'd like to acheive during this 5.19
> > > > cycle,
> > > > and so that everyone has an idea of what needs to be done to
> > > > (maybe)
> > > > achieve those goals over the next few weeks.
> > > > 
> > > > First of all, a rough timeline that I'm working with:
> > > > 
> > > > 5.18-rc1:	where we are now
> > > > 5.18-rc2:	Update linux-xfs master branch to 5.19-rc2
> > > 
> > > Presumably you meant 5.18-rc2 here?
> > > 
> > > > 5.18-rc4:	At least 2 of the major pending works merged
> > > > 5.18-rc6:	Last point for new work to be merged
> > > > 5.18-rc6+:	Bug fixes only will be merged 
> > > > 
> > > > I'm assuming a -rc7 kernel will be released, hence this rough
> > > > timeline gives us 2 weeks of testing/stabilisation time before
> > > > 5.19
> > > > merge window opens. 
> > > > 
> > > > Patchsets for review should be based on either 5.17.0 or the
> > > > linux-xfs master branch once it has been updated to 5.19-rc2. If
> > > 
> > > ...and here?
> > 
> > Yes, I meant 5.18-rc2.
> > 
> > > > - Logged attributes V28 (Allison)
> > > > 	- I haven't looked at this since V24, so I'm not sure what
> > > > 	  the current status is. I will do that discovery later in
> > > > 	  the week.
> > > > 	- Merge criteria and status:
> > > > 		- review complete: Not sure
> So far each patch in v29 has at least 2 rvbs I think

OK.

> > > > 		- no regressions when not enabled: v24 was OK
> > > > 		- no major regressions when enabled: v24 had issues
> > > > 	- Open questions:
> > > > 		- not sure what review will uncover
> > > > 		- don't know what problems testing will show
> > > > 		- what other log fixes does it depend on?
> If it goes on top of whiteouts, it will need some modifications to
> follow the new log item changes that the whiteout set makes.
> 
> Alternately, if the white out set goes in after the larp set, then it
> will need to apply the new log item changes to xfs_attr_item.c as well

I figured as much, thanks for confirming!

> Looking forward, once we get the kernel patches worked out, we should
> probably port the corresponding patches to xfsprogs before enabling the
> feature.  I have a patch to print the new log item in a dump.

You mean for xfs_logprint? 

> It's not
> very complicated though, I don't think it will take too many reviews to
> get through though.

*nod*

> > > > - Intent Whiteouts V3
> > > > 	- Merge criteria and status:
> > > > 		- review complete: 0%
> I think patch 2 of this set is the same as patch 2 of the larp set.  If
> you agree with the review results, you can just take patch 2 from the
> larp series, and have 2 rvbs for this one

OK. I'll duplicate the rvbs so whichever gets merged first contains
them.

> > > > 		- No regressions in testing: 100%
> > > > 	- Open questions:
> > > > 		- will it get reviewed in time?
> > > > 		- what bits of the patchset does LARP depend on?
> Just from glancing at the sets, I don't think they have merge conflicts
> other than patch 2, which can simply be dropped from one of the sets.
> 
> However, patches 3,4,6,7 of the whiteout set make a series of changes
> to the xfs_*_item.c files.  So similar changes need to be applied to
> the new fs/xfs/xfs_attr_item.c that the larp set introduces 

*nod*

Thanks for the update, Allison.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-10 18:21     ` Darrick J. Wong
@ 2022-04-11  1:51       ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2022-04-11  1:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Apr 10, 2022 at 11:21:12AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 07, 2022 at 03:49:39PM +1000, Dave Chinner wrote:
> > On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > > Have I missed any of the major outstanding things that are nearly
> > > > ready to go?
> > > 
> > > At this point my rmap/reflink performance speedups series are ready for
> > > review,
> > 
> > OK, what's the timeframe for you getting them out for review? Today,
> > next week, -rc4?
> 
> I'll send them as soon as the frextents bugfix series clears review.

Thanks!

> > Speaking as the "merge shepherd" for this release, what I want from
> > this discussion is feedback that points out things I've missed, for
> > the authors of patchsets that I've flagged as merge candidates to
> > tell me if they are able to do the work needed in the next 4-6 weeks
> > to get their work merged, for people to voice their concerns about
> > aspects of the plan, etc.
> 
> <nod> I hope you've gotten enough info to proceed, then?

Yes, so far, so good :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-11  1:50       ` Dave Chinner
@ 2022-04-11  3:59         ` Dave Chinner
  2022-04-11  7:31           ` Dave Chinner
  2022-04-11 19:38           ` Alli
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2022-04-11  3:59 UTC (permalink / raw)
  To: Alli; +Cc: Darrick J. Wong, linux-xfs

On Mon, Apr 11, 2022 at 11:50:23AM +1000, Dave Chinner wrote:
> On Thu, Apr 07, 2022 at 03:40:08PM -0700, Alli wrote:
> > On Thu, 2022-04-07 at 15:49 +1000, Dave Chinner wrote:
> > > On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > > > - Logged attributes V28 (Allison)
> > > > > 	- I haven't looked at this since V24, so I'm not sure what
> > > > > 	  the current status is. I will do that discovery later in
> > > > > 	  the week.
> > > > > 	- Merge criteria and status:
> > > > > 		- review complete: Not sure
> > So far each patch in v29 has at least 2 rvbs I think
> 
> OK.
> 
> > > > > 		- no regressions when not enabled: v24 was OK
> > > > > 		- no major regressions when enabled: v24 had issues
> > > > > 	- Open questions:
> > > > > 		- not sure what review will uncover
> > > > > 		- don't know what problems testing will show
> > > > > 		- what other log fixes does it depend on?
> > If it goes on top of whiteouts, it will need some modifications to
> > follow the new log item changes that the whiteout set makes.
> > 
> > Alternately, if the white out set goes in after the larp set, then it
> > will need to apply the new log item changes to xfs_attr_item.c as well
> 
> I figured as much, thanks for confirming!

Ok, so I've just gone through the process of merging the two
branches to see where we stand. The modifications to the log code
that are needed for the larp code - changes to log iovec processing
and padding - are out of date in the LARP v29 patchset.

That is, the versions that are in the intent whiteout patchset are
much more sophisticated and cleanly separated. The version of the
"avoid extra transactions when no intents" patch in the LARP v29
series is really only looking at whether the transaction is dirty,
not whether there are intents in the transactions, which is what we
really need to know when deciding whether to commit the transaction
or not.

There are also a bunch of log iovec changes buried in patch 4 of the
LARP patchset which is labelled as "infrastructure". Those changes
are cleanly split out as patch 1 in the intent whiteout patchset and
provide the xlog_calc_vec_len() function that the LARP code needs.

As such, the RVBs on the patches in the LARPv29 series don't carry
over to the patches in the intent whiteout series - they are just
too different for that to occur.

The additional changes needed to support intent whiteouts are
relatively straight forward for the attri/attrd items, so at this
point I'd much prefer that the two patchsets are ordered "intent
whiteouts" then "LARP".

I've pushed the compose I just processed to get most of the pending
patchsets as they stand into topic branches and onto test machines
out to kernel.org. Have a look at:

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-5.19-compose

to see how I merged everything and maybe give it a run through your
test cycle to see if there's anything I broke when LARP is enabled....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-11  3:59         ` Dave Chinner
@ 2022-04-11  7:31           ` Dave Chinner
  2022-04-11  8:50             ` Dave Chinner
  2022-04-11 19:38           ` Alli
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-04-11  7:31 UTC (permalink / raw)
  To: Alli; +Cc: Darrick J. Wong, linux-xfs

On Mon, Apr 11, 2022 at 01:59:35PM +1000, Dave Chinner wrote:
> On Mon, Apr 11, 2022 at 11:50:23AM +1000, Dave Chinner wrote:
> > On Thu, Apr 07, 2022 at 03:40:08PM -0700, Alli wrote:
> > > On Thu, 2022-04-07 at 15:49 +1000, Dave Chinner wrote:
> > > > On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> > > > > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > > > > - Logged attributes V28 (Allison)
> > > > > > 	- I haven't looked at this since V24, so I'm not sure what
> > > > > > 	  the current status is. I will do that discovery later in
> > > > > > 	  the week.
> > > > > > 	- Merge criteria and status:
> > > > > > 		- review complete: Not sure
> > > So far each patch in v29 has at least 2 rvbs I think
> > 
> > OK.
> > 
> > > > > > 		- no regressions when not enabled: v24 was OK
> > > > > > 		- no major regressions when enabled: v24 had issues
> > > > > > 	- Open questions:
> > > > > > 		- not sure what review will uncover
> > > > > > 		- don't know what problems testing will show
> > > > > > 		- what other log fixes does it depend on?
> > > If it goes on top of whiteouts, it will need some modifications to
> > > follow the new log item changes that the whiteout set makes.
> > > 
> > > Alternately, if the white out set goes in after the larp set, then it
> > > will need to apply the new log item changes to xfs_attr_item.c as well
> > 
> > I figured as much, thanks for confirming!
> 
> Ok, so I've just gone through the process of merging the two
> branches to see where we stand. The modifications to the log code
> that are needed for the larp code - changes to log iovec processing
> and padding - are out of date in the LARP v29 patchset.
> 
> That is, the versions that are in the intent whiteout patchset are
> much more sophisticated and cleanly separated. The version of the
> "avoid extra transactions when no intents" patch in the LARP v29
> series is really only looking at whether the transaction is dirty,
> not whether there are intents in the transactions, which is what we
> really need to know when deciding whether to commit the transaction
> or not.
> 
> There are also a bunch of log iovec changes buried in patch 4 of the
> LARP patchset which is labelled as "infrastructure". Those changes
> are cleanly split out as patch 1 in the intent whiteout patchset and
> provide the xlog_calc_vec_len() function that the LARP code needs.
> 
> As such, the RVBs on the patches in the LARPv29 series don't carry
> over to the patches in the intent whiteout series - they are just
> too different for that to occur.
> 
> The additional changes needed to support intent whiteouts are
> relatively straight forward for the attri/attrd items, so at this
> point I'd much prefer that the two patchsets are ordered "intent
> whiteouts" then "LARP".
> 
> I've pushed the compose I just processed to get most of the pending
> patchsets as they stand into topic branches and onto test machines
> out to kernel.org. Have a look at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-5.19-compose
> 
> to see how I merged everything and maybe give it a run through your
> test cycle to see if there's anything I broke when LARP is enabled....

generic/642 producded this splat:

 XFS: Assertion failed: !list_empty(&cil->xc_cil), file: fs/xfs/xfs_log_cil.c, line: 1274
 ------------[ cut here ]------------
 kernel BUG at fs/xfs/xfs_message.c:102!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 1 PID: 2187772 Comm: fsstress Not tainted 5.18.0-rc2-dgc+ #1108
 Call Trace:
  <TASK>
  xlog_cil_commit+0xa5a/0xad0
  __xfs_trans_commit+0xb8/0x330
  xfs_trans_commit+0x10/0x20
  xfs_attr_set+0x3e2/0x4c0
  xfs_xattr_set+0x8d/0xe0
  __vfs_setxattr+0x6b/0x90
  __vfs_setxattr_noperm+0x76/0x220
  __vfs_setxattr_locked+0xdf/0x100
  vfs_setxattr+0x94/0x170
  setxattr+0x110/0x200
  ? __might_fault+0x22/0x30
  ? strncpy_from_user+0x23/0x170
  ? getname_flags.part.0+0x4c/0x1b0
  ? kmem_cache_free+0x1fc/0x380
  ? __might_sleep+0x43/0x70
  path_setxattr+0xbf/0xe0
  __x64_sys_setxattr+0x2b/0x30
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Which implies a dirty transaction with nothing in it at the end of
a run through xfs_attr_set_iter() without LARP enabled. It raced
with a CIL push, so when the empty dirty transaction tries to push
the CIL, it assert fails because the CIL is empty....

I don't know how this happens yet, but there are no intents involved
here so it doesn't appear to have anything to do with intent logging
or intent whiteouts at this point.

I'll need to reproduce it to get more info about it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-11  7:31           ` Dave Chinner
@ 2022-04-11  8:50             ` Dave Chinner
  2022-04-11 20:00               ` Alli
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2022-04-11  8:50 UTC (permalink / raw)
  To: Alli; +Cc: Darrick J. Wong, linux-xfs

On Mon, Apr 11, 2022 at 05:31:21PM +1000, Dave Chinner wrote:
> On Mon, Apr 11, 2022 at 01:59:35PM +1000, Dave Chinner wrote:
> > On Mon, Apr 11, 2022 at 11:50:23AM +1000, Dave Chinner wrote:
> > > On Thu, Apr 07, 2022 at 03:40:08PM -0700, Alli wrote:
> > > > On Thu, 2022-04-07 at 15:49 +1000, Dave Chinner wrote:
> > > > > On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong wrote:
> > > > > > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > > > > > - Logged attributes V28 (Allison)
> > > > > > > 	- I haven't looked at this since V24, so I'm not sure what
> > > > > > > 	  the current status is. I will do that discovery later in
> > > > > > > 	  the week.
> > > > > > > 	- Merge criteria and status:
> > > > > > > 		- review complete: Not sure
> > > > So far each patch in v29 has at least 2 rvbs I think
> > > 
> > > OK.
> > > 
> > > > > > > 		- no regressions when not enabled: v24 was OK
> > > > > > > 		- no major regressions when enabled: v24 had issues
> > > > > > > 	- Open questions:
> > > > > > > 		- not sure what review will uncover
> > > > > > > 		- don't know what problems testing will show
> > > > > > > 		- what other log fixes does it depend on?
> > > > If it goes on top of whiteouts, it will need some modifications to
> > > > follow the new log item changes that the whiteout set makes.
> > > > 
> > > > Alternately, if the white out set goes in after the larp set, then it
> > > > will need to apply the new log item changes to xfs_attr_item.c as well
> > > 
> > > I figured as much, thanks for confirming!
> > 
> > Ok, so I've just gone through the process of merging the two
> > branches to see where we stand. The modifications to the log code
> > that are needed for the larp code - changes to log iovec processing
> > and padding - are out of date in the LARP v29 patchset.
> > 
> > That is, the versions that are in the intent whiteout patchset are
> > much more sophisticated and cleanly separated. The version of the
> > "avoid extra transactions when no intents" patch in the LARP v29
> > series is really only looking at whether the transaction is dirty,
> > not whether there are intents in the transactions, which is what we
> > really need to know when deciding whether to commit the transaction
> > or not.
> > 
> > There are also a bunch of log iovec changes buried in patch 4 of the
> > LARP patchset which is labelled as "infrastructure". Those changes
> > are cleanly split out as patch 1 in the intent whiteout patchset and
> > provide the xlog_calc_vec_len() function that the LARP code needs.
> > 
> > As such, the RVBs on the patches in the LARPv29 series don't carry
> > over to the patches in the intent whiteout series - they are just
> > too different for that to occur.
> > 
> > The additional changes needed to support intent whiteouts are
> > relatively straight forward for the attri/attrd items, so at this
> > point I'd much prefer that the two patchsets are ordered "intent
> > whiteouts" then "LARP".
> > 
> > I've pushed the compose I just processed to get most of the pending
> > patchsets as they stand into topic branches and onto test machines
> > out to kernel.org. Have a look at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-5.19-compose
> > 
> > to see how I merged everything and maybe give it a run through your
> > test cycle to see if there's anything I broke when LARP is enabled....
> 
> generic/642 producded this splat:
> 
>  XFS: Assertion failed: !list_empty(&cil->xc_cil), file: fs/xfs/xfs_log_cil.c, line: 1274
>  ------------[ cut here ]------------
>  kernel BUG at fs/xfs/xfs_message.c:102!
>  invalid opcode: 0000 [#1] PREEMPT SMP
>  CPU: 1 PID: 2187772 Comm: fsstress Not tainted 5.18.0-rc2-dgc+ #1108
>  Call Trace:
>   <TASK>
>   xlog_cil_commit+0xa5a/0xad0
>   __xfs_trans_commit+0xb8/0x330
>   xfs_trans_commit+0x10/0x20
>   xfs_attr_set+0x3e2/0x4c0
>   xfs_xattr_set+0x8d/0xe0
>   __vfs_setxattr+0x6b/0x90
>   __vfs_setxattr_noperm+0x76/0x220
>   __vfs_setxattr_locked+0xdf/0x100
>   vfs_setxattr+0x94/0x170
>   setxattr+0x110/0x200
>   ? __might_fault+0x22/0x30
>   ? strncpy_from_user+0x23/0x170
>   ? getname_flags.part.0+0x4c/0x1b0
>   ? kmem_cache_free+0x1fc/0x380
>   ? __might_sleep+0x43/0x70
>   path_setxattr+0xbf/0xe0
>   __x64_sys_setxattr+0x2b/0x30
>   do_syscall_64+0x35/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Which implies a dirty transaction with nothing in it at the end of
> a run through xfs_attr_set_iter() without LARP enabled. It raced
> with a CIL push, so when the empty dirty transaction tries to push
> the CIL, it assert fails because the CIL is empty....
> 
> I don't know how this happens yet, but there are no intents involved
> here so it doesn't appear to have anything to do with intent logging
> or intent whiteouts at this point.

100% reproducable, and yup, there it is:

STATIC int
xfs_xattri_finish_update(
        struct xfs_attr_item            *attr,
        struct xfs_attrd_log_item       *attrdp,
        uint32_t                        op_flags)
{
.....
        switch (op) {
        case XFS_ATTR_OP_FLAGS_SET:
                error = xfs_attr_set_iter(attr);
                break;
.....
        /*
         * Mark the transaction dirty, even on error. This ensures the
         * transaction is aborted, which:
         *
         * 1.) releases the ATTRI and frees the ATTRD
         * 2.) shuts down the filesystem
         */
        args->trans->t_flags |= XFS_TRANS_DIRTY;
....

Ok, so the problem path is a create that ends up being a pure leaf
add operation. The trace looks like (trimmed for readability):

# trace-cmd record -e xfs_attr\* -e xfs_defer\* -e printk
....

 xfs_attr_leaf_lookup:		ino 0x99a name x11 namelen 3 valuelen 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
 xfs_defer_finish:		tp 0xffff888802e27138 caller __xfs_trans_commit+0x144
 xfs_defer_create_intent:	optype 5 intent (nil) committed 0 nr 1
 xfs_defer_pending_finish:	optype 5 intent (nil) committed 0 nr 1
 xfs_attr_leaf_lookup:		ino 0x99a name x11 namelen 3 valuelen 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
 xfs_attr_leaf_add:		ino 0x99a name x11 namelen 3 valuelen 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
 xfs_attr_leaf_add_work:	ino 0x99a name x11 namelen 3 valuelen 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
 xfs_attr_leaf_addname_return:	state change 4 ino 0x99a
 xfs_defer_trans_roll:		tp 0xffff888802e27138 caller xfs_defer_finish_noroll+0x2a5
 xfs_defer_pending_finish:	optype 5 intent (nil) committed 0 nr 1
 xfs_defer_finish_done:		tp 0xffff888802e273f0 caller __xfs_trans_commit+0x144
 console:			[   94.219375] XFS: Assertion failed: !list_empty(&cil->xc_cil), file: fs/xfs/

So we have a create/set operation here. THe first lookup is to check
that xattr exists or not. Gets -ENOENT so it sets the transaction
to deferred and commits it. That gets us into xfs_defer_finish()
where we process the xattri. We don't create an intent (because larp
is false), then we finish it. This calls into:

xfs_xattri_finish_update()
  xfs_attr_set_iter()
    case XFS_DAS_UNINIT:
      xfs_attr_leaf_addname()
        xfs_attr_leaf_try_add()
	  xfs_attr3_leaf_add()
	    xfs_attr3_leaf_add_work()
        attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
	trace_xfs_attr_leaf_addname_return()
	  state change 4, XFS_DAS_FOUND_LBLK == 4
	return -EAGAIN;

args->trans->t_flags |= XFS_TRANS_DIRTY;

Now we return -EAGAIN to xfs_defer_finish_one(), which requeues
the defered item onto the deferred work, which then returns to
xfs_defer_finish_noroll() and we go around the loop again.
We roll the dirty transaction that contained the dirty leaf buffer,
committing it, then run the loop again.

This time however, we run:

xfs_xattri_finish_update
  xfs_attr_set_iter
    case XFS_DAS_FOUND_LBLK:
      <tries to set up and copy remote xattr val>  <XXX - why?>
      <no remote xattr, nothing dirtied>
      <not a RENAME op>
        <no remote blocks, nothing dirtied>
      return 0;

args->trans->t_flags |= XFS_TRANS_DIRTY;

So, at this point, we now have a dirty transaction with no modified
objects in it. All we need to do how is have some other thread flush
the CIL and then for this task to win the race to be the first
transaction to commit once the push switches to a new, empty
context and unlocks the context lock....

And then xlog_cil_commit() trips over an empty CIL because we had a
dirty transaction with no dirty items attached to it.

So, the code snippet I pointed to above that unconditionally makes
the xattr transaction dirty is invalid. We should only be setting
the transaction dirty when we attach dirty an item attached to the
transaction. If we need to abort because of an unrecoverable error,
we need to shut down the log here. That will cause the transaction
to be aborted when it returns to the core defer/commit code.

In debugging this, there are several things I've noticed that need
correcting/fixing. Rather than go around the review circle to try to
get them all understood and fixed, I think I'm just going to writing
patches and send them out for review as I get them done and tested.
The faster we knock out the problems, the sooner we get this stuff
merged.

I'll start on this tomorrow morning....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-11  3:59         ` Dave Chinner
  2022-04-11  7:31           ` Dave Chinner
@ 2022-04-11 19:38           ` Alli
  1 sibling, 0 replies; 12+ messages in thread
From: Alli @ 2022-04-11 19:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Mon, 2022-04-11 at 13:59 +1000, Dave Chinner wrote:
> On Mon, Apr 11, 2022 at 11:50:23AM +1000, Dave Chinner wrote:
> > On Thu, Apr 07, 2022 at 03:40:08PM -0700, Alli wrote:
> > > On Thu, 2022-04-07 at 15:49 +1000, Dave Chinner wrote:
> > > > On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong
> > > > wrote:
> > > > > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner wrote:
> > > > > > - Logged attributes V28 (Allison)
> > > > > > 	- I haven't looked at this since V24, so I'm not sure
> > > > > > what
> > > > > > 	  the current status is. I will do that discovery later
> > > > > > in
> > > > > > 	  the week.
> > > > > > 	- Merge criteria and status:
> > > > > > 		- review complete: Not sure
> > > So far each patch in v29 has at least 2 rvbs I think
> > 
> > OK.
> > 
> > > > > > 		- no regressions when not enabled: v24 was OK
> > > > > > 		- no major regressions when enabled: v24 had
> > > > > > issues
> > > > > > 	- Open questions:
> > > > > > 		- not sure what review will uncover
> > > > > > 		- don't know what problems testing will show
> > > > > > 		- what other log fixes does it depend on?
> > > If it goes on top of whiteouts, it will need some modifications
> > > to
> > > follow the new log item changes that the whiteout set makes.
> > > 
> > > Alternately, if the white out set goes in after the larp set,
> > > then it
> > > will need to apply the new log item changes to xfs_attr_item.c as
> > > well
> > 
> > I figured as much, thanks for confirming!
Hi Dave, sorry I just noticed this response after I had sent out the
whiteout reviews last night

> 
> Ok, so I've just gone through the process of merging the two
> branches to see where we stand. The modifications to the log code
> that are needed for the larp code - changes to log iovec processing
> and padding - are out of date in the LARP v29 patchset.
> 
> That is, the versions that are in the intent whiteout patchset are
> much more sophisticated and cleanly separated. The version of the
> "avoid extra transactions when no intents" patch in the LARP v29
> series is really only looking at whether the transaction is dirty,
> not whether there are intents in the transactions, which is what we
> really need to know when deciding whether to commit the transaction
> or not.
Ok, so it sounds like patch 2 of the larp set needs to be dropped then

> 
> There are also a bunch of log iovec changes buried in patch 4 of the
> LARP patchset which is labelled as "infrastructure". Those changes
> are cleanly split out as patch 1 in the intent whiteout patchset and
> provide the xlog_calc_vec_len() function that the LARP code needs.
> 
Ok, I will see if I can separate those out then

> As such, the RVBs on the patches in the LARPv29 series don't carry
> over to the patches in the intent whiteout series - they are just
> too different for that to occur.
> 
> The additional changes needed to support intent whiteouts are
> relatively straight forward for the attri/attrd items, so at this
> point I'd much prefer that the two patchsets are ordered "intent
> whiteouts" then "LARP".
Alrighty then, sounds good.

> 
> I've pushed the compose I just processed to get most of the pending
> patchsets as they stand into topic branches and onto test machines
> out to kernel.org. Have a look at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-
> 5.19-compose
Ok, I will take a look at this, I had not noticed it last night

> 
> to see how I merged everything and maybe give it a run through your
> test cycle to see if there's anything I broke when LARP is
> enabled....

Great, thanks!
Allison

> 
> Cheers,
> 
> Dave.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [5.19 cycle] Planning and goals
  2022-04-11  8:50             ` Dave Chinner
@ 2022-04-11 20:00               ` Alli
  0 siblings, 0 replies; 12+ messages in thread
From: Alli @ 2022-04-11 20:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Mon, 2022-04-11 at 18:50 +1000, Dave Chinner wrote:
> On Mon, Apr 11, 2022 at 05:31:21PM +1000, Dave Chinner wrote:
> > On Mon, Apr 11, 2022 at 01:59:35PM +1000, Dave Chinner wrote:
> > > On Mon, Apr 11, 2022 at 11:50:23AM +1000, Dave Chinner wrote:
> > > > On Thu, Apr 07, 2022 at 03:40:08PM -0700, Alli wrote:
> > > > > On Thu, 2022-04-07 at 15:49 +1000, Dave Chinner wrote:
> > > > > > On Wed, Apr 06, 2022 at 08:11:06PM -0700, Darrick J. Wong
> > > > > > wrote:
> > > > > > > On Tue, Apr 05, 2022 at 12:03:12PM +1000, Dave Chinner
> > > > > > > wrote:
> > > > > > > > - Logged attributes V28 (Allison)
> > > > > > > > 	- I haven't looked at this since V24, so I'm
> > > > > > > > not sure what
> > > > > > > > 	  the current status is. I will do that
> > > > > > > > discovery later in
> > > > > > > > 	  the week.
> > > > > > > > 	- Merge criteria and status:
> > > > > > > > 		- review complete: Not sure
> > > > > So far each patch in v29 has at least 2 rvbs I think
> > > > 
> > > > OK.
> > > > 
> > > > > > > > 		- no regressions when not enabled: v24
> > > > > > > > was OK
> > > > > > > > 		- no major regressions when enabled:
> > > > > > > > v24 had issues
> > > > > > > > 	- Open questions:
> > > > > > > > 		- not sure what review will uncover
> > > > > > > > 		- don't know what problems testing will
> > > > > > > > show
> > > > > > > > 		- what other log fixes does it depend
> > > > > > > > on?
> > > > > If it goes on top of whiteouts, it will need some
> > > > > modifications to
> > > > > follow the new log item changes that the whiteout set makes.
> > > > > 
> > > > > Alternately, if the white out set goes in after the larp set,
> > > > > then it
> > > > > will need to apply the new log item changes to
> > > > > xfs_attr_item.c as well
> > > > 
> > > > I figured as much, thanks for confirming!
> > > 
> > > Ok, so I've just gone through the process of merging the two
> > > branches to see where we stand. The modifications to the log code
> > > that are needed for the larp code - changes to log iovec
> > > processing
> > > and padding - are out of date in the LARP v29 patchset.
> > > 
> > > That is, the versions that are in the intent whiteout patchset
> > > are
> > > much more sophisticated and cleanly separated. The version of the
> > > "avoid extra transactions when no intents" patch in the LARP v29
> > > series is really only looking at whether the transaction is
> > > dirty,
> > > not whether there are intents in the transactions, which is what
> > > we
> > > really need to know when deciding whether to commit the
> > > transaction
> > > or not.
> > > 
> > > There are also a bunch of log iovec changes buried in patch 4 of
> > > the
> > > LARP patchset which is labelled as "infrastructure". Those
> > > changes
> > > are cleanly split out as patch 1 in the intent whiteout patchset
> > > and
> > > provide the xlog_calc_vec_len() function that the LARP code
> > > needs.
> > > 
> > > As such, the RVBs on the patches in the LARPv29 series don't
> > > carry
> > > over to the patches in the intent whiteout series - they are just
> > > too different for that to occur.
> > > 
> > > The additional changes needed to support intent whiteouts are
> > > relatively straight forward for the attri/attrd items, so at this
> > > point I'd much prefer that the two patchsets are ordered "intent
> > > whiteouts" then "LARP".
> > > 
> > > I've pushed the compose I just processed to get most of the
> > > pending
> > > patchsets as they stand into topic branches and onto test
> > > machines
> > > out to kernel.org. Have a look at:
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git
> > > xfs-5.19-compose
> > > 
> > > to see how I merged everything and maybe give it a run through
> > > your
> > > test cycle to see if there's anything I broke when LARP is
> > > enabled....
> > 
> > generic/642 producded this splat:
> > 
> >  XFS: Assertion failed: !list_empty(&cil->xc_cil), file:
> > fs/xfs/xfs_log_cil.c, line: 1274
> >  ------------[ cut here ]------------
> >  kernel BUG at fs/xfs/xfs_message.c:102!
> >  invalid opcode: 0000 [#1] PREEMPT SMP
> >  CPU: 1 PID: 2187772 Comm: fsstress Not tainted 5.18.0-rc2-dgc+
> > #1108
> >  Call Trace:
> >   <TASK>
> >   xlog_cil_commit+0xa5a/0xad0
> >   __xfs_trans_commit+0xb8/0x330
> >   xfs_trans_commit+0x10/0x20
> >   xfs_attr_set+0x3e2/0x4c0
> >   xfs_xattr_set+0x8d/0xe0
> >   __vfs_setxattr+0x6b/0x90
> >   __vfs_setxattr_noperm+0x76/0x220
> >   __vfs_setxattr_locked+0xdf/0x100
> >   vfs_setxattr+0x94/0x170
> >   setxattr+0x110/0x200
> >   ? __might_fault+0x22/0x30
> >   ? strncpy_from_user+0x23/0x170
> >   ? getname_flags.part.0+0x4c/0x1b0
> >   ? kmem_cache_free+0x1fc/0x380
> >   ? __might_sleep+0x43/0x70
> >   path_setxattr+0xbf/0xe0
> >   __x64_sys_setxattr+0x2b/0x30
> >   do_syscall_64+0x35/0x80
> >   entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > Which implies a dirty transaction with nothing in it at the end of
> > a run through xfs_attr_set_iter() without LARP enabled. It raced
> > with a CIL push, so when the empty dirty transaction tries to push
> > the CIL, it assert fails because the CIL is empty....
> > 
> > I don't know how this happens yet, but there are no intents
> > involved
> > here so it doesn't appear to have anything to do with intent
> > logging
> > or intent whiteouts at this point.
> 
> 100% reproducable, and yup, there it is:
> 
> STATIC int
> xfs_xattri_finish_update(
>         struct xfs_attr_item            *attr,
>         struct xfs_attrd_log_item       *attrdp,
>         uint32_t                        op_flags)
> {
> .....
>         switch (op) {
>         case XFS_ATTR_OP_FLAGS_SET:
>                 error = xfs_attr_set_iter(attr);
>                 break;
> .....
>         /*
>          * Mark the transaction dirty, even on error. This ensures
> the
>          * transaction is aborted, which:
>          *
>          * 1.) releases the ATTRI and frees the ATTRD
>          * 2.) shuts down the filesystem
>          */
>         args->trans->t_flags |= XFS_TRANS_DIRTY;
> ....
> 
> Ok, so the problem path is a create that ends up being a pure leaf
> add operation. The trace looks like (trimmed for readability):
> 
> # trace-cmd record -e xfs_attr\* -e xfs_defer\* -e printk
> ....
> 
>  xfs_attr_leaf_lookup:		ino 0x99a name x11 namelen 3
> valuelen 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
>  xfs_defer_finish:		tp 0xffff888802e27138 caller
> __xfs_trans_commit+0x144
>  xfs_defer_create_intent:	optype 5 intent (nil) committed 0 nr 1
>  xfs_defer_pending_finish:	optype 5 intent (nil) committed 0 nr 1
>  xfs_attr_leaf_lookup:		ino 0x99a name x11 namelen 3
> valuelen 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
>  xfs_attr_leaf_add:		ino 0x99a name x11 namelen 3 valuelen
> 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
>  xfs_attr_leaf_add_work:	ino 0x99a name x11 namelen 3 valuelen
> 28 hashval 0x1e18b1 filter  flags  op_flags ADDNAME|OKNOENT
>  xfs_attr_leaf_addname_return:	state change 4 ino 0x99a
>  xfs_defer_trans_roll:		tp 0xffff888802e27138 caller
> xfs_defer_finish_noroll+0x2a5
>  xfs_defer_pending_finish:	optype 5 intent (nil) committed 0 nr 1
>  xfs_defer_finish_done:		tp 0xffff888802e273f0 caller
> __xfs_trans_commit+0x144
>  console:			[   94.219375] XFS: Assertion failed:
> !list_empty(&cil->xc_cil), file: fs/xfs/
> 
> So we have a create/set operation here. THe first lookup is to check
> that xattr exists or not. Gets -ENOENT so it sets the transaction
> to deferred and commits it. That gets us into xfs_defer_finish()
> where we process the xattri. We don't create an intent (because larp
> is false), then we finish it. This calls into:
> 
> xfs_xattri_finish_update()
>   xfs_attr_set_iter()
>     case XFS_DAS_UNINIT:
>       xfs_attr_leaf_addname()
>         xfs_attr_leaf_try_add()
> 	  xfs_attr3_leaf_add()
> 	    xfs_attr3_leaf_add_work()
>         attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
> 	trace_xfs_attr_leaf_addname_return()
> 	  state change 4, XFS_DAS_FOUND_LBLK == 4
> 	return -EAGAIN;
> 
> args->trans->t_flags |= XFS_TRANS_DIRTY;
> 
> Now we return -EAGAIN to xfs_defer_finish_one(), which requeues
> the defered item onto the deferred work, which then returns to
> xfs_defer_finish_noroll() and we go around the loop again.
> We roll the dirty transaction that contained the dirty leaf buffer,
> committing it, then run the loop again.
> 
> This time however, we run:
> 
> xfs_xattri_finish_update
>   xfs_attr_set_iter
>     case XFS_DAS_FOUND_LBLK:
>       <tries to set up and copy remote xattr val>  <XXX - why?>
>       <no remote xattr, nothing dirtied>
>       <not a RENAME op>
>         <no remote blocks, nothing dirtied>
>       return 0;
> 
> args->trans->t_flags |= XFS_TRANS_DIRTY;
> 
> So, at this point, we now have a dirty transaction with no modified
> objects in it. All we need to do how is have some other thread flush
> the CIL and then for this task to win the race to be the first
> transaction to commit once the push switches to a new, empty
> context and unlocks the context lock....
> 
> And then xlog_cil_commit() trips over an empty CIL because we had a
> dirty transaction with no dirty items attached to it.
> 
> So, the code snippet I pointed to above that unconditionally makes
> the xattr transaction dirty is invalid. We should only be setting
> the transaction dirty when we attach dirty an item attached to the
> transaction. If we need to abort because of an unrecoverable error,
> we need to shut down the log here. That will cause the transaction
> to be aborted when it returns to the core defer/commit code.
> 
> In debugging this, there are several things I've noticed that need
> correcting/fixing. Rather than go around the review circle to try to
> get them all understood and fixed, I think I'm just going to writing
> patches and send them out for review as I get them done and tested.
> The faster we knock out the problems, the sooner we get this stuff
> merged.
> 
> I'll start on this tomorrow morning....
Ok then, I will wait to hear a reply on this so that we don't duplicate
each others work efforts.  Let me know if you need me to pick something
up.  Thanks!

Allison

> 
> Cheers,
> 
> Dave.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-04-11 20:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-05  2:03 [5.19 cycle] Planning and goals Dave Chinner
2022-04-07  3:11 ` Darrick J. Wong
2022-04-07  5:49   ` Dave Chinner
2022-04-07 22:40     ` Alli
2022-04-11  1:50       ` Dave Chinner
2022-04-11  3:59         ` Dave Chinner
2022-04-11  7:31           ` Dave Chinner
2022-04-11  8:50             ` Dave Chinner
2022-04-11 20:00               ` Alli
2022-04-11 19:38           ` Alli
2022-04-10 18:21     ` Darrick J. Wong
2022-04-11  1:51       ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.