bcachefs backpointers errors in no write buf mode

* bcachefs backpointers errors in no write buf mode
@ 2023-07-07 17:32 Brian Foster
  2023-07-07 17:58 ` Kent Overstreet
  2023-07-07 20:37 ` Kent Overstreet
  0 siblings, 2 replies; 9+ messages in thread
From: Brian Foster @ 2023-07-07 17:32 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

Hi Kent,

I'm doing some analysis of these "existing backpointer found when
inserting" errors observed when running generic/388 in
backpointers_no_use_write_buffer=1 mode. To start with context, here's
what I see in one particular variant of the issue...

We're in read bio completion path doing a CRC narrow. The commit for
that transaction runs trans triggers on the extent key, which then falls
into bch2_trans_mark_extent() and attempts backpointer updates from
there. I stuck a warning in the backpointer error checking code just to
get a stack trace for the error context, which is appended below for
reference.

On IRC Kent mentioned this was likely a trigger ordering issue and that
something similar was addressed in the normal backpointer write buffer
update path. I think I follow what's going on there from commit
0a1b82456400 ("bcachefs: btree write buffer update ordering"), but the
context for the no write buf path seems a bit different..

WRT ordering, if I follow the code correctly, run_btree_triggers() in
the commit path basically implements an overwrite followed by !overwrite
pass, except if the old and new key type/triggers match we call into
bch2_trans_mark_key() with both the old and new keys. This calls into
bch2_trans_mark_extent(), which then implements an explicit insert ->
overwrite order. If that falls into the backpointer write buffer path,
that then has to reorder the updates in the write buffer by putting the
overwrite update at the head, presumably so it won't remove the
backpointer insert that is about to be added. (I assume there are
legitimate design reasons for the update ordering at each level here,
but the reordering back and forth seems like unfortunate complexity.)

This raises a few questions wrt the non write buffer path. First, the
existing bp error check lives in
bch2_bucket_backpointer_mod_nowritebuffer() before we even call into
bch2_trans_update(), so changing the order of trans updates doesn't seem
like it would avoid this particular error. I don't think we want to
change the high level order of the extent key trigger, so that means we
probably need to do something with the error check itself.

It also seems like this is possibly different from whatever issue lead
to the ordering fix in the write buf path, though I'm not quite sure how
that one manifests or is reproduced. A missing backpointer due to
unexpected bp removal, perhaps? If so, I suppose I could see why the
updates should be reordered in the non-buf trans, assuming the overwrite
would delete the just inserted bp key from the trans (and then btree on
commit), and perhaps I'm just not seeing that because I hit the existing
bp error check first.

So I'm still trying to piece high level things together such that an
appropriate fix is not yet clear to me. I'm wondering if perhaps the
high level (narrow crc) trans needs a flag to modify the bp checking
behavior to indicate we aren't actually inserting a new bp here (so
seeing an existing one is fine/expected), and then subsequently
bch2_bucket_backpointer_mod_nowritebuffer() needs to do something to
turn the "insert" into a bp key overwrite, and perhaps ignore the higher
level overwrite, or something *handwavy* along those lines.

I need to think about this some more, but wanted to put some thoughts
down in case I'm misunderstanding things or there's an obvious fix I'm
just not seeing yet..

Brian

--- 8< ---

WARNING: CPU: 1 PID: 31479 at fs/bcachefs/backpointers.c:128 backpointer_mod_err.isra.0+0x193/0x1b0 [bcachefs]
...
CPU: 1 PID: 31479 Comm: kworker/u16:4 Tainted: G        W  OE      6.4.0+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc36 04/01/2014
Workqueue: events_unbound __bch2_read_endio [bcachefs]

RIP: 0010:backpointer_mod_err.isra.0+0x193/0x1b0 [bcachefs]
...
Call Trace:
 <TASK>
 ? backpointer_mod_err.isra.0+0x193/0x1b0 [bcachefs]
 ? __warn+0x7d/0x130
 ? backpointer_mod_err.isra.0+0x193/0x1b0 [bcachefs]
 ? report_bug+0x18d/0x1c0
 ? handle_bug+0x41/0x70
 ? exc_invalid_op+0x13/0x60
 ? asm_exc_invalid_op+0x16/0x20
 ? backpointer_mod_err.isra.0+0x193/0x1b0 [bcachefs]
 bch2_bucket_backpointer_mod_nowritebuffer+0x15b/0x210 [bcachefs]
 ? bch2_bucket_backpointer_mod_nowritebuffer+0x5/0x210 [bcachefs]
 bch2_bucket_backpointer_mod+0x2ad/0x2c0 [bcachefs]
 bch2_trans_mark_pointer.constprop.0+0x34d/0x360 [bcachefs]
 ? bch2_trans_start_alloc_update+0x5/0x140 [bcachefs]
 __trans_mark_extent+0x1c5/0x3e0 [bcachefs]
 ? __trace_bprintk+0x54/0x60
 bch2_trans_mark_extent+0x9f/0xe0 [bcachefs]
 run_btree_triggers+0x1ae/0x330 [bcachefs]
 __bch2_trans_commit+0x92/0x1e70 [bcachefs]
 ? __bch2_rbio_narrow_crcs+0xbb/0x380 [bcachefs]
 bch2_rbio_narrow_crcs+0xbf/0x100 [bcachefs]
 ? bch2_rbio_narrow_crcs+0x45/0x100 [bcachefs]
 __bch2_read_endio+0x7c7/0x8f0 [bcachefs]
 ? process_one_work+0x1c8/0x3c0
 process_one_work+0x1c8/0x3c0
 worker_thread+0x4d/0x380
 ? _raw_spin_lock_irqsave+0x23/0x50
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xf3/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>

^ permalink raw reply	[flat|nested] 9+ messages in thread