From: Zygo Blaxell <>
Subject: infinite looping in logical_ino ioctl
Date: Fri, 11 Nov 2022 22:49:17 -0500	[thread overview]
Message-ID: <Y28XvZmK0bAS4Ht/> (raw)

I've been chasing an infinite loop in the logical_ino ioctl that appears
when dedupe and logical_ino are used together on the same filesystem.
An easy way to do that is to run bees on a CPU with a double-digit
number of cores, but I've also knocked servers down several times in
the last year by accidentally allowing 'btdu' and 'duperemove' to run
at the same time.

The bug has been highly resistant to analysis.  Even in the best cases,
it takes up to 70 hours of heavy dedupe+logical_ino on every core to
trigger.  bpftrace relies on RCU, but the RCU grace period doesn't happen
on the core running the infinite loop, so bpftraces simply stop when
the bug occurs.  Almost any change to the code in fs/btrfs/backref.c,
even incrementing a static variable counter in some other function,
causes the problem to become much harder to repro, and another similar
change makes it come back.  Once the infinite loop is started, it's
fairly robust--nothing but a reboot gets it out of the loop.

Yesterday I was able to capture the bug in kgdb on an
unmodified 5.19.16 kernel after 60 hours, and the infinite loop is in

    462         while (!ret && count < ref->count) {
    486                 fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
    487                 disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
    488                 data_offset = btrfs_file_extent_offset(eb, fi);
    490                 if (disk_byte == wanted_disk_byte) {
    517 next:
    518                 if (time_seq == BTRFS_SEQ_LAST)
    519                         ret = btrfs_next_item(root, path);
    520                 else
    521                         ret = btrfs_next_old_item(root, path, time_seq);
    522         }

In the infinite looping case, time_seq is a 4-digit number, ret and
count are always 0, ref->count is 1, and disk_byte != wanted_disk_byte.
Those conditions never change, so we can't get out of this loop.

When I tried to probe more deeply into what btrfs_next_old_item was
doing, I found that the code is somehow executing btrfs_next_item on line 519,
despite time_seq having the value 3722 at the time.  Iteration over the
items in views at two different points in time sounds like it could
result in infinite looping, but I wasn't able to confirm that before
the gdb session died.  I'm now waiting for my test VM to repro again.

Maybe this bug isn't in the _code_ after all...?  I wasn't expecting
to get here, so I'm not sure what to try next.

Kernel built on Debian with gcc (Debian 12.2.0-7) 12.2.0.

Old references:

User report of bees lockup:

My previous attempt to bisect the bug or use bpftrace on it:


Thread overview: 2+ messages
2022-11-12  3:49 Zygo Blaxell [this message]
2022-11-13 20:05 ` infinite looping in logical_ino ioctl Zygo Blaxell
