From: Martin Steigerwald <martin@lichtvoll.de>
To: linux-block@vger.kernel.org, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
Date: Sat, 26 Jun 2021 10:49:30 +0200
Message-ID: <2009039.b04VgvrTqe@ananda>
In-Reply-To: <fe83dadc-bbcf-2f85-6664-bad3fcd83553@gmx.com>

Qu Wenruo - 26.06.21, 02:27:54 CEST:
> On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
> > Hi!
> > 
> > I have repeatedly found that, after a while, Baloo indexes the same
> > files twice or even more often.
> > 
> > I reported this upstream in:
> > 
> > Bug 438434 - Baloo appears to be indexing twice the number of files
> > than are actually in my home directory
> > 
> > https://bugs.kde.org/show_bug.cgi?id=438434
> > 
> > The answer I got was that if the device number changes, Baloo will
> > think it has new files even though the path is still the same. Over
> > time I also found that the device number for the single BTRFS
> > filesystem on an NVMe SSD in a ThinkPad T14 Gen1 AMD can change. It
> > is not (yet) RAID 1. I do have BTRFS RAID 1 in another laptop, and I
> > have already had this issue there as well.
> 
> Since btrfs has multi-device support by default, it reports an
> anonymous device number, just as if you were using a filesystem on
> top of LVM.

Ah, this!

I forgot to mention that I use BTRFS on top of LVM on top of LUKS-based 
dm-crypt on a partition of the NVMe SSD. I mentioned it in the bug 
report, but somehow not here, sorry. I would use a different approach if 
there were one that gave me full disk encryption. I am not willing to 
use ecryptfs on top of BTRFS, and as far as I know BTRFS cannot yet 
encrypt data by itself.
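To check which device number the kernel currently reports for a path such as /home, one can look at st_dev as returned by stat(2). Below is a minimal Python sketch; the helper names device_of and is_anonymous are hypothetical. It assumes Linux, where anonymous device numbers (used for btrfs subvolumes, but also for pseudo filesystems such as tmpfs or proc) have major number 0. Running it after each boot would show whether the number for /home changed.

```python
import os


def device_of(path: str) -> tuple[int, int]:
    """Return (major, minor) of the st_dev the kernel reports for path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def is_anonymous(path: str) -> bool:
    # Assumption (Linux): anonymous device numbers are allocated with major 0.
    major, _minor = device_of(path)
    return major == 0


if __name__ == "__main__":
    for p in ("/", "/home"):
        if os.path.exists(p):
            major, minor = device_of(p)
            print(f"{p}: st_dev={major}:{minor} anonymous={is_anonymous(p)}")
```

Note that a filesystem directly on an LVM logical volume reports the device-mapper device number instead, which is not anonymous but can still be assigned dynamically.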

I still think this could give a fixed order of loading:

1. Unlock LUKS.

2. Activate LVM logical volumes. I have no idea, though, whether that 
happens in a fixed order or whether the order can differ on each boot.

3. Mount BTRFS. /home is always on the same subvolume, so that should 
not change.

> The problem is why the anonymous device number changes.

Good question. Maybe I have an idea about that. See below.

> > I argued that a desktop application has no business relying on a
> > device number, and the answer I got was that search/indexing sits
> > between an application and system software, and that Baloo needs an
> > "invariant" for a file. See comment #11 of that bug report:
> > 
> > https://bugs.kde.org/show_bug.cgi?id=438434#c11
> 
> Well, a lot of tools rely on the device number to detect filesystem
> boundaries, find for example.
> Thus it's a little hard to argue against.
> 
> But on the other hand, it also means Baloo can't handle a regular
> filesystem on top of LVM well either.

Yes. It also could not handle a race in driver loading with two or more 
different controllers in a desktop machine.
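Qu's remark about find refers to options like `find -xdev`, which do not descend into directories whose device number differs from that of the starting point; since each btrfs subvolume reports its own anonymous device number, such a walk also stops at subvolume boundaries. A minimal Python sketch of that behaviour (walk_one_filesystem is a hypothetical name, not find's actual implementation):

```python
import os
from collections.abc import Iterator


def walk_one_filesystem(root: str) -> Iterator[str]:
    """Yield root and every path below it, but do not descend into
    directories whose st_dev differs from root's (like `find root -xdev`)."""
    root_dev = os.lstat(root).st_dev
    stack = [root]
    while stack:
        current = stack.pop()
        yield current
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError:
            # Not a directory, vanished meanwhile, or permission denied.
            continue
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            if entry.is_dir(follow_symlinks=False) and st.st_dev == root_dev:
                # Same device number: keep walking (yielded when popped).
                stack.append(entry.path)
            else:
                # Files, symlinks and directories with another device number
                # (a mount point or a btrfs subvolume) are reported but not
                # descended into.
                yield entry.path
```

The same property that makes st_dev useful here, namely that it changes at filesystem or subvolume boundaries, is what makes it a poor long-term file identity for an indexer.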

> > I got the suggestion to try to find a way to tell the kernel to use
> > a fixed device number.
> 
> I don't think it's possible for btrfs, as each subvolume gets its
> anonymous device number assigned when it is first read.
> 
> Thus it's really hard to make it fixed, as the whole point of
> anonymous device numbers is to avoid conflicts.

Fair enough.
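To illustrate why the number depends on the order of first access, here is a toy model in Python. It is not kernel code: the class AnonDevAllocator and the function boot are hypothetical, and the model assumes the allocator hands out major 0 and the lowest free minor number to whichever user asks first.

```python
class AnonDevAllocator:
    """Toy model (hypothetical, not kernel code) of anonymous device numbers:
    every new user gets major 0 and the lowest minor number not yet in use."""

    def __init__(self) -> None:
        self._in_use: set[int] = set()
        self._assigned: dict[str, int] = {}

    def get(self, name: str) -> tuple[int, int]:
        # A user keeps the number it received on first access.
        if name not in self._assigned:
            minor = 1
            while minor in self._in_use:
                minor += 1
            self._in_use.add(minor)
            self._assigned[name] = minor
        return 0, self._assigned[name]


def boot(order: list[str]) -> dict[str, tuple[int, int]]:
    """Simulate one boot in which the named users are first accessed in order."""
    alloc = AnonDevAllocator()
    return {name: alloc.get(name) for name in order}
```

In this model, boot(["home", "zeit"]) gives "home" the number (0, 1), while boot(["zeit", "home"]) gives it (0, 2), which an indexer keyed on st_dev would treat as a different filesystem.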

> > I still think that an application or an infrastructure service for a
> > desktop environment, or indeed anything else in user space, should
> > not rely on a device number being fixed and never changing across
> > reboots.
> 
> Well, LVM/device mapper does the same thing, and a large behavior
> change is never a good idea for the kernel.
> 
> Thus for use cases where we really need a proper mapping, we use
> hashes, not just the device number, as we did in duperemove.

I think I suggested that some time ago.
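In the spirit of Qu's remark about hashes, here is a minimal Python sketch of a file key derived from content instead of from (st_dev, st_ino). The function name content_key and the block size are assumptions for illustration; this is not how Baloo or duperemove actually work.

```python
import hashlib


def content_key(path: str, block_size: int = 128 * 1024) -> str:
    """Return the SHA-256 hex digest of the file contents, read block by block.

    Unlike (st_dev, st_ino), this key does not change when the kernel hands
    out a different device number after a reboot. It does change whenever
    the file contents change.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            h.update(chunk)
    return h.hexdigest()
```

The trade-off is that a content hash identifies data, not a file: edits produce a new key and identical files share one, so an indexer would likely need to combine it with other information such as the path.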

> > Another question would be whether I could somehow make sure that the
> > device number does not change, even if just as a work-around.
> 
> If you really just want a fixed device number, you can ensure that by:
> 
> - Making sure all users of anonymous devices get a fixed sequence:
>    things like device mapper/LVM and btrfs should be loaded/initialized
>    in a fixed order.

Ah, I see.

> - Making sure the subvolume you care about always gets mounted/read
>    before any other subvolume, so that the target subvolume always gets
>    the first device number in the pool.

Hmm, that may be a pointer. This is what I currently have in fstab:

/dev/nvme/home /home btrfs lazytime,compress=zstd 0 0
/dev/nvme/home /zeit/home btrfs subvol=zeit 0 0

The first line uses the default subvolume, which I changed accordingly 
after creating this BTRFS filesystem. I use this approach to keep 
(temporary) snapshots separate from the directory tree in /home.

Could it be that the order of these two mounts is not the same on every 
boot? I use Devuan with Runit, so the mounting is done by init scripts 
(not by systemd).

I am not aware of an fstab option to mount this one first and the other 
one second, but I could set the second mount to noauto and mount it only 
when I need it.
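The effect of the noauto idea can be sketched by listing which fstab entries would be mounted automatically and in which order. This assumes the Devuan init scripts ultimately run `mount -a`, which skips entries marked noauto and, as far as I know, processes the remaining ones in file order; the function auto_mount_order is hypothetical and ignores details such as escaped spaces in fstab fields.

```python
def auto_mount_order(fstab_text: str) -> list[tuple[str, str, str]]:
    """Return (device, mount point, options) for each fstab entry that
    `mount -a` would mount, in file order; noauto entries are skipped."""
    order = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry, ignored in this sketch
        device, mountpoint, _fstype, options = fields[:4]
        if "noauto" in options.split(","):
            continue  # only mounted explicitly, e.g. "mount /zeit/home"
        order.append((device, mountpoint, options))
    return order
```

Applied to the two lines above with ",noauto" added to the second entry's options, only /home would be mounted at boot, so its subvolume would be read before the snapshot subvolume. As Qu notes, other users of anonymous device numbers that come up earlier could still shift the number.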

>    But this also means that all later subvolumes not in the fixed
>    mount/read sequence cannot get a fixed number.

I somehow thought this would get complicated.

Best,
-- 
Martin



Thread overview: 11+ messages
2021-06-25 19:06 Assumption on fixed device numbers in Plasma's desktop search Baloo Martin Steigerwald
2021-06-26  0:27 ` Qu Wenruo
2021-06-26  8:49   ` Martin Steigerwald [this message]
2021-06-26  9:33     ` Qu Wenruo
2021-06-26 10:18       ` Martin Steigerwald
2021-06-26  0:54 ` NeilBrown
2021-06-26  3:38   ` Bart Van Assche
2021-06-26  5:17     ` NeilBrown
2021-06-26  6:14       ` Andrei Borzenkov
2021-06-26  6:24         ` Qu Wenruo
2021-06-26  8:51   ` Martin Steigerwald
