From: Martin Steigerwald <martin@lichtvoll.de>
To: linux-block@vger.kernel.org, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
Date: Sat, 26 Jun 2021 10:49:30 +0200
Message-ID: <2009039.b04VgvrTqe@ananda>
In-Reply-To: <fe83dadc-bbcf-2f85-6664-bad3fcd83553@gmx.com>

Qu Wenruo - 26.06.21, 02:27:54 CEST:
> On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
> > Hi!
> > 
> > I found repeatedly that Baloo indexes the same files twice or even
> > more often after a while.
> > 
> > I reported this upstream in:
> > 
> > Bug 438434 - Baloo appears to be indexing twice the number of files
> > than are actually in my home directory
> > 
> > https://bugs.kde.org/show_bug.cgi?id=438434
> > 
> > And got back that if the device number changes, Baloo will think it
> > has new files even though the path is still the same. And found over
> > time that the device number for the single BTRFS filesystem on a
> > NVMe SSD in a ThinkPad T14 Gen1 AMD can change. It is not (maybe
> > yet) RAID 1. I do have BTRFS RAID 1 in another laptop and there I
> > also had this issue already.
> 
> Since btrfs has multi-device support by default, it reports anonymous
> device number, just as if you use a filesystem over LVM.

Ah, this!

I use BTRFS on top of LVM on top of LUKS-based dm-crypt on a partition 
on the NVMe SSD. Sorry, I mentioned that in the bug report but forgot to 
mention it here. I would use a different approach if there were one that 
gave me full disk encryption. I am not willing to use ecryptfs on top of 
BTRFS, and as far as I know BTRFS cannot yet encrypt by itself.

I still think this could give a fixed order of loading:

1. Unlock LUKS.

2. Activate LVM logical volumes. No idea whether that happens in a fixed 
order though or whether it can have a different order on each boot.

3. Mount BTRFS. /home is always on the same subvolume. So that should 
not change.

> The problem is why the anonymous device number change.

Good question. Maybe I have an idea about that. See below.
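
To check whether the number really differs between boots, the value that 
stat() reports could be logged on each boot and compared. A minimal 
sketch, assuming Python is available; the helper name device_number and 
the paths in the demo are only illustrations. Anonymous device numbers 
are expected to show up with major 0.

```python
import os


def device_number(path):
    """Return (major, minor) of the device holding `path`, as reported by stat()."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


if __name__ == "__main__":
    # Illustrative paths; replace with the mount points of interest.
    for p in ("/", os.path.expanduser("~")):
        major, minor = device_number(p)
        print(f"{p}: st_dev={major}:{minor}")
```

Running this after several reboots and comparing the printed values for 
/home would show whether, and how often, the number moves.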

> > I argued that a desktop application has no business to rely on a
> > device number and got back that search/indexing is in the middle
> > between an application and system software. And that Baloo needs an
> > "invariant" for a file. See comment #11 of that bug report:
> > 
> > https://bugs.kde.org/show_bug.cgi?id=438434#c11
> 
> Well, a lot of tools rely on the device number to distinguish
> filesystem boundaries, like find.
> Thus it's a little hard to argue.
> 
> But on the other hand, it also means baloo can't handle regular fs
> over LVM cases well either.

Yes. Also it could not handle the case of a driver loading race 
condition with two or more different controllers in a desktop machine.
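
For comparison, a boundary check in the style of find -xdev only 
compares device numbers seen during a single run, so it keeps working 
when the numbers change across reboots. The trouble starts when the 
number is stored and compared later. A rough sketch of such a 
within-one-run check, with hypothetical function names:

```python
import os


def same_filesystem(a, b):
    """True if both paths report the same st_dev right now."""
    return os.lstat(a).st_dev == os.lstat(b).st_dev


def walk_one_filesystem(top):
    """Yield directories below `top` without crossing onto another device."""
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, _filenames in os.walk(top):
        # Prune subdirectories whose device number differs from the start
        # directory, so os.walk() does not descend into other filesystems.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev
        ]
        yield dirpath
```

Nothing here is persisted, so a different st_dev on the next boot does 
not matter to this kind of use.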

> > I got the suggestion to try to find a way to tell the kernel to use
> > a fixed device number.
> 
> I don't think it's possible for btrfs, as each subvolume gets its
> anonymous device number assigned when it is first read.
> 
> Thus it's really hard to make it fixed, as the reason for anonymous
> device number is to avoid conflicts.

Fair enough.

> > I still think, an application or an infrastructure service for a
> > desktop environment or even anything else in user space should not
> > rely on a device number to be fixed and never change upon reboots.
> 
> Well, LVM/device mapper is doing the same thing, a lot of behavior
> change is never a good idea for the kernel.
> 
> Thus for use cases where we really need a proper mapping, we use
> hashes, not just the device number, like what we did in duperemove.

I think I suggested that some time ago.
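
As an illustration of the idea of hashes as an identity that does not 
involve the device number, here is a minimal sketch that hashes file 
content. It is not meant to describe what duperemove or Baloo do 
internally; the function name content_hash and the choice of SHA-256 are 
assumptions made for the example.

```python
import hashlib
import os
import tempfile


def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"same content\n")
        # Identical content gives identical hashes, whatever st_dev says.
        print(content_hash(a) == content_hash(b))
```

A content hash identifies the content rather than the file, so it 
changes whenever the file is modified and is the same for two copies. An 
indexer would probably have to combine it with the path or another 
stable identifier.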

> > Another question would be whether I could somehow make sure that the
> > device number does not change, even if just as a work-around.
> 
> If you really just want a fixed device number, you can ensure that by:
> 
> - Make sure all users of anonymous devices get fixed sequence
>    Things like device mapper/LVM, btrfs should get loaded/initialized
>    in a fixed order.

Ah, I see.

> - Make sure the subvolume you care about always gets mounted/read
>    before any other subvolumes
>    So that the target subvolume always gets the first device number
>    in the pool.

Hmm, that may be a pointer. This is what I currently have in fstab:

/dev/nvme/home /home btrfs lazytime,compress=zstd 0 0
/dev/nvme/home /zeit/home btrfs subvol=zeit 0 0

The first line mounts the default subvolume, which I changed after 
creating this BTRFS filesystem. I use this approach to keep (temporary) 
snapshots separated from the directory tree in /home.

Could it be that this order between these two mounts is not the same on 
every boot? I use Devuan with Runit, so the mounting would happen by 
some init scripts (instead of Systemd).

I am not aware of an fstab option to mount one entry first and the other 
second, but I could set the second mount to noauto and mount it only 
when I need it.

>    But this also means, all later subvolumes not in the fixed
>    mount/read sequence can not get a fixed number.

I somehow thought this would get complicated.

Best,
-- 
Martin



