linux-block.vger.kernel.org archive mirror
* Assumption on fixed device numbers in Plasma's desktop search Baloo
@ 2021-06-25 19:06 Martin Steigerwald
  2021-06-26  0:27 ` Qu Wenruo
  2021-06-26  0:54 ` NeilBrown
  0 siblings, 2 replies; 11+ messages in thread
From: Martin Steigerwald @ 2021-06-25 19:06 UTC (permalink / raw)
  To: linux-block; +Cc: linux-btrfs

Hi!

I found repeatedly that Baloo indexes the same files twice or even more 
often after a while.

I reported this upstream in:

Bug 438434 - Baloo appears to be indexing twice the number of files than 
are actually in my home directory 

https://bugs.kde.org/show_bug.cgi?id=438434

And got back that if the device number changes, Baloo will think it has 
new files even though the path is still the same. And found over time that
the device number for the single BTRFS filesystem on an NVMe SSD in a
ThinkPad T14 Gen1 AMD can change. It is not (maybe yet) RAID 1. I do 
have BTRFS RAID 1 in another laptop and there I also had this issue 
already.

I argued that a desktop application has no business to rely on a device 
number and got back that search/indexing is in the middle between an 
application and system software. And that Baloo needs an "invariant" for 
a file. See comment #11 of that bug report:

https://bugs.kde.org/show_bug.cgi?id=438434#c11

I got the suggestion to try to find a way to tell the kernel to use a 
fixed device number. 

I still think, an application or an infrastructure service for a desktop 
environment or even anything else in user space should not rely on a 
device number to be fixed and never change upon reboots.

But maybe you have a different idea about that and it is okay for a
userspace component to do that. I would like to hear your idea about 
that.

Another question would be whether I could somehow make sure that the 
device number does not change, even if just as a work-around. I know for 
NFS there is a fsid= mount option, but it does not appear to be 
something generic, at least the mount man page seems to have nothing 
related to fsid.


Best,
-- 
Martin




* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-25 19:06 Assumption on fixed device numbers in Plasma's desktop search Baloo Martin Steigerwald
@ 2021-06-26  0:27 ` Qu Wenruo
  2021-06-26  8:49   ` Martin Steigerwald
  2021-06-26  0:54 ` NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2021-06-26  0:27 UTC (permalink / raw)
  To: Martin Steigerwald, linux-block; +Cc: linux-btrfs



On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
> Hi!
>
> I found repeatedly that Baloo indexes the same files twice or even more
> often after a while.
>
> I reported this upstream in:
>
> Bug 438434 - Baloo appears to be indexing twice the number of files than
> are actually in my home directory
>
> https://bugs.kde.org/show_bug.cgi?id=438434
>
> And got back that if the device number changes, Baloo will think it has
> new files even though the path is still the same. And found over time that
> the device number for the single BTRFS filesystem on an NVMe SSD in a
> ThinkPad T14 Gen1 AMD can change. It is not (maybe yet) RAID 1. I do
> have BTRFS RAID 1 in another laptop and there I also had this issue
> already.

Since btrfs has multi-device support by default, it reports an anonymous
device number, just as if you used a filesystem on top of LVM.

The question is why the anonymous device number changes.

If the fs is always mounted in a fixed sequence with fixed
snapshot/subvolume mounts, it should not get a new anonymous device number.

But if snapshots or new subvolumes are involved, or subvolumes are just
mounted/read in a different order, then the device number for each
subvolume can change.
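
A minimal sketch for inspecting this from userspace, assuming Python 3 on
Linux. The helper name describe_device is hypothetical, and the check
"major number 0 means anonymous device" is an assumption about how the
kernel hands out anonymous device numbers, not something stated above.

```python
import os


def describe_device(path):
    """Return (major, minor, is_anonymous) for the device number of path."""
    dev = os.stat(path).st_dev
    major, minor = os.major(dev), os.minor(dev)
    # Assumption: anonymous device numbers are allocated with major 0.
    return major, minor, major == 0


if __name__ == "__main__":
    home = os.path.expanduser("~")
    major, minor, anonymous = describe_device(home)
    print(f"{home}: st_dev={major}:{minor} anonymous={anonymous}")
```

Running this after each boot and comparing the printed st_dev is one way
to see whether the number for a given subvolume really moved.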

>
> I argued that a desktop application has no business to rely on a device
> number and got back that search/indexing is in the middle between an
> application and system software. And that Baloo needs an "invariant" for
> a file. See comment #11 of that bug report:
>
> https://bugs.kde.org/show_bug.cgi?id=438434#c11

Well, a lot of tools rely on the device number to distinguish filesystem
boundaries, like find.
Thus it's a little hard to argue.

But on the other hand, it also means baloo can't handle a regular fs over
LVM well either.
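
For contrast, here is a sketch of the boundary check that tools of this
kind perform, written in Python 3. The function name walk_one_filesystem
is hypothetical. Note that it only compares st_dev values within a single
run and never stores them, so it does not need the number to survive a
reboot.

```python
import os


def walk_one_filesystem(top):
    """Yield file paths under top without crossing into another filesystem.

    The boundary test compares st_dev values, similar in spirit to
    `find -xdev`. It only needs st_dev to be stable during the walk.
    """
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune subdirectories that live on a different device.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

The difference to an indexer is that the indexer persists the number in
its database and compares it again after the next boot.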

>
> I got the suggestion to try to find a way to tell the kernel to use a
> fixed device number.

I don't think it's possible for btrfs, as each subvolume gets its
anonymous device number assigned when it is first read.

Thus it's really hard to make it fixed, as the reason for the anonymous
device number is to avoid conflicts.

>
> I still think, an application or an infrastructure service for a desktop
> environment or even anything else in user space should not rely on a
> device number to be fixed and never change upon reboots.

Well, LVM/device mapper is doing the same thing, and a big behavior
change is never a good idea for the kernel.

Thus for use cases where we really need a proper mapping, we use hashes,
not just the device number, like what we did in duperemove.
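
A minimal sketch of the hash idea in Python 3, under the assumption that
"hash" means a digest of the file contents. SHA-256 and the helper name
content_digest are my choices for illustration, not what any particular
tool uses.

```python
import hashlib


def content_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Such a digest does not depend on device numbers at all, but it costs a
full read of every file and changes whenever the content changes.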

>
> But maybe you have a different idea about that and it is okay for a
> userspace component to do that. I would like to hear your idea about
> that.
>
> Another question would be whether I could somehow make sure that the
> device number does not change, even if just as a work-around.

If you really just want a fixed device number, you can ensure that by:

- Making sure all users of anonymous devices run in a fixed sequence
   Things like device mapper/LVM and btrfs should get loaded/initialized
   in a fixed order.

- Making sure the subvolume you care about always gets mounted/read
   before any other subvolume
   So that the target subvolume always gets the first device number in
   the pool.

   But this also means that all later subvolumes not in the fixed
   mount/read sequence cannot get a fixed number.

Thanks,
Qu

> I know for
> NFS there is a fsid= mount option, but it does not appear to be
> something generic, at least the mount man page seems to have nothing
> related to fsid.
>
>
> Best,
>


* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-25 19:06 Assumption on fixed device numbers in Plasma's desktop search Baloo Martin Steigerwald
  2021-06-26  0:27 ` Qu Wenruo
@ 2021-06-26  0:54 ` NeilBrown
  2021-06-26  3:38   ` Bart Van Assche
  2021-06-26  8:51   ` Martin Steigerwald
  1 sibling, 2 replies; 11+ messages in thread
From: NeilBrown @ 2021-06-26  0:54 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-block, linux-btrfs

On Sat, 26 Jun 2021, Martin Steigerwald wrote:
> Hi!
> 
> I found repeatedly that Baloo indexes the same files twice or even more 
> often after a while.
> 
> I reported this upstream in:
> 
> Bug 438434 - Baloo appears to be indexing twice the number of files than 
> are actually in my home directory 
> 
> https://bugs.kde.org/show_bug.cgi?id=438434
> 
> And got back that if the device number changes, Baloo will think it has 
> new files even though the path is still the same. And found over time that
> the device number for the single BTRFS filesystem on an NVMe SSD in a
> ThinkPad T14 Gen1 AMD can change. It is not (maybe yet) RAID 1. I do 
> have BTRFS RAID 1 in another laptop and there I also had this issue 
> already.
> 
> I argued that a desktop application has no business to rely on a device 
> number and got back that search/indexing is in the middle between an 
> application and system software.

NO SOFTWARE can rely on device numbers being stable in Linux.  Not
desktop, not system, not anything.  They are stable while the device is
in use (e.g. while the filesystem is mounted) but can definitely change
on reboot.  This has been the case since about Linux 2.4.

>                                  And that Baloo needs an "invariant" for 
> a file. See comment #11 of that bug report:

That is really hard to provide in general.  Possibly the best approach
is to use the statfs() system call to get the "f_fsid" field.  This is
64 bits.  It is not supported uniformly well by all filesystems, but I
think it is at least not worse than using the device number.  For a lot
of older filesystems it is just an encoding of the device number.

For btrfs, xfs, ext4 it is much much better.
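
A minimal sketch for comparing the two values, assuming Python 3.7 or
later on Linux. The helper name filesystem_ids is hypothetical. Python
has no direct statfs() wrapper in the standard library, so this uses
os.statvfs(); treating its f_fsid as equivalent to the statfs() field is
an assumption about the C library.

```python
import os


def filesystem_ids(path):
    """Return (st_dev, f_fsid) for the filesystem that holds path."""
    st_dev = os.stat(path).st_dev
    # os.statvfs() exposes f_fsid since Python 3.7. It wraps statvfs(3),
    # so equating it with statfs(2)'s f_fsid is an assumption.
    f_fsid = os.statvfs(path).f_fsid
    return st_dev, f_fsid


if __name__ == "__main__":
    st_dev, f_fsid = filesystem_ids(os.path.expanduser("~"))
    print(f"st_dev={os.major(st_dev)}:{os.minor(st_dev)} f_fsid={f_fsid:#x}")
```

Comparing the output across a few reboots shows which of the two values
stays put on a given setup.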

NeilBrown


> 
> https://bugs.kde.org/show_bug.cgi?id=438434#c11
> 
> I got the suggestion to try to find a way to tell the kernel to use a 
> fixed device number. 
> 
> I still think, an application or an infrastructure service for a desktop 
> environment or even anything else in user space should not rely on a 
> device number to be fixed and never change upon reboots.
> 
> But maybe you have a different idea about that and it is okay for a
> userspace component to do that. I would like to hear your idea about 
> that.
> 
> Another question would be whether I could somehow make sure that the 
> device number does not change, even if just as a work-around. I know for 
> NFS there is a fsid= mount option, but it does not appear to be 
> something generic, at least the mount man page seems to have nothing 
> related to fsid.
> 
> 
> Best,
> -- 
> Martin
> 
> 
> 


* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  0:54 ` NeilBrown
@ 2021-06-26  3:38   ` Bart Van Assche
  2021-06-26  5:17     ` NeilBrown
  2021-06-26  8:51   ` Martin Steigerwald
  1 sibling, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2021-06-26  3:38 UTC (permalink / raw)
  To: NeilBrown, Martin Steigerwald; +Cc: linux-block, linux-btrfs

On 6/25/21 5:54 PM, NeilBrown wrote:
> On Sat, 26 Jun 2021, Martin Steigerwald wrote:
>>                                  And that Baloo needs an "invariant" for 
>> a file. See comment #11 of that bug report:
> 
> That is really hard to provide in general.  Possibly the best approach
> is to use the statfs() system call to get the "f_fsid" field.  This is
> 64 bits.  It is not supported uniformly well by all filesystems, but I
> think it is at least not worse than using the device number.  For a lot
> of older filesystems it is just an encoding of the device number.
> 
> For btrfs, xfs, ext4 it is much much better.

How about combining the UUID of the partition with the file path? An
example from one of the VMs on my workstation:

$ df .
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/vda1       25670972 12730276  11613648  53% /
$ lsblk -O | grep vda1
└─vda1 vda1  /dev/vda1 252:1     11.1G  24.5G ext4    12.1G    50% 1.0
 /                84cebea8-7e6f-4c2a-8a1b-8bc0c9744751 ae2151de
                    dos    0x83     Linux                  ae2151de-01
                        0x80      128  0  0       0
                 25G         root  disk  brw-rw----         0    512
  0     512     512    1 mq-deadline     256 part        0      512B
   2G         0    0B        0 vda                      block:virtio:pci
                   none    0

In other words, UUID 84cebea8-7e6f-4c2a-8a1b-8bc0c9744751 has been
associated with the block device under the filesystem that owns the
directory from which the 'df' command has been run.

Bart.


* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  3:38   ` Bart Van Assche
@ 2021-06-26  5:17     ` NeilBrown
  2021-06-26  6:14       ` Andrei Borzenkov
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2021-06-26  5:17 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Martin Steigerwald, linux-block, linux-btrfs

On Sat, 26 Jun 2021, Bart Van Assche wrote:
> On 6/25/21 5:54 PM, NeilBrown wrote:
> > On Sat, 26 Jun 2021, Martin Steigerwald wrote:
> >>                                  And that Baloo needs an "invariant" for 
> >> a file. See comment #11 of that bug report:
> > 
> > That is really hard to provide in general.  Possibly the best approach
> > is to use the statfs() system call to get the "f_fsid" field.  This is
> > 64 bits.  It is not supported uniformly well by all filesystems, but I
> > think it is at least not worse than using the device number.  For a lot
> > of older filesystems it is just an encoding of the device number.
> > 
> > For btrfs, xfs, ext4 it is much much better.
> 
> How about combining the UUID of the partition with the file path? An
> example from one of the VMs on my workstation:

A btrfs filesystem can span multiple partitions, and those partitions
can be added and removed dynamically.  So you could migrate from one to
another.

f_fsid really is best for any modern filesystem.

NeilBrown



* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  5:17     ` NeilBrown
@ 2021-06-26  6:14       ` Andrei Borzenkov
  2021-06-26  6:24         ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Andrei Borzenkov @ 2021-06-26  6:14 UTC (permalink / raw)
  To: NeilBrown, Bart Van Assche; +Cc: Martin Steigerwald, linux-block, linux-btrfs

On 26.06.2021 08:17, NeilBrown wrote:
> On Sat, 26 Jun 2021, Bart Van Assche wrote:
>> On 6/25/21 5:54 PM, NeilBrown wrote:
>>> On Sat, 26 Jun 2021, Martin Steigerwald wrote:
>>>>                                  And that Baloo needs an "invariant" for 
>>>> a file. See comment #11 of that bug report:
>>>
>>> That is really hard to provide in general.  Possibly the best approach
>>> is to use the statfs() system call to get the "f_fsid" field.  This is
>>> 64 bits.  It is not supported uniformly well by all filesystems, but I
>>> think it is at least not worse than using the device number.  For a lot
>>> of older filesystems it is just an encoding of the device number.
>>>
>>> For btrfs, xfs, ext4 it is much much better.
>>
>> How about combining the UUID of the partition with the file path? An
>> example from one of the VMs on my workstation:
> 
> A btrfs filesystem can span multiple partitions, and those partitions
> can be added and removed dynamically.  So you could migrate from one to
> another.
> 

I suspect it was intended to be "filesystem UUID". At least that is the
field in the lsblk output that was referenced.

> f_fsid really is best for any modern filesystem.
> 
> NeilBrown
> 



* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  6:14       ` Andrei Borzenkov
@ 2021-06-26  6:24         ` Qu Wenruo
  0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2021-06-26  6:24 UTC (permalink / raw)
  To: Andrei Borzenkov, NeilBrown, Bart Van Assche
  Cc: Martin Steigerwald, linux-block, linux-btrfs



On 2021/6/26 2:14 PM, Andrei Borzenkov wrote:
> On 26.06.2021 08:17, NeilBrown wrote:
>> On Sat, 26 Jun 2021, Bart Van Assche wrote:
>>> On 6/25/21 5:54 PM, NeilBrown wrote:
>>>> On Sat, 26 Jun 2021, Martin Steigerwald wrote:
>>>>>                                   And that Baloo needs an "invariant" for
>>>>> a file. See comment #11 of that bug report:
>>>>
>>>> That is really hard to provide in general.  Possibly the best approach
>>>> is to use the statfs() system call to get the "f_fsid" field.  This is
>>>> 64 bits.  It is not supported uniformly well by all filesystems, but I
>>>> think it is at least not worse than using the device number.  For a lot
>>>> of older filesystems it is just an encoding of the device number.
>>>>
>>>> For btrfs, xfs, ext4 it is much much better.
>>>
>>> How about combining the UUID of the partition with the file path? An
>>> example from one of the VMs on my workstation:
>>
>> A btrfs filesystem can span multiple partitions, and those partitions
>> can be added and removed dynamically.  So you could migrate from one to
>> another.
>>
>
> I suspect it was intended to be "filesystem UUID". At least that is the
> field in lsblk output that was referenced.

Filesystem UUID is not enough.

In btrfs, all subvolumes share the same fsid.

For the statfs() call, however, we do an extra XOR with the subvolume id
to generate a unique f_fsid for each subvolume.
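
To illustrate the idea, here is a simplified model in Python 3. It is not
the kernel code: the function name model_f_fsid, the way the 16-byte
filesystem UUID is folded into two 32-bit words, and the byte order are
all assumptions made for the sake of the example. Only the final step,
XORing in the subvolume id, is what is described above.

```python
import uuid


def model_f_fsid(fs_uuid, subvol_id):
    """Fold a filesystem UUID and a subvolume id into two 32-bit words."""
    raw = fs_uuid.bytes
    words = [int.from_bytes(raw[i:i + 4], "big") for i in range(0, 16, 4)]
    # Assumed folding of the 128-bit UUID down to 64 bits.
    val0 = words[0] ^ words[2]
    val1 = words[1] ^ words[3]
    # XOR in the 64-bit subvolume id so each subvolume gets its own value.
    val0 ^= (subvol_id >> 32) & 0xFFFFFFFF
    val1 ^= subvol_id & 0xFFFFFFFF
    return val0, val1
```

With the same UUID, two different subvolume ids yield two different
results, and the result depends only on the UUID and the subvolume id,
not on mount order.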

Thanks,
Qu
>
>> f_fsid really is best for any modern filesystem.
>>
>> NeilBrown
>>
>


* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  0:27 ` Qu Wenruo
@ 2021-06-26  8:49   ` Martin Steigerwald
  2021-06-26  9:33     ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Steigerwald @ 2021-06-26  8:49 UTC (permalink / raw)
  To: linux-block, Qu Wenruo; +Cc: linux-btrfs

Qu Wenruo - 26.06.21, 02:27:54 CEST:
> On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
> > Hi!
> > 
> > I found repeatedly that Baloo indexes the same files twice or even
> > more often after a while.
> > 
> > I reported this upstream in:
> > 
> > Bug 438434 - Baloo appears to be indexing twice the number of files
> > than are actually in my home directory
> > 
> > https://bugs.kde.org/show_bug.cgi?id=438434
> > 
> > And got back that if the device number changes, Baloo will think it
> > has new files even though the path is still the same. And found over
> > time that the device number for the single BTRFS filesystem on an
> > NVMe SSD in a ThinkPad T14 Gen1 AMD can change. It is not (maybe
> > yet) RAID 1. I do have BTRFS RAID 1 in another laptop and there I
> > also had this issue already.
> 
> Since btrfs has multi-device support by default, it reports anonymous
> device number, just as if you use a filesystem over LVM.

Ah, this!

I forgot to mention that: I use BTRFS on top of LVM on top of LUKS-based
dm-crypt on a partition on the NVMe SSD. Sorry, somehow I forgot to
mention that here. I mentioned it in the bug report. I'd use a different
approach if there were one that gave me full disk encryption. I am
not willing to use ecryptfs on top of BTRFS and as far as I know BTRFS
cannot yet encrypt by itself.

I still think this could give a fixed order of loading:

1. Unlock LUKS.

2. Activate LVM logical volumes. No idea whether that happens in a fixed 
order though or whether it can have a different order on each boot.

3. Mount BTRFS. /home is always on the same subvolume. So that should 
not change.

> The problem is why the anonymous device number change.

Good question. Maybe I have an idea about that. See below.

> > I argued that a desktop application has no business to rely on a
> > device number and got back that search/indexing is in the middle
> > between an application and system software. And that Baloo needs an
> > "invariant" for a file. See comment #11 of that bug report:
> > 
> > https://bugs.kde.org/show_bug.cgi?id=438434#c11
> 
> Well, a lot of tools relies on device number to distinguish filesystem
> boundary, like find.
> Thus it's a little hard to argue.
> 
> But on the other hand, it also means baloo can't handle regular fs
> over LVM cases well neither.

Yes. Also it could not handle the case of a driver loading race 
condition with two or more different controllers in a desktop machine.

> > I got the suggestion to try to find a way to tell the kernel to use
> > a fixed device number.
> 
> I don't think it's possible for btrfs, as each subvolume get its
> anonymous device number assigned when it gets first read.
> 
> Thus it's really hard to make it fixed, as the reason for anonymous
> device number is to avoid conflicts.

Fair enough.

> > I still think, an application or an infrastructure service for a
> > desktop environment or even anything else in user space should not
> > rely on a device number to be fixed and never change upon reboots.
> 
> Well, LVM/device mapper is doing the same thing, a lot of behavior
> change is never a good idea for the kernel.
> 
> Thus for use cases where we really need a proper mapping, we use
> hashes, not just device number, like what we did in dupremover.

I think I suggested that some time ago.

> > Another question would be whether I could somehow make sure that the
> > device number does not change, even if just as a work-around.
> 
> If you really just want a fixed device number, you can ensure that by:
> 
> - Make sure all users of anonymous devices get fixed sequence
>    Things like device mapper/LVM, btrfs should get loaded/initialized
>    in a fixed order.

Ah, I see.

> - Make sure the subvolume you care always get mounted/read before any
>    other subvolumes
>    So that the target subvolume always get the first device number in
> the pool.

Hmm, that may be a pointer. This is what I currently have in fstab:

/dev/nvme/home /home btrfs lazytime,compress=zstd 0 0
/dev/nvme/home /zeit/home btrfs subvol=zeit 0 0

In the first line the default subvolume is used which I changed 
accordingly after creating this BTRFS. I use the approach to keep 
(temporary) snapshots separated from the directory tree in /home.

Could it be that this order between these two mounts is not the same on 
every boot? I use Devuan with Runit, so the mounting would happen by 
some init scripts (instead of Systemd).

I am not aware of an option for fstab to mount this one first and then 
the other second, but I could set the second mount to noauto and mount 
it when I need it.

>    But this also means, all later subvolumes not in the fixed
> mount/read sequence can not get a fixed number.

I somehow thought this would get complicated.

Best,
-- 
Martin




* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  0:54 ` NeilBrown
  2021-06-26  3:38   ` Bart Van Assche
@ 2021-06-26  8:51   ` Martin Steigerwald
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Steigerwald @ 2021-06-26  8:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-block, linux-btrfs

NeilBrown - 26.06.21, 02:54:09 CEST:
> > And that Baloo needs an "invariant" for
> 
> > a file. See comment #11 of that bug report:
> That is really hard to provide in general.  Possibly the best approach
> is to use the statfs() system call to get the "f_fsid" field.  This is
> 64 bits.  It is not supported uniformly well by all filesystems, but I
> think it is at least not worse than using the device number.  For a
> lot of older filesystems it is just an encoding of the device number.
> 
> For btrfs, xfs, ext4 it is much much better.

Thank you for the clear statement and for your alternative suggestion. I 
will forward this to Baloo upstream.

I think the main focus of Baloo would be to work on the Linux filesystems
that are currently most in use, which should be BTRFS, XFS, EXT4 and
probably F2FS.

-- 
Martin




* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  8:49   ` Martin Steigerwald
@ 2021-06-26  9:33     ` Qu Wenruo
  2021-06-26 10:18       ` Martin Steigerwald
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2021-06-26  9:33 UTC (permalink / raw)
  To: Martin Steigerwald, linux-block; +Cc: linux-btrfs



On 2021/6/26 4:49 PM, Martin Steigerwald wrote:
> Qu Wenruo - 26.06.21, 02:27:54 CEST:
>> On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
>>> Hi!
>>>
>>> I found repeatedly that Baloo indexes the same files twice or even
>>> more often after a while.
>>>
>>> I reported this upstream in:
>>>
>>> Bug 438434 - Baloo appears to be indexing twice the number of files
>>> than are actually in my home directory
>>>
>>> https://bugs.kde.org/show_bug.cgi?id=438434
>>>
>>> And got back that if the device number changes, Baloo will think it
>>> has new files even though the path is still the same. And found over
>>> time that the device number for the single BTRFS filesystem on an
>>> NVMe SSD in a ThinkPad T14 Gen1 AMD can change. It is not (maybe
>>> yet) RAID 1. I do have BTRFS RAID 1 in another laptop and there I
>>> also had this issue already.
>>
>> Since btrfs has multi-device support by default, it reports anonymous
>> device number, just as if you use a filesystem over LVM.
>
> Ah, this!
>
> I forgot to mention that: I use BTRFS on top of LVM on top of LUKS based
> dm-crypt on a partition on the NVMe SSD. Sorry, somehow I forgot to
> mention that here. I mentioned it in the bug report. I'd use a different
> approach if there would be one that give me full disk encryption. I am
> not willing to use ecryptfs on top of BTRFS and as far as I know BTRFS
> cannot yet encrypt by itself.
>
> I still think this could give a fixed order of loading:
>
> 1. Unlock LUKS.
>
> 2. Activate LVM logical volumes. No idea whether that happens in a fixed
> order though or whether it can have a different order on each boot.

LVM/LUKS normally isn't a big deal, as they are mostly initialized
before btrfs and have a pretty fixed initialization sequence.

Unless you change the LVM setup, all your LVs should at least have
a fixed device number.
(But there are still cases where a kernel update may change them.)

>
> 3. Mount BTRFS. /home is always on the same subvolume. So that should
> not change.

Normally it won't change.

But it depends more on the btrfs behavior.

Thus I'm not that confident it will never change.

But at this point I guess you already get the point: in normal cases,
with no config change, the device number won't change.

However, any change in the kernel/storage stack/config can lead to a
different device number.
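
A small sketch for catching such a change, in Python 3. The function name
device_changed and the JSON state file are hypothetical; the idea is just
to remember st_dev from the previous run and compare it.

```python
import json
import os


def device_changed(path, state_file):
    """Record st_dev of path and report whether it differs from last run."""
    current = os.stat(path).st_dev
    try:
        with open(state_file) as handle:
            previous = json.load(handle)["st_dev"]
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        previous = None  # No usable record from an earlier run.
    with open(state_file, "w") as handle:
        json.dump({"st_dev": current}, handle)
    return previous is not None and previous != current
```

Called once per boot for /home, it returns True on the boot where the
number moved, which helps to correlate the change with a kernel update or
a changed mount order.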

>
>> The problem is why the anonymous device number change.
>
> Good question. Maybe I have an idea about that. See below.
>
>>> I argued that a desktop application has no business to rely on a
>>> device number and got back that search/indexing is in the middle
>>> between an application and system software. And that Baloo needs an
>>> "invariant" for a file. See comment #11 of that bug report:
>>>
>>> https://bugs.kde.org/show_bug.cgi?id=438434#c11
>>
>> Well, a lot of tools relies on device number to distinguish filesystem
>> boundary, like find.
>> Thus it's a little hard to argue.
>>
>> But on the other hand, it also means baloo can't handle regular fs
>> over LVM cases well neither.
>
> Yes. Also it could not handle the case of a driver loading race
> condition with two or more different controllers in a desktop machine.

Thus the idea from Neil should help: instead of using the device number,
using f_fsid from statfs() should provide a far more stable result.

And f_fsid can also handle btrfs subvolumes pretty well.

But this also means that if one day you change your default/mounted
subvolume, baloo will again rebuild the cache using the new f_fsid.
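
A sketch of what such a key could look like for an indexer, in Python 3.7
or later. The name file_identity and the choice of pairing f_fsid with
the inode number are assumptions for illustration; this is not how baloo
is known to store its entries.

```python
import os


def file_identity(path):
    """Return a (f_fsid, st_ino) pair as a candidate persistent file key."""
    # f_fsid identifies the filesystem (per subvolume on btrfs, as
    # described above); st_ino identifies the file within it.
    return os.statvfs(path).f_fsid, os.stat(path).st_ino


if __name__ == "__main__":
    index = {}
    target = os.path.expanduser("~")
    index[file_identity(target)] = target
    print(index)
```

The key survives a rename within the same filesystem, but it changes when
the file ends up under a different f_fsid, which is the rebuild case
mentioned above.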

>
>>> I got the suggestion to try to find a way to tell the kernel to use
>>> a fixed device number.
>>
>> I don't think it's possible for btrfs, as each subvolume get its
>> anonymous device number assigned when it gets first read.
>>
>> Thus it's really hard to make it fixed, as the reason for anonymous
>> device number is to avoid conflicts.
>
> Fair enough.
>
>>> I still think, an application or an infrastructure service for a
>>> desktop environment or even anything else in user space should not
>>> rely on a device number to be fixed and never change upon reboots.
>>
>> Well, LVM/device mapper is doing the same thing, a lot of behavior
>> change is never a good idea for the kernel.
>>
>> Thus for use cases where we really need a proper mapping, we use
>> hashes, not just device number, like what we did in dupremover.
>
> I think I suggested that some time ago.
>
>>> Another question would be whether I could somehow make sure that the
>>> device number does not change, even if just as a work-around.
>>
>> If you really just want a fixed device number, you can ensure that by:
>>
>> - Make sure all users of anonymous devices get fixed sequence
>>     Things like device mapper/LVM, btrfs should get loaded/initialized
>>     in a fixed order.
>
> Ah, I see.
>
>> - Make sure the subvolume you care always get mounted/read before any
>>     other subvolumes
>>     So that the target subvolume always get the first device number in
>> the pool.
>
> Hmm, that may be a pointer. This is what I currently have in fstab:
>
> /dev/nvme/home /home btrfs lazytime,compress=zstd 0 0
> /dev/nvme/home /zeit/home btrfs subvol=zeit 0 0
>
> In the first line the default subvolume is used which I changed
> accordingly after creating this BTRFS. I use the approach to keep
> (temporary) snapshots separated from the directory tree in /home.
>
> Could it be that this order between these two mounts is not the same on
> every boot?
> I use Devuan with Runit, so the mounting would happen by
> some init scripts (instead of Systemd).

Then it's out of the scope of btrfs.

I was just wondering if systemd is involved, but you just ruled it out.
But still, if the init tool chooses to shuffle the mount sequence to do
more parallel mounts, then the device number will be even more unreliable.

>
> I am not aware of an option for fstab to mount this one first and then
> the other second, but I could set the second mount to noauto and mount
> it when I need it.
>
>>     But this also means, all later subvolumes not in the fixed
>> mount/read sequence can not get a fixed number.
>
> I somehow thought this would get complicated.

It's already complicated.

So this just proves Neil is right: the device number is only reliable
during the lifespan of the fs, nothing else.

Thanks,
Qu

>
> Best,
>


* Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
  2021-06-26  9:33     ` Qu Wenruo
@ 2021-06-26 10:18       ` Martin Steigerwald
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Steigerwald @ 2021-06-26 10:18 UTC (permalink / raw)
  To: linux-block, Qu Wenruo; +Cc: linux-btrfs

Qu Wenruo - 26.06.21, 11:33:17 CEST:
> > I am not aware of an option for fstab to mount this one first and
> > then the other second, but I could set the second mount to noauto
> > and mount it when I need it.
> > 
> >> But this also means, all later subvolumes not in the fixed
> >> mount/read sequence can not get a fixed number.
> > 
> > I somehow thought this would get complicated.
> 
> It's already complicated.
> 
> So this just proves Neil is right, device number is only reliable at
> the lifespan of the fs, nothing else.

Thank you again.

I informed upstream about the conclusions from this thread.

Let's see what they come up with.

They have an energy efficiency goal, and for that it would be desirable
to stop indexing files twice or thrice or even more times. :)

Best,
-- 
Martin


