* [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more @ 2024-02-21 0:51 Kent Overstreet 2024-02-21 15:06 ` Miklos Szeredi 0 siblings, 1 reply; 19+ messages in thread From: Kent Overstreet @ 2024-02-21 0:51 UTC (permalink / raw) To: linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc Recently we had a pretty long discussion on statx extensions, which eventually got a bit offtopic but nevertheless hashed out all the major issues. To summarize: - guaranteeing inode number uniqueness is becoming increasingly infeasible, we need a bit to tell userspace "inode number is not unique, use filehandle instead" - we need a new field (st_vol, volume ID) for subvolumes - subvolumes aren't filesystems and st_dev is problematic too - I'd like a bit for flagging subvolume roots, as well That's basic stuff. Beyond that, it would be useful to standardize APIs for - recursively enumerating subvolumes (without walking full fs heirarchy), and translating from volume IDs to paths - exposing snapshot tree structure - which is yet another tree structure we need to expose, completely unrelated to normal fs path tree structure - snapshot recovery - i.e. atomically replacing one subvolume with another subvolume that was a snapshot - setting default subvolume root - exposing disk usage of snapshots & probably more. Hoping to get some real participation from the btrfs crew - if they could talk about what they've done and what else they see a need for, that would be wonderful. Additionally, we tend to not have a lot of userspace people at LSF, but standardizing and improving these APIs is something userspace would _very_ much like us to do, so in addition to the usual crew I'm hoping to bring Neal Gompa to share that perspective. Cheers, Kent ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-21 0:51 [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more Kent Overstreet @ 2024-02-21 15:06 ` Miklos Szeredi 2024-02-21 21:04 ` Kent Overstreet 2024-02-21 21:08 ` Josef Bacik 0 siblings, 2 replies; 19+ messages in thread From: Miklos Szeredi @ 2024-02-21 15:06 UTC (permalink / raw) To: Kent Overstreet Cc: linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > Recently we had a pretty long discussion on statx extensions, which > eventually got a bit offtopic but nevertheless hashed out all the major > issues. > > To summarize: > - guaranteeing inode number uniqueness is becoming increasingly > infeasible, we need a bit to tell userspace "inode number is not > unique, use filehandle instead" This is a tough one. POSIX says "The st_ino and st_dev fields taken together uniquely identify the file within the system." Adding a bit that says "from now the above POSIX rule is invalid" doesn't instantly fix all the existing applications that rely on it. Linux did manage to extend st_ino from 32 to 64 bits, but even in that case it's not clear how many instances of stat(path1, &st); unsigned int ino = st.st_ino; stat(path2, &st); if (ino == st.st_ino) ... are waiting to blow up one fine day. Of course the code should have used ino_t, but I think this pattern is not that uncommon. All in all, I don't think adding a flag to statx is the right answer. It entitles filesystem developers to be sloppy about st_ino uniqueness, which is not a good idea. I think what overlayfs is doing (see documentation) is generally the right direction. It makes various compromises but not to uniqueness, and we haven't had complaints (fingers crossed). Nudging userspace developers to use file handles would also be good, but they should do so unconditionally, not based on a flag that has no well defined meaning. Thanks, Miklos ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-21 15:06 ` Miklos Szeredi @ 2024-02-21 21:04 ` Kent Overstreet 2024-02-21 21:08 ` Josef Bacik 1 sibling, 0 replies; 19+ messages in thread From: Kent Overstreet @ 2024-02-21 21:04 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc, NeilBrown On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > Recently we had a pretty long discussion on statx extensions, which > > eventually got a bit offtopic but nevertheless hashed out all the major > > issues. > > > > To summarize: > > - guaranteeing inode number uniqueness is becoming increasingly > > infeasible, we need a bit to tell userspace "inode number is not > > unique, use filehandle instead" > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > together uniquely identify the file within the system." > > Adding a bit that says "from now the above POSIX rule is invalid" > doesn't instantly fix all the existing applications that rely on it. Even POSIX must bend when faced with reality. 64 bits is getting uncomfortably cramped already and with filesystems getting bigger it's going to break sooner or later. We don't want to be abusing st_dev, and snapshots and inode number sharding mean we're basically out of bits today. > doing (see documentation) is generally the right direction. It makes > various compromises but not to uniqueness, and we haven't had > complaints (fingers crossed). I haven't seen anything in overlayfs that looked like a real solution, just hacks that would break sooner or later if more filesystems are being stacked. > Nudging userspace developers to use file handles would also be good, > but they should do so unconditionally, not based on a flag that has no > well defined meaning. If we define it, it has a perfectly well defined meaning. I wouldn't be against telling userspace to use file handles unconditionally; they should only need to query it for a file that has handlinks, anyways. But I think we _do_ need this bit, if nothing else, as exactly that nudge. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-21 15:06 ` Miklos Szeredi 2024-02-21 21:04 ` Kent Overstreet @ 2024-02-21 21:08 ` Josef Bacik 2024-02-22 9:14 ` Miklos Szeredi 1 sibling, 1 reply; 19+ messages in thread From: Josef Bacik @ 2024-02-21 21:08 UTC (permalink / raw) To: Miklos Szeredi Cc: Kent Overstreet, linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > Recently we had a pretty long discussion on statx extensions, which > > eventually got a bit offtopic but nevertheless hashed out all the major > > issues. > > > > To summarize: > > - guaranteeing inode number uniqueness is becoming increasingly > > infeasible, we need a bit to tell userspace "inode number is not > > unique, use filehandle instead" > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > together uniquely identify the file within the system." > Which is what btrfs has done forever, and we've gotten yelled at forever for doing it. We have a compromise and a way forward, but it's not a widely held view that changing st_dev to give uniqueness is an acceptable solution. It may have been for overlayfs because you guys are already doing something special, but it's not an option that is afforded the rest of us. > Adding a bit that says "from now the above POSIX rule is invalid" > doesn't instantly fix all the existing applications that rely on it. > > Linux did manage to extend st_ino from 32 to 64 bits, but even in that > case it's not clear how many instances of > > stat(path1, &st); > unsigned int ino = st.st_ino; > stat(path2, &st); > if (ino == st.st_ino) > ... > > are waiting to blow up one fine day. Of course the code should have > used ino_t, but I think this pattern is not that uncommon. > > All in all, I don't think adding a flag to statx is the right answer. > It entitles filesystem developers to be sloppy about st_ino > uniqueness, which is not a good idea. I think what overlayfs is > doing (see documentation) is generally the right direction. It makes > various compromises but not to uniqueness, and we haven't had > complaints (fingers crossed). Again, you haven't, I have, consistently and constantly for a decade. > Nudging userspace developers to use file handles would also be good, > but they should do so unconditionally, not based on a flag that has no > well defined meaning. I think that's what we're trying to do, define it properly. We now have 2 file systems in tree that have this sort of behavior. It's not a new or crazy thing (well I suppose it is when you consider the lifetime of file systems), having a way for user space developers that care to properly identify they've wandered across a subvolume boundary could be useful. As for the proposal itself, we talk about this every year. We're all more or less onboard with the idea, the code just needs to be written. Write the code and post the patches, I assume that there won't be much pushback, probably could even get it into Christian's tree in some branch or another before LSF. Thanks, Josef ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-21 21:08 ` Josef Bacik @ 2024-02-22 9:14 ` Miklos Szeredi 2024-02-22 9:42 ` Kent Overstreet 2024-02-22 15:48 ` Josef Bacik 0 siblings, 2 replies; 19+ messages in thread From: Miklos Szeredi @ 2024-02-22 9:14 UTC (permalink / raw) To: Josef Bacik Cc: Kent Overstreet, linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc On Wed, 21 Feb 2024 at 22:08, Josef Bacik <josef@toxicpanda.com> wrote: > > On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > Recently we had a pretty long discussion on statx extensions, which > > > eventually got a bit offtopic but nevertheless hashed out all the major > > > issues. > > > > > > To summarize: > > > - guaranteeing inode number uniqueness is becoming increasingly > > > infeasible, we need a bit to tell userspace "inode number is not > > > unique, use filehandle instead" > > > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > > together uniquely identify the file within the system." > > > > Which is what btrfs has done forever, and we've gotten yelled at forever for > doing it. We have a compromise and a way forward, but it's not a widely held > view that changing st_dev to give uniqueness is an acceptable solution. It may > have been for overlayfs because you guys are already doing something special, > but it's not an option that is afforded the rest of us. Overlayfs tries hard not to use st_dev to give uniqueness and instead partitions the 64bit st_ino space within the same st_dev. There are various fallback cases, some involve switching st_dev and some using non-persistent st_ino. What overlayfs does may or may not be applicable to btrfs/bcachefs, but that's not my point. My point is that adding a flag to statx does not solve anything. You can't just say that from now on btrfs doesn't have use unique st_ino/st_dev because we've just indicated that in statx and everything is fine. That will trigger the no-regressions rule and then it's game over. At least I would expect that to happen. What we can do instead is introduce a new API that is better, and thankfully we already have one in the form of file handles. The problem I see is that you think you can get away with then reverting back st_dev to be uniform across subvolumes. But you can't. I see two options: a) do some hacks, like overlayfs does b) introduce a new "st_dev_v2" that will do the right thing and applications can move over. Thanks, Miklos ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 9:14 ` Miklos Szeredi @ 2024-02-22 9:42 ` Kent Overstreet 2024-02-22 10:25 ` Miklos Szeredi 2024-02-22 11:01 ` [Lsf-pc] " Jan Kara 2024-02-22 15:48 ` Josef Bacik 1 sibling, 2 replies; 19+ messages in thread From: Kent Overstreet @ 2024-02-22 9:42 UTC (permalink / raw) To: Miklos Szeredi Cc: Josef Bacik, linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc On Thu, Feb 22, 2024 at 10:14:20AM +0100, Miklos Szeredi wrote: > On Wed, 21 Feb 2024 at 22:08, Josef Bacik <josef@toxicpanda.com> wrote: > > > > On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > > > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > Recently we had a pretty long discussion on statx extensions, which > > > > eventually got a bit offtopic but nevertheless hashed out all the major > > > > issues. > > > > > > > > To summarize: > > > > - guaranteeing inode number uniqueness is becoming increasingly > > > > infeasible, we need a bit to tell userspace "inode number is not > > > > unique, use filehandle instead" > > > > > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > > > together uniquely identify the file within the system." > > > > > > > Which is what btrfs has done forever, and we've gotten yelled at forever for > > doing it. We have a compromise and a way forward, but it's not a widely held > > view that changing st_dev to give uniqueness is an acceptable solution. It may > > have been for overlayfs because you guys are already doing something special, > > but it's not an option that is afforded the rest of us. > > Overlayfs tries hard not to use st_dev to give uniqueness and instead > partitions the 64bit st_ino space within the same st_dev. There are > various fallback cases, some involve switching st_dev and some using > non-persistent st_ino. Yeah no, you can't crap multiple 64 bit inode number spaces into 64 bits: pigeonhole principle. We need something better than "hacks". > What overlayfs does may or may not be applicable to btrfs/bcachefs, > but that's not my point. My point is that adding a flag to statx does > not solve anything. You can't just say that from now on btrfs > doesn't have use unique st_ino/st_dev because we've just indicated > that in statx and everything is fine. That will trigger the > no-regressions rule and then it's game over. At least I would expect > that to happen. > > What we can do instead is introduce a new API that is better, This isn't a serious proposal. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 9:42 ` Kent Overstreet @ 2024-02-22 10:25 ` Miklos Szeredi 2024-02-22 11:19 ` Kent Overstreet 2024-02-22 11:01 ` [Lsf-pc] " Jan Kara 1 sibling, 1 reply; 19+ messages in thread From: Miklos Szeredi @ 2024-02-22 10:25 UTC (permalink / raw) To: Kent Overstreet Cc: Josef Bacik, linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc On Thu, 22 Feb 2024 at 10:42, Kent Overstreet <kent.overstreet@linux.dev> wrote: > Yeah no, you can't crap multiple 64 bit inode number spaces into 64 > bits: pigeonhole principle. Obviously not. And I have no idea about the inode number allocation strategy of bcachefs and how many bits would be needed for subvolumes, etc.. I was just telling what overlayfs does and why. It's a pragmatic solution that works. I'd very much like to move to better interfaces, but creating good interfaces is never easy. > We need something better than "hacks". That's the end goal, obviously. But we also need to take care of legacy. Always have. > This isn't a serious proposal. If not, then what is? BTW to expand on the st_dev_v2 idea, it can be done by adding a STATX_DEV_V2 query mask. That way userspace can ask for the uniform stx_dev if it wants, knowing full well that stx_ino will be non-unique within that filesystem. Then kernel is free to return with or without STATX_DEV_V2, which is basically what you proposed. Except it's now negotiated and not forced upon legacy interfaces. The other issue is adding subvolume ID. You seem to think that it's okay to add that to statx and let userspace use (st_ino, st_subvoid) to identify the inode. I'm saying this is wrong, because it doesn't work in the general case. It doesn't work for overlayfs, for example, and we definitely want to avoid having userspace do filesystem specific things *if it isn't absolutely necessary*. So for example "tar" should not care about subvolumes as long as it's not been explicitly told to care. And that means for hard link detection if should using the file handle + st_dev_v2 instead of st_ino + st_subvolid + st_dev_v2. So if that field is added to statx it must come with a stern warning about this type of usage. Thanks, Miklos ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 10:25 ` Miklos Szeredi @ 2024-02-22 11:19 ` Kent Overstreet 0 siblings, 0 replies; 19+ messages in thread From: Kent Overstreet @ 2024-02-22 11:19 UTC (permalink / raw) To: Miklos Szeredi Cc: Josef Bacik, linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc On Thu, Feb 22, 2024 at 11:25:12AM +0100, Miklos Szeredi wrote: > On Thu, 22 Feb 2024 at 10:42, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > Yeah no, you can't crap multiple 64 bit inode number spaces into 64 > > bits: pigeonhole principle. > > Obviously not. And I have no idea about the inode number allocation > strategy of bcachefs and how many bits would be needed for subvolumes, > etc.. I was just telling what overlayfs does and why. It's a > pragmatic solution that works. I'd very much like to move to better > interfaces, but creating good interfaces is never easy. You say "creating good interfaces is never easy" - but we've got a proposal, that's bounced around a fair bit, and you aren't saying anything concrete. > > We need something better than "hacks". > > That's the end goal, obviously. But we also need to take care of > legacy. Always have. So what are you proposing? > > This isn't a serious proposal. > > If not, then what is? > > BTW to expand on the st_dev_v2 idea, it can be done by adding a > STATX_DEV_V2 query mask. Didn't you see Josef just say they're trying to get away from st_dev? > The other issue is adding subvolume ID. You seem to think that it's > okay to add that to statx and let userspace use (st_ino, st_subvoid) > to identify the inode. I'm saying this is wrong, because it doesn't > work in the general case. No, I explicitly said that when INO_NOT_UNIQUE is set the _filehandle_ would be the means to identify the file. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 9:42 ` Kent Overstreet 2024-02-22 10:25 ` Miklos Szeredi @ 2024-02-22 11:01 ` Jan Kara 2024-02-22 11:27 ` Kent Overstreet 2024-02-22 12:48 ` Miklos Szeredi 1 sibling, 2 replies; 19+ messages in thread From: Jan Kara @ 2024-02-22 11:01 UTC (permalink / raw) To: Kent Overstreet Cc: Miklos Szeredi, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu 22-02-24 04:42:07, Kent Overstreet wrote: > On Thu, Feb 22, 2024 at 10:14:20AM +0100, Miklos Szeredi wrote: > > On Wed, 21 Feb 2024 at 22:08, Josef Bacik <josef@toxicpanda.com> wrote: > > > > > > On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > > > > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > > > Recently we had a pretty long discussion on statx extensions, which > > > > > eventually got a bit offtopic but nevertheless hashed out all the major > > > > > issues. > > > > > > > > > > To summarize: > > > > > - guaranteeing inode number uniqueness is becoming increasingly > > > > > infeasible, we need a bit to tell userspace "inode number is not > > > > > unique, use filehandle instead" > > > > > > > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > > > > together uniquely identify the file within the system." > > > > > > > > > > Which is what btrfs has done forever, and we've gotten yelled at forever for > > > doing it. We have a compromise and a way forward, but it's not a widely held > > > view that changing st_dev to give uniqueness is an acceptable solution. It may > > > have been for overlayfs because you guys are already doing something special, > > > but it's not an option that is afforded the rest of us. > > > > Overlayfs tries hard not to use st_dev to give uniqueness and instead > > partitions the 64bit st_ino space within the same st_dev. There are > > various fallback cases, some involve switching st_dev and some using > > non-persistent st_ino. > > Yeah no, you can't crap multiple 64 bit inode number spaces into 64 > bits: pigeonhole principle. > > We need something better than "hacks". I agree we should have a better long-term plan than finding ways how to cram things into 64-bits inos. However I don't see a realistic short-term solution other than that. To explicit: Currently, tar and patch and very likely other less well-known tools are broken on bcachefs due to non-unique inode numbers. If you want ot fix them, either you find ways how bcachefs can cram things into 64-bit ino_t or you go and modify these tools (or prod maintainers or whatever) to not depend on ino_t for uniqueness. The application side of things isn't going to magically fix itself by us telling "bad luck, ino_t isn't unique anymore". > > What overlayfs does may or may not be applicable to btrfs/bcachefs, > > but that's not my point. My point is that adding a flag to statx does > > not solve anything. You can't just say that from now on btrfs > > doesn't have use unique st_ino/st_dev because we've just indicated > > that in statx and everything is fine. That will trigger the > > no-regressions rule and then it's game over. At least I would expect > > that to happen. > > > > What we can do instead is introduce a new API that is better, > > This isn't a serious proposal. I think for "unique inode identifier" we don't even have to come up with new APIs. The file handle + fsid pair is an established way to do this, fanotify successfully uses this as object identifier and Amir did quite some work for this to be usable for vast majority of filesystems (including virtual ones). The problem is with rewriting all these applications to use it. If statx flag telling whether inode numbers are unique helps with that, sure we can add it, but that seems like a trivial part of the problem. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 11:01 ` [Lsf-pc] " Jan Kara @ 2024-02-22 11:27 ` Kent Overstreet 2024-02-22 11:44 ` Jan Kara 2024-02-22 12:48 ` Miklos Szeredi 1 sibling, 1 reply; 19+ messages in thread From: Kent Overstreet @ 2024-02-22 11:27 UTC (permalink / raw) To: Jan Kara Cc: Miklos Szeredi, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu, Feb 22, 2024 at 12:01:38PM +0100, Jan Kara wrote: > On Thu 22-02-24 04:42:07, Kent Overstreet wrote: > > On Thu, Feb 22, 2024 at 10:14:20AM +0100, Miklos Szeredi wrote: > > > On Wed, 21 Feb 2024 at 22:08, Josef Bacik <josef@toxicpanda.com> wrote: > > > > > > > > On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > > > > > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > > > > > Recently we had a pretty long discussion on statx extensions, which > > > > > > eventually got a bit offtopic but nevertheless hashed out all the major > > > > > > issues. > > > > > > > > > > > > To summarize: > > > > > > - guaranteeing inode number uniqueness is becoming increasingly > > > > > > infeasible, we need a bit to tell userspace "inode number is not > > > > > > unique, use filehandle instead" > > > > > > > > > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > > > > > together uniquely identify the file within the system." > > > > > > > > > > > > > Which is what btrfs has done forever, and we've gotten yelled at forever for > > > > doing it. We have a compromise and a way forward, but it's not a widely held > > > > view that changing st_dev to give uniqueness is an acceptable solution. It may > > > > have been for overlayfs because you guys are already doing something special, > > > > but it's not an option that is afforded the rest of us. > > > > > > Overlayfs tries hard not to use st_dev to give uniqueness and instead > > > partitions the 64bit st_ino space within the same st_dev. There are > > > various fallback cases, some involve switching st_dev and some using > > > non-persistent st_ino. > > > > Yeah no, you can't crap multiple 64 bit inode number spaces into 64 > > bits: pigeonhole principle. > > > > We need something better than "hacks". > > I agree we should have a better long-term plan than finding ways how to > cram things into 64-bits inos. However I don't see a realistic short-term > solution other than that. > > To explicit: Currently, tar and patch and very likely other less well-known > tools are broken on bcachefs due to non-unique inode numbers. If you want > ot fix them, either you find ways how bcachefs can cram things into 64-bit > ino_t or you go and modify these tools (or prod maintainers or whatever) to > not depend on ino_t for uniqueness. The application side of things isn't > going to magically fix itself by us telling "bad luck, ino_t isn't unique > anymore". My intent is to make a real effort towards getting better interfaces going, prod those maintainers, _then_ look at adding those hacks (that will necessarily be short term solutions since 64 bits is already looking cramped). Kind of seems to me like you guys think kicking the can down the road is supposed to to be the first priority, and that's not what I'm doing here. > > > What overlayfs does may or may not be applicable to btrfs/bcachefs, > > > but that's not my point. My point is that adding a flag to statx does > > > not solve anything. You can't just say that from now on btrfs > > > doesn't have use unique st_ino/st_dev because we've just indicated > > > that in statx and everything is fine. That will trigger the > > > no-regressions rule and then it's game over. At least I would expect > > > that to happen. > > > > > > What we can do instead is introduce a new API that is better, > > > > This isn't a serious proposal. > > I think for "unique inode identifier" we don't even have to come up with > new APIs. The file handle + fsid pair is an established way to do this, > fanotify successfully uses this as object identifier and Amir did quite > some work for this to be usable for vast majority of filesystems (including > virtual ones). The problem is with rewriting all these applications to use > it. If statx flag telling whether inode numbers are unique helps with that, > sure we can add it, but that seems like a trivial part of the problem. Basically, it turns it into a checklist item for userspace. "Are we still doing things the old way (and we've got a documented list of where that will break", or "are we handling the INO_NOT_UNIQUE case correctly". In general when semantics change, APIs ought to change. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 11:27 ` Kent Overstreet @ 2024-02-22 11:44 ` Jan Kara 2024-02-22 11:55 ` Kent Overstreet 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2024-02-22 11:44 UTC (permalink / raw) To: Kent Overstreet Cc: Jan Kara, Miklos Szeredi, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu 22-02-24 06:27:14, Kent Overstreet wrote: > On Thu, Feb 22, 2024 at 12:01:38PM +0100, Jan Kara wrote: > > On Thu 22-02-24 04:42:07, Kent Overstreet wrote: > > > On Thu, Feb 22, 2024 at 10:14:20AM +0100, Miklos Szeredi wrote: > > > > On Wed, 21 Feb 2024 at 22:08, Josef Bacik <josef@toxicpanda.com> wrote: > > > > > > > > > > On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > > > > > > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > > > > > > > Recently we had a pretty long discussion on statx extensions, which > > > > > > > eventually got a bit offtopic but nevertheless hashed out all the major > > > > > > > issues. > > > > > > > > > > > > > > To summarize: > > > > > > > - guaranteeing inode number uniqueness is becoming increasingly > > > > > > > infeasible, we need a bit to tell userspace "inode number is not > > > > > > > unique, use filehandle instead" > > > > > > > > > > > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > > > > > > together uniquely identify the file within the system." > > > > > > > > > > > > > > > > Which is what btrfs has done forever, and we've gotten yelled at forever for > > > > > doing it. We have a compromise and a way forward, but it's not a widely held > > > > > view that changing st_dev to give uniqueness is an acceptable solution. It may > > > > > have been for overlayfs because you guys are already doing something special, > > > > > but it's not an option that is afforded the rest of us. > > > > > > > > Overlayfs tries hard not to use st_dev to give uniqueness and instead > > > > partitions the 64bit st_ino space within the same st_dev. There are > > > > various fallback cases, some involve switching st_dev and some using > > > > non-persistent st_ino. > > > > > > Yeah no, you can't crap multiple 64 bit inode number spaces into 64 > > > bits: pigeonhole principle. > > > > > > We need something better than "hacks". > > > > I agree we should have a better long-term plan than finding ways how to > > cram things into 64-bits inos. However I don't see a realistic short-term > > solution other than that. > > > > To explicit: Currently, tar and patch and very likely other less well-known > > tools are broken on bcachefs due to non-unique inode numbers. If you want > > ot fix them, either you find ways how bcachefs can cram things into 64-bit > > ino_t or you go and modify these tools (or prod maintainers or whatever) to > > not depend on ino_t for uniqueness. The application side of things isn't > > going to magically fix itself by us telling "bad luck, ino_t isn't unique > > anymore". > > My intent is to make a real effort towards getting better interfaces > going, prod those maintainers, _then_ look at adding those hacks (that > will necessarily be short term solutions since 64 bits is already > looking cramped). OK, fine by me :) So one thing is still not quite clear to me - how do you expect the INO_NOT_UNIQUE flag to be used by these apps? Do you expect them to use st_dev + st_ino by default and fall back to fsid + fhandle only when INO_NOT_UNIQUE is set? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 11:44 ` Jan Kara @ 2024-02-22 11:55 ` Kent Overstreet 2024-02-22 13:10 ` Miklos Szeredi 0 siblings, 1 reply; 19+ messages in thread From: Kent Overstreet @ 2024-02-22 11:55 UTC (permalink / raw) To: Jan Kara Cc: Miklos Szeredi, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu, Feb 22, 2024 at 12:44:17PM +0100, Jan Kara wrote: > On Thu 22-02-24 06:27:14, Kent Overstreet wrote: > > On Thu, Feb 22, 2024 at 12:01:38PM +0100, Jan Kara wrote: > > > On Thu 22-02-24 04:42:07, Kent Overstreet wrote: > > > > On Thu, Feb 22, 2024 at 10:14:20AM +0100, Miklos Szeredi wrote: > > > > > On Wed, 21 Feb 2024 at 22:08, Josef Bacik <josef@toxicpanda.com> wrote: > > > > > > > > > > > > On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > > > > > > > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > > > > > > > > > Recently we had a pretty long discussion on statx extensions, which > > > > > > > > eventually got a bit offtopic but nevertheless hashed out all the major > > > > > > > > issues. > > > > > > > > > > > > > > > > To summarize: > > > > > > > > - guaranteeing inode number uniqueness is becoming increasingly > > > > > > > > infeasible, we need a bit to tell userspace "inode number is not > > > > > > > > unique, use filehandle instead" > > > > > > > > > > > > > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > > > > > > > together uniquely identify the file within the system." > > > > > > > > > > > > > > > > > > > Which is what btrfs has done forever, and we've gotten yelled at forever for > > > > > > doing it. We have a compromise and a way forward, but it's not a widely held > > > > > > view that changing st_dev to give uniqueness is an acceptable solution. It may > > > > > > have been for overlayfs because you guys are already doing something special, > > > > > > but it's not an option that is afforded the rest of us. > > > > > > > > > > Overlayfs tries hard not to use st_dev to give uniqueness and instead > > > > > partitions the 64bit st_ino space within the same st_dev. There are > > > > > various fallback cases, some involve switching st_dev and some using > > > > > non-persistent st_ino. > > > > > > > > Yeah no, you can't crap multiple 64 bit inode number spaces into 64 > > > > bits: pigeonhole principle. > > > > > > > > We need something better than "hacks". > > > > > > I agree we should have a better long-term plan than finding ways how to > > > cram things into 64-bits inos. However I don't see a realistic short-term > > > solution other than that. > > > > > > To explicit: Currently, tar and patch and very likely other less well-known > > > tools are broken on bcachefs due to non-unique inode numbers. If you want > > > ot fix them, either you find ways how bcachefs can cram things into 64-bit > > > ino_t or you go and modify these tools (or prod maintainers or whatever) to > > > not depend on ino_t for uniqueness. The application side of things isn't > > > going to magically fix itself by us telling "bad luck, ino_t isn't unique > > > anymore". > > > > My intent is to make a real effort towards getting better interfaces > > going, prod those maintainers, _then_ look at adding those hacks (that > > will necessarily be short term solutions since 64 bits is already > > looking cramped). > > OK, fine by me :) So one thing is still not quite clear to me - how do you > expect the INO_NOT_UNIQUE flag to be used by these apps? Do you expect them > to use st_dev + st_ino by default and fall back to fsid + fhandle only when > INO_NOT_UNIQUE is set? Shouldn't matter. If they care about performance and they're in some strange situation where the syscal overhead matters, they can use fhandle only when the bit is set, but I'd personally prefer to see everyone on the same codepath and just always using fh. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 11:55 ` Kent Overstreet @ 2024-02-22 13:10 ` Miklos Szeredi 0 siblings, 0 replies; 19+ messages in thread From: Miklos Szeredi @ 2024-02-22 13:10 UTC (permalink / raw) To: Kent Overstreet Cc: Jan Kara, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu, 22 Feb 2024 at 12:55, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > On Thu, Feb 22, 2024 at 12:44:17PM +0100, Jan Kara wrote: > > On Thu 22-02-24 06:27:14, Kent Overstreet wrote: > > > My intent is to make a real effort towards getting better interfaces > > > going, prod those maintainers, _then_ look at adding those hacks (that > > > will necessarily be short term solutions since 64 bits is already > > > looking cramped). > > > > OK, fine by me :) So one thing is still not quite clear to me - how do you > > expect the INO_NOT_UNIQUE flag to be used by these apps? Do you expect them > > to use st_dev + st_ino by default and fall back to fsid + fhandle only when > > INO_NOT_UNIQUE is set? > > Shouldn't matter. If they care about performance and they're in some > strange situation where the syscal overhead matters, If it's expensive, then just make the overhead smaller (by adding fh and uuid to statx(2), for example). Using st_ino is also racy in some filesystems, due to the fact that the ino can be reused. If userspace is converted, it should be converted properly, there's just no excuse to add conditional code like that, which makes things more complex and less reliable. Thanks, Miklos ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 11:01 ` [Lsf-pc] " Jan Kara 2024-02-22 11:27 ` Kent Overstreet @ 2024-02-22 12:48 ` Miklos Szeredi 2024-02-22 16:08 ` Jan Kara 1 sibling, 1 reply; 19+ messages in thread From: Miklos Szeredi @ 2024-02-22 12:48 UTC (permalink / raw) To: Jan Kara Cc: Kent Overstreet, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu, 22 Feb 2024 at 12:01, Jan Kara <jack@suse.cz> wrote: > I think for "unique inode identifier" we don't even have to come up with > new APIs. The file handle + fsid pair is an established way to do this, Why not uuid? fsid seems to be just a degraded uuid. We can do better with statx and/or statmount. > fanotify successfully uses this as object identifier and Amir did quite > some work for this to be usable for vast majority of filesystems (including Vast majority != all. Also even uuid is just a statistically unique identifier, while st_dev was guaranteed to be unique (but not persistent, like uuid). If we are going to start fixing userspace, then we better make sure to use the right interfaces, that won't have issues in the future. Thanks, Miklos ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 12:48 ` Miklos Szeredi @ 2024-02-22 16:08 ` Jan Kara 2024-02-26 8:27 ` Miklos Szeredi 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2024-02-22 16:08 UTC (permalink / raw) To: Miklos Szeredi Cc: Jan Kara, Kent Overstreet, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu 22-02-24 13:48:45, Miklos Szeredi wrote: > On Thu, 22 Feb 2024 at 12:01, Jan Kara <jack@suse.cz> wrote: > > > I think for "unique inode identifier" we don't even have to come up with > > new APIs. The file handle + fsid pair is an established way to do this, > > Why not uuid? > > fsid seems to be just a degraded uuid. We can do better with statx > and/or statmount. fanotify uses fsid because we have standard interface for querying fsid (statfs(2)) and because not all filesystems (in particular virtual ones) bother with uuid. At least the first thing is being changed now. > > fanotify successfully uses this as object identifier and Amir did quite > > some work for this to be usable for vast majority of filesystems (including > > Vast majority != all. True. If we are going to use this scheme more widely, we need to have a look whether the remaining cases need fixing or we can just ignore them. They were not very interesting for fanotify so we moved on. > Also even uuid is just a statistically unique > identifier, while st_dev was guaranteed to be unique (but not > persistent, like uuid). Well, everything is just statistically true in this world :) If you have conflicting uuids, you are likely to see also other problems so I would not be too concerned about that. > If we are going to start fixing userspace, then we better make sure to > use the right interfaces, that won't have issues in the future. I agree we should give this a good thought which identification of a filesystem is the best. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 16:08 ` Jan Kara @ 2024-02-26 8:27 ` Miklos Szeredi 2024-02-26 16:24 ` Amir Goldstein 0 siblings, 1 reply; 19+ messages in thread From: Miklos Szeredi @ 2024-02-26 8:27 UTC (permalink / raw) To: Jan Kara Cc: Kent Overstreet, Josef Bacik, linux-kernel, linux-bcachefs, linux-fsdevel, lsf-pc, linux-btrfs On Thu, 22 Feb 2024 at 17:08, Jan Kara <jack@suse.cz> wrote: > > If we are going to start fixing userspace, then we better make sure to > > use the right interfaces, that won't have issues in the future. > > I agree we should give this a good thought which identification of a > filesystem is the best. To find mount boundaries statx.stx_mnt_id (especially with STATX_MNT_ID_UNIQUE) is perfect. By supplying stx_mnt_id to statmount(2) it's possible to get the device number associated with that filesystem (statmount.sb_dev_*). I think it's what Josef wants btrfs to return as st_dev. And statx could return that in stx_dev_*, with an interface feature flag, same as we've done with stx_mnt_id. I.e. STATX_DEV_NOHACK would force the vfs to replace anything the filesystem put in kstat.dev with sb->s_dev. Thanks, Miklos ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Lsf-pc] [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-26 8:27 ` Miklos Szeredi @ 2024-02-26 16:24 ` Amir Goldstein 0 siblings, 0 replies; 19+ messages in thread From: Amir Goldstein @ 2024-02-26 16:24 UTC (permalink / raw) To: Miklos Szeredi Cc: Jan Kara, Kent Overstreet, Josef Bacik, linux-fsdevel, lsf-pc, Christian Brauner On Mon, Feb 26, 2024 at 10:27 AM Miklos Szeredi via Lsf-pc <lsf-pc@lists.linux-foundation.org> wrote: > > On Thu, 22 Feb 2024 at 17:08, Jan Kara <jack@suse.cz> wrote: > > > > If we are going to start fixing userspace, then we better make sure to > > > use the right interfaces, that won't have issues in the future. > > > > I agree we should give this a good thought which identification of a > > filesystem is the best. > > To find mount boundaries statx.stx_mnt_id (especially with > STATX_MNT_ID_UNIQUE) is perfect. > > By supplying stx_mnt_id to statmount(2) it's possible to get the > device number associated with that filesystem (statmount.sb_dev_*). I > think it's what Josef wants btrfs to return as st_dev. > > And statx could return that in stx_dev_*, with an interface feature > flag, same as we've done with stx_mnt_id. I.e. STATX_DEV_NOHACK would > force the vfs to replace anything the filesystem put in kstat.dev with > sb->s_dev. > Miklos, I would really like to have this discussion at LSFMM and it would be great if you could join it remotely. Care to fill out the form and ask for a virtual invite so that we have you on our list? https://forms.gle/TGCgBDH1x5pXiWFo7 Thanks, Amir. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 9:14 ` Miklos Szeredi 2024-02-22 9:42 ` Kent Overstreet @ 2024-02-22 15:48 ` Josef Bacik 2024-02-26 8:14 ` Miklos Szeredi 1 sibling, 1 reply; 19+ messages in thread From: Josef Bacik @ 2024-02-22 15:48 UTC (permalink / raw) To: Miklos Szeredi Cc: Kent Overstreet, linux-bcachefs, linux-btrfs, linux-fsdevel, linux-kernel, lsf-pc On Thu, Feb 22, 2024 at 10:14:20AM +0100, Miklos Szeredi wrote: > On Wed, 21 Feb 2024 at 22:08, Josef Bacik <josef@toxicpanda.com> wrote: > > > > On Wed, Feb 21, 2024 at 04:06:34PM +0100, Miklos Szeredi wrote: > > > On Wed, 21 Feb 2024 at 01:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > Recently we had a pretty long discussion on statx extensions, which > > > > eventually got a bit offtopic but nevertheless hashed out all the major > > > > issues. > > > > > > > > To summarize: > > > > - guaranteeing inode number uniqueness is becoming increasingly > > > > infeasible, we need a bit to tell userspace "inode number is not > > > > unique, use filehandle instead" > > > > > > This is a tough one. POSIX says "The st_ino and st_dev fields taken > > > together uniquely identify the file within the system." > > > > > > > Which is what btrfs has done forever, and we've gotten yelled at forever for > > doing it. We have a compromise and a way forward, but it's not a widely held > > view that changing st_dev to give uniqueness is an acceptable solution. It may > > have been for overlayfs because you guys are already doing something special, > > but it's not an option that is afforded the rest of us. > > Overlayfs tries hard not to use st_dev to give uniqueness and instead > partitions the 64bit st_ino space within the same st_dev. There are > various fallback cases, some involve switching st_dev and some using > non-persistent st_ino. > > What overlayfs does may or may not be applicable to btrfs/bcachefs, > but that's not my point. My point is that adding a flag to statx does > not solve anything. You can't just say that from now on btrfs > doesn't have use unique st_ino/st_dev because we've just indicated > that in statx and everything is fine. That will trigger the > no-regressions rule and then it's game over. At least I would expect > that to happen. Right, nobody is arguing that. Our plan is to 1) Introduce some sort of statx mechanism to expose this information. 2) Introduce an incompat fs feature flag to give unique inode numbers for people that want them, and there stop doing the st_dev thing we currently do. 3) Continue doing the st_dev thing we currently do for the non-feature flag case. 4) Wait 50 years and then maybe stop doing the st_dev thing once people have adopted whatever statx flag has been introduced. God willing I will have been hit by a bus well before then. Nobody is arguing about reverting existing behavior, we're stuck with it. We're trying to define how we'd like it to look going forward, and start working towards that. Thanks, Josef ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more 2024-02-22 15:48 ` Josef Bacik @ 2024-02-26 8:14 ` Miklos Szeredi 0 siblings, 0 replies; 19+ messages in thread From: Miklos Szeredi @ 2024-02-26 8:14 UTC (permalink / raw) To: Josef Bacik Cc: Kent Overstreet, linux-bcachefs, linux-btrfs, linux-fsdevel, Jan Kara, linux-kernel, lsf-pc On Thu, 22 Feb 2024 at 16:48, Josef Bacik <josef@toxicpanda.com> wrote: > Right, nobody is arguing that. Our plan is to > > 1) Introduce some sort of statx mechanism to expose this information. > 2) Introduce an incompat fs feature flag to give unique inode numbers for people > that want them, and there stop doing the st_dev thing we currently do. I don't get it. What does the filesystem (the actual bits on disk) have anything to do with how st_dev is exposed to userspace applications? This is not a filesystem feature, this is an interface feature. And I even doubt that salvaging st_dev is worth it. Userspace should just be converted to use something else. In other words st_ino *and* st_dev are legacy and we need to find superior alternatives. Seems like there's an agreement about file handle being able to replace st_ino. I'm not quite sure fsid or uuid can replace st_dev, but that's up for discussion. Thanks, Miklos ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-02-26 16:25 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-02-21 0:51 [LSF TOPIC] statx extensions for subvol/snapshot filesystems & more Kent Overstreet 2024-02-21 15:06 ` Miklos Szeredi 2024-02-21 21:04 ` Kent Overstreet 2024-02-21 21:08 ` Josef Bacik 2024-02-22 9:14 ` Miklos Szeredi 2024-02-22 9:42 ` Kent Overstreet 2024-02-22 10:25 ` Miklos Szeredi 2024-02-22 11:19 ` Kent Overstreet 2024-02-22 11:01 ` [Lsf-pc] " Jan Kara 2024-02-22 11:27 ` Kent Overstreet 2024-02-22 11:44 ` Jan Kara 2024-02-22 11:55 ` Kent Overstreet 2024-02-22 13:10 ` Miklos Szeredi 2024-02-22 12:48 ` Miklos Szeredi 2024-02-22 16:08 ` Jan Kara 2024-02-26 8:27 ` Miklos Szeredi 2024-02-26 16:24 ` Amir Goldstein 2024-02-22 15:48 ` Josef Bacik 2024-02-26 8:14 ` Miklos Szeredi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).