* UFS s_maxbytes bogosity
@ 2017-06-04 19:31 Linus Torvalds
  2017-06-04 21:37 ` Al Viro
  2017-06-04 23:23 ` [PATCH 1/1] fs/ufs: Set UFS default maximum bytes per file Richard Narron
  0 siblings, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2017-06-04 19:31 UTC (permalink / raw)
  To: Al Viro, Richard Narron, Will B; +Cc: linux-fsdevel

Al (or anybody else),
 any comments on

    https://bugzilla.kernel.org/show_bug.cgi?id=195721

and in particular the patch in there that just makes UFS use MAX_LFS_FILESIZE:

    https://bugzilla.kernel.org/attachment.cgi?id=256853&action=diff

I'm inclined to just apply it, since clearly the default 2G limit
isn't appropriate for UFS, although it would perhaps be a good idea to
figure out just what the true UFS maximum file size can be.  The
on-disk "ui_size" field seems to be a 64-bit entity, so
MAX_LFS_FILESIZE is certainly better, but there's probably some index
tree limit that depends on the block size or whatever.

Googling ufs file size limits shows that (a) Solaris at some point
also had that same 2GB file size limit for ufs, and (b) finds this:

  "Maximum UFS File and File System Size

    The maximum size of a UFS file system is about 16 TB of usable
space, minus about one percent overhead. A sparse file can have a
logical size of one terabyte. However, the actual amount of data that
can be stored in a file is approximately one percent less than 1 TB
because of the file system overhead."

So 1TB might be the right number, but the default MAX_NON_LFS limit
definitely is not.

Richard - I'd really like to get a sign-off for that patch, even if
it's a trivial one-liner (and even if we might end up using a
different limit in the end).

AFAIK we haven't had a UFS maintainer for a long while; the last
commit from Evgeniy seems to be from 2015, and the last one to UFS
with him as anything but a Cc was in 2010.

               Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: UFS s_maxbytes bogosity
  2017-06-04 19:31 UFS s_maxbytes bogosity Linus Torvalds
@ 2017-06-04 21:37 ` Al Viro
  2017-06-04 21:58   ` Al Viro
  2017-06-04 23:23 ` [PATCH 1/1] fs/ufs: Set UFS default maximum bytes per file Richard Narron
  1 sibling, 1 reply; 21+ messages in thread
From: Al Viro @ 2017-06-04 21:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Richard Narron, Will B, linux-fsdevel

On Sun, Jun 04, 2017 at 12:31:34PM -0700, Linus Torvalds wrote:

> and in particular the patch in there that just makes UFS use MAX_LFS_FILESIZE:
> 
>     https://bugzilla.kernel.org/attachment.cgi?id=256853&action=diff
 
> I'm inclined to just apply it, since clearly the default 2G limit
> isn't appropriate for UFS, although it would perhaps be a good idea to
> figure out just what the true UFS maximum file size can be.  The
> on-disk "ui_size" field seems to be a 64-bit entity, so
> MAX_LFS_FILESIZE is certainly better, but there's probably some index
> tree limit that depends on the block size or whatever.

Depends.  There had been a lot of UFS variants (hell, ext2 is one), so
limits differ.  They are also kernel-dependent.

One hard limit is the same as in ext2 - indirect blocks contain pointers
to blocks, so you get (10 + n + n^2 + n^3)*block_size, where n is
block_size / pointer size.  For UFS pointers are 32bit (UFS2 is trickier,
but we don't support that).

Another pile of fun is VM-related and that varies from kernel to kernel.
FWIW, current FreeBSD has no problems with that (32bit included), but
there had been 4.4BSD variants that used to.


* Re: UFS s_maxbytes bogosity
  2017-06-04 21:37 ` Al Viro
@ 2017-06-04 21:58   ` Al Viro
  2017-06-04 22:06     ` Al Viro
  2017-06-04 22:32     ` Theodore Ts'o
  0 siblings, 2 replies; 21+ messages in thread
From: Al Viro @ 2017-06-04 21:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Richard Narron, Will B, linux-fsdevel

On Sun, Jun 04, 2017 at 10:37:36PM +0100, Al Viro wrote:
> On Sun, Jun 04, 2017 at 12:31:34PM -0700, Linus Torvalds wrote:
> 
> > and in particular the patch in there that just makes UFS use MAX_LFS_FILESIZE:
> > 
> >     https://bugzilla.kernel.org/attachment.cgi?id=256853&action=diff
>  
> > I'm inclined to just apply it, since clearly the default 2G limit
> > isn't appropriate for UFS, although it would perhaps be a good idea to
> > figure out just what the true UFS maximum file size can be.  The
> > on-disk "ui_size" field seems to be a 64-bit entity, so
> > MAX_LFS_FILESIZE is certainly better, but there's probably some index
> > tree limit that depends on the block size or whatever.
> 
> Depends.  There had been a lot of UFS variants (hell, ext2 is one), so
> limits differ.  They are also kernel-dependent.
> 
> One hard limit is the same as in ext2 - indirect blocks contain pointers
> to blocks, so you get (10 + n + n^2 + n^3)*block_size, where n is

<grabs some coffee>

12 + n + n^2 + n^3, sorry.  12 direct blocks, plus usual indirects.

Something like (completely untested)

u64 ufs_max_bytes(struct super_block *sb)
{
	struct ufs_sb_private_info *uspi = UFS_SB(sb)->s_uspi;
	int bits = uspi->s_apbshift;
	u64 res;

	if (bits > 21)
		return MAX_LFS_FILESIZE;

	res = UFS_NDADDR + (1LL << bits) + (1LL << (2*bits)) +
		(1LL << (3*bits));

	if (res >= (MAX_LFS_FILESIZE >> uspi->s_bshift))
		return MAX_LFS_FILESIZE;

	return res << uspi->s_bshift;
}

ought to calculate the right thing for modern UFS variants; I would
leave anything other than UFS_MOUNT_UFSTYPE_44BSD and
UFS_MOUNT_UFSTYPE_UFS2 alone.


* Re: UFS s_maxbytes bogosity
  2017-06-04 21:58   ` Al Viro
@ 2017-06-04 22:06     ` Al Viro
  2017-06-04 23:26       ` Linus Torvalds
  2017-06-04 22:32     ` Theodore Ts'o
  1 sibling, 1 reply; 21+ messages in thread
From: Al Viro @ 2017-06-04 22:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Richard Narron, Will B, linux-fsdevel

On Sun, Jun 04, 2017 at 10:58:38PM +0100, Al Viro wrote:

> u64 ufs_max_bytes(struct super_block *sb)
> {
> 	struct ufs_sb_private_info *uspi = UFS_SB(sb)->s_uspi;
> 	int bits = uspi->s_apbshift;
> 	u64 res;
> 
> 	if (bits > 21)
> 		return MAX_LFS_FILESIZE;
> 
> 	res = UFS_NDADDR + (1LL << bits) + (1LL << (2*bits)) +
> 		(1LL << (3*bits));
> 
> 	if (res >= (MAX_LFS_FILESIZE >> uspi->s_bshift))
> 		return MAX_LFS_FILESIZE;
> 
> 	return res << uspi->s_bshift;
> }
> 
> ought to calculate the right thing for modern UFS variants; I would
> leave anything other than UFS_MOUNT_UFSTYPE_44BSD and
> UFS_MOUNT_UFSTYPE_UFS2 alone.

Now that I'm hopefully sufficiently awake...  Folks, could you try this:

diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 131b2b77c818..2bab1491a5d4 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -746,6 +746,24 @@ static void ufs_put_super(struct super_block *sb)
 	return;
 }
 
+static u64 ufs_max_bytes(struct super_block *sb)
+{
+	struct ufs_sb_private_info *uspi = UFS_SB(sb)->s_uspi;
+	int bits = uspi->s_apbshift;
+	u64 res;
+
+	if (bits > 21)
+		return MAX_LFS_FILESIZE;
+
+	res = UFS_NDADDR + (1LL << bits) + (1LL << (2*bits)) +
+		(1LL << (3*bits));
+
+	if (res >= (MAX_LFS_FILESIZE >> uspi->s_bshift))
+		return MAX_LFS_FILESIZE;
+
+	return res << uspi->s_bshift;
+}
+
 static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct ufs_sb_info * sbi;
@@ -823,6 +841,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 		uspi->s_fshift = 9;
 		uspi->s_sbsize = super_block_size = 1536;
 		uspi->s_sbbase = 0;
+		sb->s_maxbytes = ufs_max_bytes(sb);
 		flags |= UFS_DE_44BSD | UFS_UID_44BSD | UFS_ST_44BSD | UFS_CG_44BSD;
 		break;
 	case UFS_MOUNT_UFSTYPE_UFS2:
@@ -833,6 +852,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 		uspi->s_fshift = 9;
 		uspi->s_sbsize = super_block_size = 1536;
 		uspi->s_sbbase =  0;
+		sb->s_maxbytes = ufs_max_bytes(sb);
 		flags |= UFS_TYPE_UFS2 | UFS_DE_44BSD | UFS_UID_44BSD | UFS_ST_44BSD | UFS_CG_44BSD;
 		break;
 		


* Re: UFS s_maxbytes bogosity
  2017-06-04 21:58   ` Al Viro
  2017-06-04 22:06     ` Al Viro
@ 2017-06-04 22:32     ` Theodore Ts'o
  2017-06-05  0:02       ` Al Viro
  1 sibling, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2017-06-04 22:32 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Richard Narron, Will B, linux-fsdevel

On Sun, Jun 04, 2017 at 10:58:38PM +0100, Al Viro wrote:
> 
> ought to calculate the right thing for modern UFS variants; I would
> leave anything other than UFS_MOUNT_UFSTYPE_44BSD and
> UFS_MOUNT_UFSTYPE_UFS2 alone.

If anyone wants to spend time worrying about ancient UFS
incompatibilities, or wants to find someone who might care, there's
The Unix Heritage Society mailing list[1].  (Don't bother trying to
post to the list without subscribing first; the list is run even more
strictly than DaveM's lists, and postings from people who aren't
subscribers get unceremoniously dropped on the floor.)

[1] http://minnie.tuhs.org/mailman/listinfo/tuhs

Among other things, recent discussions on that list cover security
vulnerabilities (stack buffer overruns and the like) in V6 and V7 Unix,
and there are people on that list who will bring up historical versions
of Unix on emulators.

More seriously, many of the UFS variants don't have any way of
indicating which version they are, or which versions of Unix can
safely mount them.  There's a reason why ext2/3/4 has its feature
compatibility masks; UFS demonstrated the joys of what happens
when you don't bother with that kind of compatibility marker in a
file system.  So focusing just on what FreeBSD and other modern BSD
implementations use is a completely fair thing to do.  The enthusiasts
on TUHS are perfectly capable of sending patches if they care about V6
Unix <-> Linux compatibility.  :-)

     					- Ted


* [PATCH 1/1] fs/ufs: Set UFS default maximum bytes per file
  2017-06-04 19:31 UFS s_maxbytes bogosity Linus Torvalds
  2017-06-04 21:37 ` Al Viro
@ 2017-06-04 23:23 ` Richard Narron
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Narron @ 2017-06-04 23:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Will B, Theodore Ts'o, linux-fsdevel

This patch is a response to a problem with reading files larger than 2GB from a UFS-2 file system:

https://bugzilla.kernel.org/show_bug.cgi?id=195721

The problem started with a patch to mm/filemap.c in Linux 4.8.4 that fixed
a truncation problem:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.8.y&id=3d549dcfbbb0ecdaa571431a27ee5da9f2466716

That fix truncates reads on UFS-2 file systems because the default maximum
file size is 2GB (MAX_NON_LFS).

Here I simply increase the default to a common value used by other file 
systems.

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
---------------------------------------------------------------------
--- a/fs/ufs/super.c.orig	2017-05-28 17:20:53.000000000 -0700
+++ b/fs/ufs/super.c	2017-06-03 17:08:33.340100262 -0700
@@ -812,9 +812,8 @@ static int ufs_fill_super(struct super_b
  	uspi->s_dirblksize = UFS_SECTOR_SIZE;
  	super_block_offset=UFS_SBLOCK;

-	/* Keep 2Gig file limit. Some UFS variants need to override
-	   this but as I don't know which I'll let those in the know loosen
-	   the rules */
+        sb->s_maxbytes = MAX_LFS_FILESIZE;
+
  	switch (sbi->s_mount_opt & UFS_MOUNT_UFSTYPE) {
  	case UFS_MOUNT_UFSTYPE_44BSD:
  		UFSD("ufstype=44bsd\n");

---------------------------------------------------------------------


* Re: UFS s_maxbytes bogosity
  2017-06-04 22:06     ` Al Viro
@ 2017-06-04 23:26       ` Linus Torvalds
  2017-06-05  0:11         ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2017-06-04 23:26 UTC (permalink / raw)
  To: Al Viro; +Cc: Richard Narron, Will B, linux-fsdevel

On Sun, Jun 4, 2017 at 3:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Now that I'm hopefully sufficiently awake...  Folks, could you try this:

Ok, that looks believable, but complex.  So I do wonder if it's worth
it, particularly considering that we don't really have a maintainer,
and it took people this long to even notice that huge glaring 2GB
limit.

In fact, once we raise it past the 2GB limit, most of the s_maxbytes
reasons go away - we will already be passing around values that have
the high bit set in "int", and one of the main reasons for s_maxbytes
was to limit overflow damage in filesystem code that passed "int"
around where it shouldn't.

So assuming we trust UFS doesn't do that (and considering that it uses
the default VFS helpers for reading etc, it's presumably all good), we
might as well just use the MAX_LFS_FILESIZE define.

It's not as if we need to get s_maxbytes exactly right. All we
*really* care about is to get the LFS case ok for code that is limited
to 31 bits, and to not overflow the page index when we use the page
cache (which MAX_LFS_FILESIZE does already).

Past that, any extra precision can avoid a few unnecessary calls down
to the filesystem (ie not bother to do extra readpage calls for cases
we know aren't relevant), but it shouldn't be a big deal.

             Linus


* Re: UFS s_maxbytes bogosity
  2017-06-04 22:32     ` Theodore Ts'o
@ 2017-06-05  0:02       ` Al Viro
  0 siblings, 0 replies; 21+ messages in thread
From: Al Viro @ 2017-06-05  0:02 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Linus Torvalds, Richard Narron, Will B, linux-fsdevel

On Sun, Jun 04, 2017 at 06:32:49PM -0400, Theodore Ts'o wrote:
> On Sun, Jun 04, 2017 at 10:58:38PM +0100, Al Viro wrote:
> > 
> > ought to calculate the right thing for modern UFS variants; I would
> > leave anything other than UFS_MOUNT_UFSTYPE_44BSD and
> > UFS_MOUNT_UFSTYPE_UFS2 alone.
> 
> If anyone wants to spend time worrying about ancient UFS
> incompatibilities, or wants to find someone who might care, there's
> The Unix Heritage Society mailing list[1].  (Don't bother trying to
> post to the list without subscribing first; the list is run even more
> strictly than DaveM's lists, and postings from people who aren't
> subscribers get unceremoniously dropped on the floor.)
> 
> [1] http://minnie.tuhs.org/mailman/listinfo/tuhs
> 
> Among other things, recent discussions on that list cover security
> vulnerabilities (stack buffer overruns and the like) in V6 and V7 Unix,
> and there are people on that list who will bring up historical versions
> of Unix on emulators.
> 
> More seriously, many of the UFS variants don't have any way of
> indicating which version they are, or which versions of Unix can
> safely mount them.  There's a reason why ext2/3/4 has its feature
> compatibility masks; UFS demonstrated the joys of what happens
> when you don't bother with that kind of compatibility marker in a
> file system.  So focusing just on what FreeBSD and other modern BSD
> implementations use is a completely fair thing to do.  The enthusiasts
> on TUHS are perfectly capable of sending patches if they care about V6
> Unix <-> Linux compatibility.  :-)

FFS and derivatives postdate v7, let alone v6, as you bloody well know ;-)
Anyway, there are two issues - access on Linux not getting unhappy, and
*BSD kernels being OK with huge files created by Linux.

After rereading that code, for UFS2 the function I'd posted would do the
right thing; for UFS1 there's an additional complication - on-disk
ui_blocks (in units of 512 bytes) being only 32bit.  So for UFS1 we need
to make sure we won't overflow that thing, which makes the hard limit a bit
below 2^41 bytes, in addition to the limit imposed by the tree structure.

How about this:

diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 131b2b77c818..eefa48499a6e 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -746,6 +746,23 @@ static void ufs_put_super(struct super_block *sb)
 	return;
 }
 
+static u64 ufs_max_bytes(struct super_block *sb)
+{
+	struct ufs_sb_private_info *uspi = UFS_SB(sb)->s_uspi;
+	int bits = uspi->s_apbshift;
+	u64 res;
+
+	if (bits > 21)
+		res = ~0ULL;
+	else
+		res = UFS_NDADDR + (1LL << bits) + (1LL << (2*bits)) +
+			(1LL << (3*bits));
+
+	if (res >= (MAX_LFS_FILESIZE >> uspi->s_bshift))
+		return MAX_LFS_FILESIZE;
+	return res << uspi->s_bshift;
+}
+
 static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct ufs_sb_info * sbi;
@@ -823,6 +840,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 		uspi->s_fshift = 9;
 		uspi->s_sbsize = super_block_size = 1536;
 		uspi->s_sbbase = 0;
+		sb->s_maxbytes = min(ufs_max_bytes(sb),1ULL<<40);
 		flags |= UFS_DE_44BSD | UFS_UID_44BSD | UFS_ST_44BSD | UFS_CG_44BSD;
 		break;
 	case UFS_MOUNT_UFSTYPE_UFS2:
@@ -833,6 +851,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 		uspi->s_fshift = 9;
 		uspi->s_sbsize = super_block_size = 1536;
 		uspi->s_sbbase =  0;
+		sb->s_maxbytes = ufs_max_bytes(sb);
 		flags |= UFS_TYPE_UFS2 | UFS_DE_44BSD | UFS_UID_44BSD | UFS_ST_44BSD | UFS_CG_44BSD;
 		break;
 		


* Re: UFS s_maxbytes bogosity
  2017-06-04 23:26       ` Linus Torvalds
@ 2017-06-05  0:11         ` Al Viro
  2017-06-05  3:00           ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2017-06-05  0:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Richard Narron, Will B, linux-fsdevel

On Sun, Jun 04, 2017 at 04:26:51PM -0700, Linus Torvalds wrote:

> So assuming we trust UFS doesn't do that (and considering that it uses
> the default VFS helpers for reading etc, it's presumably all good), we
> might as well just use the MAX_LFS_FILESIZE define.

Umm...  Might start hitting this (in ufs_block_to_path()):
                ufs_warning(inode->i_sb, "ufs_block_to_path", "block > big");

> It's not as if we need to get s_maxbytes exactly right. All we
> *really* care about is to get the LFS case ok for code that is limited
> to 31 bits, and to not overflow the page index when we use the page
> cache (which MAX_LFS_FILESIZE does already).
> 
> Past that, any extra precision can avoid a few unnecessary calls down
> to the filesystem (ie not bother to do extra readpage calls for cases
> we know aren't relevant), but it shouldn't be a big deal.

For UFS2 - yes; for UFS1 we have another hard limit I'd missed.  i_blocks
is in half-kilobyte units there and it's 32bit on-disk.  So for UFS1 I'd
cap it at 1Tb (the real limit is ~ 2Tb - 2Mb, but an accurate calculation
is a bit of a bother).  Come to think of it, the minimal block size for
UFS1 is 4K with pointers-per-block >= 1024.  So the tree-imposed limit is
no lower than 1024^3*4096, i.e. greater than that cap, so we could make
->s_maxbytes an unconditional 1Tb for UFS1.


* Re: UFS s_maxbytes bogosity
  2017-06-05  0:11         ` Al Viro
@ 2017-06-05  3:00           ` Linus Torvalds
  2017-06-05  3:49             ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2017-06-05  3:00 UTC (permalink / raw)
  To: Al Viro; +Cc: Richard Narron, Will B, linux-fsdevel

On Sun, Jun 4, 2017 at 5:11 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> For UFS2 - yes; for UFS1 we have another hard limit I'd missed.  i_blocks
> is in half-kilobyte units there and it's 32bit on-disk.  So for UFS1 I'd
> cap it at 1Tb (the real limit is ~ 2Tb - 2Mb, but an accurate calculation
> is a bit of a bother).  Come to think of it, the minimal block size for
> UFS1 is 4K with pointers-per-block >= 1024.  So the tree-imposed limit is
> no lower than 1024^3*4096, i.e. greater than that cap, so we could make
> ->s_maxbytes an unconditional 1Tb for UFS1.

The nblocks limit (and the 32-bit block numbers) might not limit a
sparse file, so I think the tree-imposed limit might be the final true
limit even on UFS1, no?

                      Linus


* Re: UFS s_maxbytes bogosity
  2017-06-05  3:00           ` Linus Torvalds
@ 2017-06-05  3:49             ` Al Viro
  2017-06-07 23:48               ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2017-06-05  3:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Richard Narron, Will B, linux-fsdevel

On Sun, Jun 04, 2017 at 08:00:26PM -0700, Linus Torvalds wrote:
> On Sun, Jun 4, 2017 at 5:11 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > For UFS2 - yes; for UFS1 we have another hard limit I'd missed.  i_blocks
> > is in half-kilobyte units there and it's 32bit on-disk.  So for UFS1 I'd
> > cap it at 1Tb (the real limit is ~ 2Tb - 2Mb, but an accurate calculation
> > is a bit of a bother).  Come to think of it, the minimal block size for
> > UFS1 is 4K with pointers-per-block >= 1024.  So the tree-imposed limit is
> > no lower than 1024^3*4096, i.e. greater than that cap, so we could make
> > ->s_maxbytes an unconditional 1Tb for UFS1.
> 
> The nblocks limit (and the 32-bit block numbers) might not limit a
> sparse file, so I think the tree-imposed limit might be the final true
> limit even on UFS1, no?

Yes, but... where would we notice that overflow on allocation?  Matter of fact...
<looks>
<starts swearing>
commit 8f45c33decf62e1aaaa9411aae8fef6a38f95845
Author: Jan Kara <jack@suse.cz>
Date:   Thu May 20 16:00:36 2010 +0200

    ufs: Remove dead quota code
    
    UFS quota is non-functional at least since 2.6.12 because dq_op was set
    to NULL. Since the filesystem exists mainly to allow cooperation with Solaris
    and quota format isn't standard, just remove the dead code.
    
    CC: Evgeniy Dushistov <dushistov@mail.ru>
    Signed-off-by: Jan Kara <jack@suse.cz>
<swears some more>

Jan?  You do realize that "non-functional" or not, dquot_alloc_block() updates
->i_bytes/->i_blocks, right?  And removing that should've had the updates put
back explicitly...

Grrr...  OK, I'll put together a patch fixing that idiocy.  As it is, on rw
mounts of ufs the on-disk ->i_blocks is not updated and, AFAICS, that can
lead to disk space leaks on inode deletion.


* Re: UFS s_maxbytes bogosity
  2017-06-05  3:49             ` Al Viro
@ 2017-06-07 23:48               ` Al Viro
  2017-06-08  0:35                 ` Richard Narron
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2017-06-07 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Richard Narron, Will B, linux-fsdevel, Evgeniy Dushistov

On Mon, Jun 05, 2017 at 04:49:36AM +0100, Al Viro wrote:

> Grrr...  OK, I'll put together a patch fixing that idiocy.  As it is, on rw
> mounts of ufs the on-disk ->i_blocks is not updated and, AFAICS, that can
> lead to disk space leaks on inode deletion.

Turns out it's even worse than that, and some of the breakage in there is my fault.
I'll fix what I can, but for now I would suggest
	* making UFS_FS_WRITE depend on BROKEN; it had always been experimental,
but since 2010 it had been actively buggering filesystems (the aforementioned
bug of Jan's + a bug of mine in the 2015 series).  I'll fix that stuff, but I'd
expect it to take a month or so, including serious testing.  I have set things
up for such testing (which has promptly caught all kinds of fun), but we are at
-rc4 now, so it'll probably be fodder for the next cycle, with interesting
backporting afterwards.
	* for ->s_maxbytes, let's go with the tree-imposed limit.  The proper
way of dealing with ->i_blocks overflows is to return -ENOSPC when an attempt
to grab a block would cause such an overflow.
	* If Evgeniy can resurface and take part in that fun, I'd be happy to help.
If not, well... guess I'll take the thing over until somebody willing to adopt it
comes along.


* Re: UFS s_maxbytes bogosity
  2017-06-07 23:48               ` Al Viro
@ 2017-06-08  0:35                 ` Richard Narron
  2017-06-08  2:20                   ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Narron @ 2017-06-08  0:35 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Thu, 8 Jun 2017, Al Viro wrote:

> On Mon, Jun 05, 2017 at 04:49:36AM +0100, Al Viro wrote:
>
>> Grrr...  OK, I'll put together a patch fixing that idiocy.  As it is, on rw
>> mounts of ufs the on-disk ->i_blocks is not updated and, AFAICS, that can
>> lead to disk space leaks on inode deletion.
>
> Turns out it's even worse than that, and some of the breakage in there is my fault.
> I'll fix what I can, but for now I would suggest
> 	* making UFS_FS_WRITE depend on BROKEN; it had always been experimental,
> but since 2010 it had been actively buggering filesystems (the aforementioned
> bug of Jan's + a bug of mine in the 2015 series).  I'll fix that stuff, but
> I'd expect it to take a month or so, including serious testing.  I have set
> things up for such testing (which has promptly caught all kinds of fun), but
> we are at -rc4 now, so it'll probably be fodder for the next cycle, with
> interesting backporting afterwards.
> 	* for ->s_maxbytes, let's go with the tree-imposed limit.  The proper
> way of dealing with ->i_blocks overflows is to return -ENOSPC when an attempt
> to grab a block would cause such an overflow.
> 	* If Evgeniy can resurface and take part in that fun, I'd be happy to help.
> If not, well... guess I'll take the thing over until somebody willing to adopt it
> comes along.

I am willing to test.  I just turned on UFS_FS_WRITE for the very first 
time running 4.12-rc4 and was able to copy a file of more than 2GB from 
one r/o FreeBSD subpartition to another r/w FreeBSD subpartition.

So it is already looking pretty good.

I did notice an ->i_blocks bug where it was set to zero on the output file
instead of the actual block count.  This showed up when I popped over to
FreeBSD 11.0 and did an fsck on the r/w subpartition.


* Re: UFS s_maxbytes bogosity
  2017-06-08  0:35                 ` Richard Narron
@ 2017-06-08  2:20                   ` Al Viro
  2017-06-08 22:15                     ` Al Viro
  2017-06-09  0:11                     ` Richard Narron
  0 siblings, 2 replies; 21+ messages in thread
From: Al Viro @ 2017-06-08  2:20 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Wed, Jun 07, 2017 at 05:35:31PM -0700, Richard Narron wrote:

> I am willing to test.  I just turned on UFS_FS_WRITE for the very first time
> running 4.12-rc4 and was able to copy a file of more than 2GB from one r/o
> FreeBSD subpartition to another r/w FreeBSD subpartition.
> 
> So it is already looking pretty good.

The nasty cases are around short files, especially short files with holes.
Linear writes as done by cp(1) will do nothing worse than bogus i_blocks
(and possibly mangled counters in cylinder groups).  Random write access
to short files, OTOH, steps into a lot more codepaths...

As for ->i_blocks, it triggers this:

root@kvm1:/mnt# df .; mkdir a; rmdir a; df .
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/loop0        507420  4504    462340   1% /mnt
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/loop0        507420  4536    462308   1% /mnt

Note the 32Kb (== one block on that ufs2) leaked here.
Every iteration will leak another one.  Similar for long
symlinks...


* Re: UFS s_maxbytes bogosity
  2017-06-08  2:20                   ` Al Viro
@ 2017-06-08 22:15                     ` Al Viro
  2017-06-08 22:36                       ` Linus Torvalds
  2017-06-09  0:11                     ` Richard Narron
  1 sibling, 1 reply; 21+ messages in thread
From: Al Viro @ 2017-06-08 22:15 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Thu, Jun 08, 2017 at 03:20:57AM +0100, Al Viro wrote:
> On Wed, Jun 07, 2017 at 05:35:31PM -0700, Richard Narron wrote:
> 
> > I am willing to test.  I just turned on UFS_FS_WRITE for the very first time
> > running 4.12-rc4 and was able to copy a file of more than 2GB from one r/o
> > FreeBSD subpartition to another r/w FreeBSD subpartition.
> > 
> > So it is already looking pretty good.
> 
> The nasty cases are around short files, especially short files with holes.
> Linear writes as done by cp(1) will do nothing worse than bogus i_blocks
> (and possibly mangled counters in cylinder groups).  Random write access
> to short files, OTOH, steps into a lot more codepaths...
> 
> As for ->i_blocks, it triggers this:
> 
> root@kvm1:/mnt# df .; mkdir a; rmdir a; df .
> Filesystem     1K-blocks  Used Available Use% Mounted on
> /dev/loop0        507420  4504    462340   1% /mnt
> Filesystem     1K-blocks  Used Available Use% Mounted on
> /dev/loop0        507420  4536    462308   1% /mnt
> 
> Note the 32Kb (== one block on that ufs2) leaked here.
> Every iteration will leak another one.  Similar for long
> symlinks...

Spot the bogosity:

static inline int _ubh_isblockset_(struct ufs_sb_private_info * uspi,
        struct ufs_buffer_head * ubh, unsigned begin, unsigned block)
{
        switch (uspi->s_fpb) {
        case 8:
                return (*ubh_get_addr (ubh, begin + block) == 0xff);
        case 4:
                return (*ubh_get_addr (ubh, begin + (block >> 1)) == (0x0f << ((block & 0x01) << 2)));
        case 2:
                return (*ubh_get_addr (ubh, begin + (block >> 2)) == (0x03 << ((block & 0x03) << 1)));
        case 1:
                return (*ubh_get_addr (ubh, begin + (block >> 3)) == (0x01 << (block & 0x07)));
        }
        return 0;
}

with

static inline void _ubh_setblock_(struct ufs_sb_private_info * uspi,
        struct ufs_buffer_head * ubh, unsigned begin, unsigned block)
{
        switch (uspi->s_fpb) {
        case 8:
                *ubh_get_addr(ubh, begin + block) = 0xff;
                return;
        case 4:
                *ubh_get_addr(ubh, begin + (block >> 1)) |= (0x0f << ((block & 0x01) << 2));
                return;
        case 2:
                *ubh_get_addr(ubh, begin + (block >> 2)) |= (0x03 << ((block & 0x03) << 1));
                return;
        case 1:
                *ubh_get_addr(ubh, begin + (block >> 3)) |= (0x01 << ((block & 0x07)));
                return;
        }
}

The only saving grace is that UFS defaults to 8 fragments per block...


* Re: UFS s_maxbytes bogosity
  2017-06-08 22:15                     ` Al Viro
@ 2017-06-08 22:36                       ` Linus Torvalds
  0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2017-06-08 22:36 UTC (permalink / raw)
  To: Al Viro; +Cc: Richard Narron, Will B, linux-fsdevel, Evgeniy Dushistov

On Thu, Jun 8, 2017 at 3:15 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Spot the bogosity:

Heh.. No masking on the testing side.

> The only saving grace is that UFS defaults to 8 fragments per block...

Yeah, that case just works, because it's the whole byte.

                Linus


* Re: UFS s_maxbytes bogosity
  2017-06-08  2:20                   ` Al Viro
  2017-06-08 22:15                     ` Al Viro
@ 2017-06-09  0:11                     ` Richard Narron
  2017-06-09  3:35                       ` Al Viro
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Narron @ 2017-06-09  0:11 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Thu, 8 Jun 2017, Al Viro wrote:

> The nasty cases are around short files, especially short files with holes.
> Linear writes as done by cp(1) will do nothing worse than bogus i_blocks
> (and possibly mangled counters in cylinder groups).  Random write access
> to short files, OTOH, steps into a lot more codepaths...
>
> As for ->i_blocks, it triggers this:
>
> root@kvm1:/mnt# df .; mkdir a; rmdir a; df .
> Filesystem     1K-blocks  Used Available Use% Mounted on
> /dev/loop0        507420  4504    462340   1% /mnt
> Filesystem     1K-blocks  Used Available Use% Mounted on
> /dev/loop0        507420  4536    462308   1% /mnt
>
> Note the 32Kb (== one block on that ufs2) leaked here.
> Every iteration will leak another one.  Similar for long
> symlinks...
>
Test results don't look pretty on FreeBSD.  (I will also test OpenBSD and 
NetBSD.)

FreeBSD:
   #newfs ada0s2d
Linux:
   #mount -t ufs -o rw,ufstype=ufs2 /dev/sda18 /fbsdd
   #cd /fbsdd
   #df .
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda18      20307184     8  18682632   1% /fbsdd
   #mkdir a
   #rmdir a
   #df .
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda18      20307184    40  18682600   1% /fbsdd
   #umount /dev/sda18
FreeBSD:
   # fsck -n /dev/ada0s2d
   ** /dev/ada0s2d (NO WRITE)
   ** Last Mounted on /diskd
   ** Phase 1 - Check Blocks and Sizes
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   ** Phase 4 - Check Reference Counts
   ** Phase 5 - Check Cyl groups

   FREE BLK COUNT(S) WRONG IN SUPERBLK
   SALVAGE? no

   SUMMARY INFORMATION BAD
   SALVAGE? no

   BLK(S) MISSING IN BIT MAPS
   SALVAGE? no

   2 files, 2 used, 5076786 free (26 frags, 634595 blocks, 0.0%
   fragmentation)

* Re: UFS s_maxbytes bogosity
  2017-06-09  0:11                     ` Richard Narron
@ 2017-06-09  3:35                       ` Al Viro
  2017-06-09 17:34                         ` Al Viro
                                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Al Viro @ 2017-06-09  3:35 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Thu, Jun 08, 2017 at 05:11:39PM -0700, Richard Narron wrote:

> Test results don't look pretty on FreeBSD.  (I will also test OpenBSD and
> NetBSD.)

OK, here's the cumulative diff so far - easy-to-backport parts only; that'll
be split into 6 commits (plus whatever else gets added).   It really needs
beating...

diff --git a/fs/stat.c b/fs/stat.c
index f494b182c7c7..c35610845ab1 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -672,6 +672,7 @@ void __inode_add_bytes(struct inode *inode, loff_t bytes)
 		inode->i_bytes -= 512;
 	}
 }
+EXPORT_SYMBOL(__inode_add_bytes);
 
 void inode_add_bytes(struct inode *inode, loff_t bytes)
 {
diff --git a/fs/ufs/balloc.c b/fs/ufs/balloc.c
index a0376a2c1c29..d642cc0a8271 100644
--- a/fs/ufs/balloc.c
+++ b/fs/ufs/balloc.c
@@ -82,7 +82,8 @@ void ufs_free_fragments(struct inode *inode, u64 fragment, unsigned count)
 			ufs_error (sb, "ufs_free_fragments",
 				   "bit already cleared for fragment %u", i);
 	}
-	
+
+	inode_sub_bytes(inode, count << uspi->s_fshift);
 	fs32_add(sb, &ucg->cg_cs.cs_nffree, count);
 	uspi->cs_total.cs_nffree += count;
 	fs32_add(sb, &UFS_SB(sb)->fs_cs(cgno).cs_nffree, count);
@@ -184,6 +185,7 @@ void ufs_free_blocks(struct inode *inode, u64 fragment, unsigned count)
 			ufs_error(sb, "ufs_free_blocks", "freeing free fragment");
 		}
 		ubh_setblock(UCPI_UBH(ucpi), ucpi->c_freeoff, blkno);
+		inode_sub_bytes(inode, uspi->s_fpb << uspi->s_fshift);
 		if ((UFS_SB(sb)->s_flags & UFS_CG_MASK) == UFS_CG_44BSD)
 			ufs_clusteracct (sb, ucpi, blkno, 1);
 
@@ -494,6 +496,20 @@ u64 ufs_new_fragments(struct inode *inode, void *p, u64 fragment,
 	return 0;
 }		
 
+static bool try_add_frags(struct inode *inode, unsigned frags)
+{
+	unsigned size = frags * i_blocksize(inode);
+	spin_lock(&inode->i_lock);
+	__inode_add_bytes(inode, size);
+	if (unlikely((u32)inode->i_blocks != inode->i_blocks)) {
+		__inode_sub_bytes(inode, size);
+		spin_unlock(&inode->i_lock);
+		return false;
+	}
+	spin_unlock(&inode->i_lock);
+	return true;
+}
+
 static u64 ufs_add_fragments(struct inode *inode, u64 fragment,
 			     unsigned oldcount, unsigned newcount)
 {
@@ -530,6 +546,9 @@ static u64 ufs_add_fragments(struct inode *inode, u64 fragment,
 	for (i = oldcount; i < newcount; i++)
 		if (ubh_isclr (UCPI_UBH(ucpi), ucpi->c_freeoff, fragno + i))
 			return 0;
+
+	if (!try_add_frags(inode, count))
+		return 0;
 	/*
 	 * Block can be extended
 	 */
@@ -647,6 +666,7 @@ static u64 ufs_alloc_fragments(struct inode *inode, unsigned cgno,
 			ubh_setbit (UCPI_UBH(ucpi), ucpi->c_freeoff, goal + i);
 		i = uspi->s_fpb - count;
 
+		inode_sub_bytes(inode, i << uspi->s_fshift);
 		fs32_add(sb, &ucg->cg_cs.cs_nffree, i);
 		uspi->cs_total.cs_nffree += i;
 		fs32_add(sb, &UFS_SB(sb)->fs_cs(cgno).cs_nffree, i);
@@ -657,6 +677,8 @@ static u64 ufs_alloc_fragments(struct inode *inode, unsigned cgno,
 	result = ufs_bitmap_search (sb, ucpi, goal, allocsize);
 	if (result == INVBLOCK)
 		return 0;
+	if (!try_add_frags(inode, count))
+		return 0;
 	for (i = 0; i < count; i++)
 		ubh_clrbit (UCPI_UBH(ucpi), ucpi->c_freeoff, result + i);
 	
@@ -716,6 +738,8 @@ static u64 ufs_alloccg_block(struct inode *inode,
 		return INVBLOCK;
 	ucpi->c_rotor = result;
 gotit:
+	if (!try_add_frags(inode, uspi->s_fpb))
+		return 0;
 	blkno = ufs_fragstoblks(result);
 	ubh_clrblock (UCPI_UBH(ucpi), ucpi->c_freeoff, blkno);
 	if ((UFS_SB(sb)->s_flags & UFS_CG_MASK) == UFS_CG_44BSD)
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index 7e41aee7b69a..9bf10285c628 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -235,7 +235,8 @@ ufs_extend_tail(struct inode *inode, u64 writes_to,
 
 	p = ufs_get_direct_data_ptr(uspi, ufsi, block);
 	tmp = ufs_new_fragments(inode, p, lastfrag, ufs_data_ptr_to_cpu(sb, p),
-				new_size, err, locked_page);
+				new_size - (lastfrag & uspi->s_fpbmask), err,
+				locked_page);
 	return tmp != 0;
 }
 
@@ -284,7 +285,7 @@ ufs_inode_getfrag(struct inode *inode, unsigned index,
 			goal += uspi->s_fpb;
 	}
 	tmp = ufs_new_fragments(inode, p, ufs_blknum(new_fragment),
-				goal, uspi->s_fpb, err, locked_page);
+				goal, nfrags, err, locked_page);
 
 	if (!tmp) {
 		*err = -ENOSPC;
@@ -402,7 +403,9 @@ static int ufs_getfrag_block(struct inode *inode, sector_t fragment, struct buff
 
 	if (!create) {
 		phys64 = ufs_frag_map(inode, offsets, depth);
-		goto out;
+		if (phys64)
+			map_bh(bh_result, sb, phys64 + frag);
+		return 0;
 	}
 
         /* This code entered only while writing ....? */
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 131b2b77c818..d9aa2627c9df 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -746,6 +746,23 @@ static void ufs_put_super(struct super_block *sb)
 	return;
 }
 
+static u64 ufs_max_bytes(struct super_block *sb)
+{
+	struct ufs_sb_private_info *uspi = UFS_SB(sb)->s_uspi;
+	int bits = uspi->s_apbshift;
+	u64 res;
+
+	if (bits > 21)
+		res = ~0ULL;
+	else
+		res = UFS_NDADDR + (1LL << bits) + (1LL << (2*bits)) +
+			(1LL << (3*bits));
+
+	if (res >= (MAX_LFS_FILESIZE >> uspi->s_bshift))
+		return MAX_LFS_FILESIZE;
+	return res << uspi->s_bshift;
+}
+
 static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct ufs_sb_info * sbi;
@@ -1212,6 +1229,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 			    "fast symlink size (%u)\n", uspi->s_maxsymlinklen);
 		uspi->s_maxsymlinklen = maxsymlen;
 	}
+	sb->s_maxbytes = ufs_max_bytes(sb);
 	sb->s_max_links = UFS_LINK_MAX;
 
 	inode = ufs_iget(sb, UFS_ROOTINO);
diff --git a/fs/ufs/util.h b/fs/ufs/util.h
index b7fbf53dbc81..398019fb1448 100644
--- a/fs/ufs/util.h
+++ b/fs/ufs/util.h
@@ -473,15 +473,19 @@ static inline unsigned _ubh_find_last_zero_bit_(
 static inline int _ubh_isblockset_(struct ufs_sb_private_info * uspi,
 	struct ufs_buffer_head * ubh, unsigned begin, unsigned block)
 {
+	u8 mask;
 	switch (uspi->s_fpb) {
 	case 8:
 	    	return (*ubh_get_addr (ubh, begin + block) == 0xff);
 	case 4:
-		return (*ubh_get_addr (ubh, begin + (block >> 1)) == (0x0f << ((block & 0x01) << 2)));
+		mask = 0x0f << ((block & 0x01) << 2);
+		return (*ubh_get_addr (ubh, begin + (block >> 1)) & mask) == mask;
 	case 2:
-		return (*ubh_get_addr (ubh, begin + (block >> 2)) == (0x03 << ((block & 0x03) << 1)));
+		mask = 0x03 << ((block & 0x03) << 1);
+		return (*ubh_get_addr (ubh, begin + (block >> 2)) & mask) == mask;
 	case 1:
-		return (*ubh_get_addr (ubh, begin + (block >> 3)) == (0x01 << (block & 0x07)));
+		mask = 0x01 << (block & 0x07);
+		return (*ubh_get_addr (ubh, begin + (block >> 3)) & mask) == mask;
 	}
 	return 0;	
 }

* Re: UFS s_maxbytes bogosity
  2017-06-09  3:35                       ` Al Viro
@ 2017-06-09 17:34                         ` Al Viro
  2017-06-09 21:55                         ` Richard Narron
  2017-06-10  0:09                         ` Richard Narron
  2 siblings, 0 replies; 21+ messages in thread
From: Al Viro @ 2017-06-09 17:34 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Fri, Jun 09, 2017 at 04:35:26AM +0100, Al Viro wrote:
> On Thu, Jun 08, 2017 at 05:11:39PM -0700, Richard Narron wrote:
> 
> > Test results don't look pretty on FreeBSD.  (I will also test OpenBSD and
> > NetBSD.)
> 
> OK, here's the cumulative diff so far - easy-to-backport parts only; that'll
> be split into 6 commits (plus whatever else gets added).   It really needs
> beating...

FWIW, so far it seems to survive xfstest generic/*, modulo simulated power
loss - I'm running it without -o sync and we don't have UFS2 journalling
support, so that's to be expected...  Tons of tests don't run due to lack
of various (mis)features, so it's not _that_ much, and there's nothing
that would try to deliberately hit UFS-specific interesting cases.
xattrs and acls can be supported reasonably easily, so can quota.
O_DIRECT is a real bitch for fragment reallocation handling - no idea
how painful that would be.

UFS2 journal support is probably a lot more massive work than I'm willing
to go into.

Another bug I see there is recovery after a failing copy from userland in
write() on an append-only file.  We have allocated blocks already, so we
might need to truncate the damn things.  However, ufs_truncate_blocks()
will see IS_APPEND(inode) and bail out, leaving garbage in the end of
file.  Not that hard to fix - these checks are simply not needed in the
ufs_write_failed() case.

I'm not happy with the way tail unpacking is done - we *probably* manage
to avoid deadlocks, but the proof is a whole lot more subtle than I'd like,
assuming it is correct in the first place.  And we have a nasty trap
caused by the way balloc works: when doing reallocation on failing
attempt to extend tail in place we do have logics that tries to put the
new copy into an empty block if filesystem is not too fragmented, but
the *first* allocation has nothing of that sort going on.  So if you
have a block with 7 fragments in it in each cylinder group (just create
a bunch of 28Kb files in different directories), any attempt to write
more than 4K into a new file will *always* go like this:
	* for the first page, allocate a 4Kb fragment.  That has a good
chance of going into that almost-full block.
	* for the next page, notice that we need to expand that tail
and can't do that in place.  Now the anti-fragmentation heuristic
hits and we pick two fragments in an empty block.  And copy the one
we'd just written into the new place.
	* next 6 pages go extending the tail we'd got.  However, on
the next page the whole thing repeats.

	FreeBSD avoids that mess by doing bigger allocations - in the
same scenario it would've gone in 32Kb steps rather than 4Kb ones.
Looks like we need a different ->write_iter() there; the generic one is
bloody painful in that respect...

* Re: UFS s_maxbytes bogosity
  2017-06-09  3:35                       ` Al Viro
  2017-06-09 17:34                         ` Al Viro
@ 2017-06-09 21:55                         ` Richard Narron
  2017-06-10  0:09                         ` Richard Narron
  2 siblings, 0 replies; 21+ messages in thread
From: Richard Narron @ 2017-06-09 21:55 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Fri, 9 Jun 2017, Al Viro wrote:

> On Thu, Jun 08, 2017 at 05:11:39PM -0700, Richard Narron wrote:
>
>> Test results don't look pretty on FreeBSD.  (I will also test OpenBSD and
>> NetBSD.)
>
> OK, here's the cumulative diff so far - easy-to-backport parts only; that'll
> be split into 6 commits (plus whatever else gets added).   It really needs
> beating...
>
This looks much better.

The Linux "df ." test looks good.  No more missing 32k.

#cd /fbsdd
#df .
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sda18      20307184     8  18682632   1% /fbsdd
#mkdir a
#rmdir a
#df .
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sda18      20307184     8  18682632   1% /fbsdd

And the FreeBSD "fsck" looks good:
  # fsck -f ada0s2d
** /dev/ada0s2d
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2 files, 2 used, 5076794 free (26 frags, 634596 blocks, 0.0% 
fragmentation)
***** FILE SYSTEM IS CLEAN *****

* Re: UFS s_maxbytes bogosity
  2017-06-09  3:35                       ` Al Viro
  2017-06-09 17:34                         ` Al Viro
  2017-06-09 21:55                         ` Richard Narron
@ 2017-06-10  0:09                         ` Richard Narron
  2 siblings, 0 replies; 21+ messages in thread
From: Richard Narron @ 2017-06-10  0:09 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Will B, linux-fsdevel, Evgeniy Dushistov

On Fri, 9 Jun 2017, Al Viro wrote:

> On Thu, Jun 08, 2017 at 05:11:39PM -0700, Richard Narron wrote:
>
>> Test results don't look pretty on FreeBSD.  (I will also test OpenBSD and
>> NetBSD.)
>
> OK, here's the cumulative diff so far - easy-to-backport parts only; that'll
> be split into 6 commits (plus whatever else gets added).   It really needs
> beating...

On OpenBSD 6.1 (UFS-1) the "df ." test did not look so good before 
applying the cumulative diff:

Linux 4.12-rc4:
   #mount -t ufs -o rw,ufstype=44bsd /dev/sda19 /diske
   #df .
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda19      16511214     2  15685652   1% /diske
   #mkdir a
   #rmdir a
   #df .
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda19      16511214    18  15685636   1% /diske

OpenBSD 6.1:
   #fsck -f sd0e
   ** /dev/rsd0e (NO WRITE)
   ** File system is already clean
   ** Last Mounted on
   ** Phase 1 - Check Blocks and Sizes
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   ** Phase 4 - Check Reference Counts
   ** Phase 5 - Check Cyl groups
   FREE BLK COUNT(S) WRONG IN SUPERBLK
   SALVAGE? no

   SUMMARY INFORMATION BAD
   SALVAGE? no

   BLK(S) MISSING IN BIT MAPS
   SALVAGE? no

   1 files, 1 used, 8255598 free (14 frags, 1031948 blocks, 0.0%
   fragmentation)

But after applying the cumulative diff patch set it looks great:

Linux 4.12-rc4-viro-1:
   #mount -t ufs -o rw,ufstype=44bsd /dev/sda19 /diske
   #cd /diske
   #df .
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda19      16511214     2  15685652   1% /diske
   #mkdir a
   #df .
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda19      16511214     4  15685650   1% /diske
   #rmdir a
   #df .
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda19      16511214     2  15685652   1% /diske

OpenBSD 6.1:
   #fsck -f sd0e
   ** /dev/rsd0e
   ** File system is already clean
   ** Last Mounted on
   ** Phase 1 - Check Blocks and Sizes
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   ** Phase 4 - Check Reference Counts
   ** Phase 5 - Check Cyl groups
   1 files, 1 used, 8255606 free (14 frags, 1031949 blocks, 0.0%
   fragmentation)

end of thread, other threads:[~2017-06-10  0:09 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-04 19:31 UFS s_maxbytes bogosity Linus Torvalds
2017-06-04 21:37 ` Al Viro
2017-06-04 21:58   ` Al Viro
2017-06-04 22:06     ` Al Viro
2017-06-04 23:26       ` Linus Torvalds
2017-06-05  0:11         ` Al Viro
2017-06-05  3:00           ` Linus Torvalds
2017-06-05  3:49             ` Al Viro
2017-06-07 23:48               ` Al Viro
2017-06-08  0:35                 ` Richard Narron
2017-06-08  2:20                   ` Al Viro
2017-06-08 22:15                     ` Al Viro
2017-06-08 22:36                       ` Linus Torvalds
2017-06-09  0:11                     ` Richard Narron
2017-06-09  3:35                       ` Al Viro
2017-06-09 17:34                         ` Al Viro
2017-06-09 21:55                         ` Richard Narron
2017-06-10  0:09                         ` Richard Narron
2017-06-04 22:32     ` Theodore Ts'o
2017-06-05  0:02       ` Al Viro
2017-06-04 23:23 ` [PATCH 1/1] fs/ufs: Set UFS default maximum bytes per file Richard Narron
