linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [git pull] first batch of ufs fixes
@ 2017-06-09 21:38 Al Viro
  2017-06-10 13:03 ` Richard Narron
  0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2017-06-09 21:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

	That's just the obvious backport fodder; I'm pretty sure that there
will be more - definitely so wrt performance and quite possibly correctness
as well.

The following changes since commit a8c39544a6eb2093c04afd5005b6192bd0e880c6:

  osf_wait4(): fix infoleak (2017-05-21 13:10:07 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

for you to fetch changes up to babef37dccbaa49249a22bae9150686815d7be71:

  excessive checks in ufs_write_failed() and ufs_evict_inode() (2017-06-09 16:28:01 -0400)

----------------------------------------------------------------
Al Viro (7):
      ufs: restore proper tail allocation
      fix ufs_isblockset()
      ufs: restore maintaining ->i_blocks
      ufs: set correct ->s_maxsize
      ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
      ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
      excessive checks in ufs_write_failed() and ufs_evict_inode()

 fs/stat.c       |  1 +
 fs/ufs/balloc.c | 26 +++++++++++++++++++++++++-
 fs/ufs/inode.c  | 27 +++++++++++----------------
 fs/ufs/super.c  | 18 ++++++++++++++++++
 fs/ufs/util.h   | 10 +++++++---
 5 files changed, 62 insertions(+), 20 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-09 21:38 [git pull] first batch of ufs fixes Al Viro
@ 2017-06-10 13:03 ` Richard Narron
  2017-06-10 16:07   ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Narron @ 2017-06-10 13:03 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Fri, 9 Jun 2017, Al Viro wrote:

> 	That's just the obvious backport fodder; I'm pretty sure that there
> will be more - definitely so wrt performance and quite possibly correctness
> as well.

These fixes improve the ufs code and they are a good start.

Here are a couple of bugs that still remain:

1) Just after creating a new filesystem on FreeBSD 11.0, the FreeBSD df 
is different from the Linux 4.12.0-rc4 with the ufs fixes.  The available 
count on Linux is is 16k higher than on FreeBSD:

FreeBSD 11.0:
   #df /diske
   Filesystem   1K-blocks Used    Avail Capacity  Mounted on
   /dev/ada0s3e  14217008    8 13079640     0%    /diske

Linux 4.12-0-rc4 with ufs fixes:
   #df /fbsd23
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda23      14217008     8  13079656   1% /fbsd23

2) After creating a new filesystem on FreeBSD, then on Linux copying 
a larger than 2GB file and creating a directory, the fsck back on FreeBSD 
looks ok.

But after going back to Linux and removing the large file and removing the 
directory, the fsck on FreeBSD looks not so good:

Linux:
   #cp /fbsd/tmp/usr-local.tar /fbsd23
   #mkdir                      /fbsd23/a

FreeBSD:
   #fsck -f /dev/ada0s3e
   ** /dev/ada0s3e
   ** Last Mounted on /diske
   ** Phase 1 - Check Blocks and Sizes
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   ** Phase 4 - Check Reference Counts
   ** Phase 5 - Check Cyl groups
   4 files, 636219 used, 2918033 free (25 frags, 364751 blocks, 0.0%
   fragmentation)
   ***** FILE SYSTEM IS CLEAN *****

Linux:
   #cd /fbsd23
   rm    usr-local.tar
   rmdir a

FreeBSD:
   #fsck -n -f /dev/ada0s3e
   ** /dev/ada0s3e (NO WRITE)
   ** Last Mounted on /diske
   ** Phase 1 - Check Blocks and Sizes
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   UNREF DIR  I=5  OWNER=root MODE=40755
   SIZE=512 MTIME=Jun 10 01:51 2017
   RECONNECT? no

   ** Phase 4 - Check Reference Counts
   LINK COUNT DIR I=2  OWNER=root MODE=40755
   SIZE=512 MTIME=Jun 10 02:04 2017  COUNT 3 SHOULD BE 4
   ADJUST? no

   UNREF FILE  I=4  OWNER=root MODE=100644
   SIZE=2605231616 MTIME=Jun 10 01:47 2017
   RECONNECT? no

   CLEAR? no

   LINK COUNT DIR I=5  OWNER=root MODE=40755
   SIZE=512 MTIME=Jun 10 01:51 2017  COUNT 2 SHOULD BE 1
   ADJUST? no

   ** Phase 5 - Check Cyl groups
   FREE BLK COUNT(S) WRONG IN SUPERBLK
   SALVAGE? no

   SUMMARY INFORMATION BAD
   SALVAGE? no

   BLK(S) MISSING IN BIT MAPS
   SALVAGE? no

   4 files, 636219 used, 3554250 free (26 frags, 444278 blocks, 0.0%
   fragmentation)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-10 13:03 ` Richard Narron
@ 2017-06-10 16:07   ` Al Viro
  2017-06-10 18:08     ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2017-06-10 16:07 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Sat, Jun 10, 2017 at 06:03:24AM -0700, Richard Narron wrote:

> 2) After creating a new filesystem on FreeBSD, then on Linux copying a
> larger than 2GB file and creating a directory, the fsck back on FreeBSD
> looks ok.
> 
> But after going back to Linux and removing the large file and removing the
> directory, the fsck on FreeBSD looks not so good:

What happens is ufs_evict_inode() buggering off without syncing the inode
in case of final removal.  Incremental on top of that branch is
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index 34f11cf0900a..da553ffec85b 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -848,6 +848,7 @@ void ufs_evict_inode(struct inode * inode)
 		    (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
 		     S_ISLNK(inode->i_mode)))
 			ufs_truncate_blocks(inode);
+		ufs_update_inode(inode, inode_needs_sync(inode));
 	}
 
 	invalidate_inode_buffers(inode);

Committed and pushed out...

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-10 16:07   ` Al Viro
@ 2017-06-10 18:08     ` Al Viro
  2017-06-10 18:12       ` Linus Torvalds
  2017-06-11 19:47       ` Richard Narron
  0 siblings, 2 replies; 19+ messages in thread
From: Al Viro @ 2017-06-10 18:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Sat, Jun 10, 2017 at 05:07:38PM +0100, Al Viro wrote:
> On Sat, Jun 10, 2017 at 06:03:24AM -0700, Richard Narron wrote:
> 
> > 2) After creating a new filesystem on FreeBSD, then on Linux copying a
> > larger than 2GB file and creating a directory, the fsck back on FreeBSD
> > looks ok.
> > 
> > But after going back to Linux and removing the large file and removing the
> > directory, the fsck on FreeBSD looks not so good:
> 
> What happens is ufs_evict_inode() buggering off without syncing the inode
> in case of final removal.  Incremental on top of that branch is
> diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
> index 34f11cf0900a..da553ffec85b 100644
> --- a/fs/ufs/inode.c
> +++ b/fs/ufs/inode.c
> @@ -848,6 +848,7 @@ void ufs_evict_inode(struct inode * inode)
>  		    (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
>  		     S_ISLNK(inode->i_mode)))
>  			ufs_truncate_blocks(inode);
> +		ufs_update_inode(inode, inode_needs_sync(inode));
>  	}
>  
>  	invalidate_inode_buffers(inode);
> 
> Committed and pushed out...

BTW, should I send an updated pull request in such situation?  It's the
same branch with the one-liner above added on top, head should be
at 67a70017fa0a152657bc7e337e69bb9c9f5549bf, stats
Al Viro (8):
      ufs: restore proper tail allocation
      fix ufs_isblockset()
      ufs: restore maintaining ->i_blocks
      ufs: set correct ->s_maxsize
      ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
      ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
      excessive checks in ufs_write_failed() and ufs_evict_inode()
      ufs: we need to sync inode before freeing it

 fs/stat.c       |  1 +
 fs/ufs/balloc.c | 26 +++++++++++++++++++++++++-
 fs/ufs/inode.c  | 28 ++++++++++++----------------
 fs/ufs/super.c  | 18 ++++++++++++++++++
 fs/ufs/util.h   | 10 +++++++---
 5 files changed, 63 insertions(+), 20 deletions(-)

If you prefer the full git-request-pull output (I've no idea what kind
of scripts you are using and how much PITA do they prevent), please
yell.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-10 18:08     ` Al Viro
@ 2017-06-10 18:12       ` Linus Torvalds
  2017-06-11 19:47       ` Richard Narron
  1 sibling, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2017-06-10 18:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Linux Kernel Mailing List, linux-fsdevel

On Sat, Jun 10, 2017 at 11:08 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, should I send an updated pull request in such situation?

It's better if you do, although in this case it was obvious that you'd
just added a single line and I could see the diffstat still match with
that addition.

But in general it just makes things easier for me when I see that
updated pull request, and it is obvious that "yes, Al clearly meant me
to pull that, despite it not matching the original pull request".

                   Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-10 18:08     ` Al Viro
  2017-06-10 18:12       ` Linus Torvalds
@ 2017-06-11 19:47       ` Richard Narron
  2017-06-11 21:30         ` Al Viro
  2017-06-12  6:14         ` Al Viro
  1 sibling, 2 replies; 19+ messages in thread
From: Richard Narron @ 2017-06-11 19:47 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

The ufs fixes are looking good, much better than what we started with.

After applying the 8 ufs patches:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
/commit/?id=5faab9e0f03c4eef97886b45436015e107f79f5f

Copying files to FreeBSD 11.0 (ufs2) and NetBSD 7.1 (ufs2) looks
pretty good.

I can copy a large >2GB file, create a directory and then run and 
fsck without errors with a FreeBSD disk and a NetBSD disk.

But...

1) Creating an OpenBSD 6.1 (44bsd) disk and then on Linux copying a 
large > 2GB file and creating a directory, there are errors on OpenBSD
with the fsck:

OpenBSD (44bsd):
   #fsck sd0e
   ** /dev/rsd0e
   ** File system is already clean
   ** Last Mounted on /diske
   ** Phase 1 - Check Blocks and Sizes
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   ** Phase 4 - Check Reference Counts
   ** Phase 5 - Check Cyl groups
   FREE BLK COUNT(S) WRONG IN SUPERBLK
   SALVAGE? [Fyn?]
   3 files, 1272410 used, 6983197 free (13 frags, 872898 blocks, 0.0% fragmentation)

   ***** FILE SYSTEM WAS MODIFIED *****

And also after removing the files on Linux, there were errors in
the fsck on OpenBSD:

   #fsck sd0e
   ** /dev/rsd0e
   ** File system is already clean
   ** Last Mounted on /diske
   ** Phase 1 - Check Blocks and Sizes
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   ** Phase 4 - Check Reference Counts
   ** Phase 5 - Check Cyl groups
   FREE BLK COUNT(S) WRONG IN SUPERBLK
   SALVAGE? [Fyn?]
   1 files, 1 used, 8255606 free (14 frags, 1031949 blocks, 0.0% fragmentation)

   ***** FILE SYSTEM WAS MODIFIED *****

2) The available block counts on a newly created FreeBSD 11.0
     ufs2 filesystem (13079640) differ from Linux 4.12-rc4 (13079656):

Freebsd (ufs2):
   #df /diske
   Filesystem   1K-blocks Used    Avail Capacity  Mounted on
   /dev/ada0s3e  14217008    8 13079640     0%    /diske
Linux:
   #df /fbsd23
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda23      14217008     8  13079656   1% /fbsd23

3) The available block counts on a newly created NetBSD 7.1
     ufs2 filesystem (9518316) differ from Linux 4.12-rc4 (9518314):

NetBSD (ufs2):
   #df /nbsdf
   Filesystem     1K-blocks  Used Available Use% Mounted on
   /dev/sda22      10019278     2   9518316   1% /nbsdf
Linux:
   #df /diskf
   Filesystem    1K-blocks       Used      Avail %Cap Mounted on
   /dev/wd0f      10019278          2    9518314   0% /diskf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-11 19:47       ` Richard Narron
@ 2017-06-11 21:30         ` Al Viro
  2017-06-12  6:14         ` Al Viro
  1 sibling, 0 replies; 19+ messages in thread
From: Al Viro @ 2017-06-11 21:30 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Sun, Jun 11, 2017 at 12:47:40PM -0700, Richard Narron wrote:

> 1) Creating an OpenBSD 6.1 (44bsd) disk and then on Linux copying a large >
> 2GB file and creating a directory, there are errors on OpenBSD
> with the fsck:
> 
> OpenBSD (44bsd):
>   #fsck sd0e
>   ** /dev/rsd0e
>   ** File system is already clean
>   ** Last Mounted on /diske
>   ** Phase 1 - Check Blocks and Sizes
>   ** Phase 2 - Check Pathnames
>   ** Phase 3 - Check Connectivity
>   ** Phase 4 - Check Reference Counts
>   ** Phase 5 - Check Cyl groups
>   FREE BLK COUNT(S) WRONG IN SUPERBLK
>   SALVAGE? [Fyn?]
>   3 files, 1272410 used, 6983197 free (13 frags, 872898 blocks, 0.0% fragmentation)
> 
>   ***** FILE SYSTEM WAS MODIFIED *****
> 
> And also after removing the files on Linux, there were errors in
> the fsck on OpenBSD:
> 
>   #fsck sd0e
>   ** /dev/rsd0e
>   ** File system is already clean
>   ** Last Mounted on /diske
>   ** Phase 1 - Check Blocks and Sizes
>   ** Phase 2 - Check Pathnames
>   ** Phase 3 - Check Connectivity
>   ** Phase 4 - Check Reference Counts
>   ** Phase 5 - Check Cyl groups
>   FREE BLK COUNT(S) WRONG IN SUPERBLK
>   SALVAGE? [Fyn?]
>   1 files, 1 used, 8255606 free (14 frags, 1031949 blocks, 0.0% fragmentation)
> 
>   ***** FILE SYSTEM WAS MODIFIED *****

Interesting...  The main difference between UFS1 and UFS2 in that area
is the misbegotten rotation latency optimizations (always ugly as hell,
pointless by mid-90s, dropped completely in UFS2 and dropped in FreeBSD
UFS1 in 2002, at the same time when Kirk had merged UFS2).  I hadn't
looked at OpenBSD situation, but their filesystem and VFS side tends
to be, er, somewhat antique.  So that would be my first guess; I'll
resurrect openbsd-in-KVM setup here and see what's really going on...

> 2) The available block counts on a newly created FreeBSD 11.0
>     ufs2 filesystem (13079640) differ from Linux 4.12-rc4 (13079656):
> 
> Freebsd (ufs2):
>   #df /diske
>   Filesystem   1K-blocks Used    Avail Capacity  Mounted on
>   /dev/ada0s3e  14217008    8 13079640     0%    /diske
> Linux:
>   #df /fbsd23
>   Filesystem     1K-blocks  Used Available Use% Mounted on
>   /dev/sda23      14217008     8  13079656   1% /fbsd23

Linux:
        buf->f_bavail = (buf->f_bfree > (((long)buf->f_blocks / 100) * uspi->s_minfree))
                ? (buf->f_bfree - (((long)buf->f_blocks / 100) * uspi->s_minfree)) : 0;
FreeBSD:
	sbp->f_bavail = freespace(fs, fs->fs_minfree) +
            dbtofsb(fs, fs->fs_pendingblocks);
#define freespace(fs, percentreserved) \
        (blkstofrags((fs), (fs)->fs_cstotal.cs_nbfree) + \
        (fs)->fs_cstotal.cs_nffree - \
        (((off_t)((fs)->fs_dsize)) * (percentreserved) / 100))

Note that all those values are in units of fragments, so the size is
not 14217008, it's 3554252.  Now, you clearly have minfree at 8 and
we get (3554252/100)*8 vs. (3554252*8)/100, i.e. 284336 and 284340
respectively.  There's your 16Kb of difference...

> 3) The available block counts on a newly created NetBSD 7.1
>     ufs2 filesystem (9518316) differ from Linux 4.12-rc4 (9518314):
> 
> NetBSD (ufs2):
>   #df /nbsdf
>   Filesystem     1K-blocks  Used Available Use% Mounted on
>   /dev/sda22      10019278     2   9518316   1% /nbsdf
> Linux:
>   #df /diskf
>   Filesystem    1K-blocks       Used      Avail %Cap Mounted on
>   /dev/wd0f      10019278          2    9518314   0% /diskf

Huh?  The other way round?  I'm fairly sure you've mislabeled those -
/dev/wd* makes a plausible NetBSD device name, but /dev/sda22 doesn't.
And "Use%" vs "%Cap" also points in the same direction.  If the first
one is on Linux and the second on NetBSD, I'm fairly certain that it's
the same story.  And if it is, I'm not sure it's worth worrying about -
the difference will never be greater than minfree * fragment size, and
that's pretty much noise.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-11 19:47       ` Richard Narron
  2017-06-11 21:30         ` Al Viro
@ 2017-06-12  6:14         ` Al Viro
  2017-06-13  0:54           ` Richard Narron
  1 sibling, 1 reply; 19+ messages in thread
From: Al Viro @ 2017-06-12  6:14 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Sun, Jun 11, 2017 at 12:47:40PM -0700, Richard Narron wrote:
> 1) Creating an OpenBSD 6.1 (44bsd) disk and then on Linux copying a large >
> 2GB file and creating a directory, there are errors on OpenBSD
> with the fsck:
> 
> OpenBSD (44bsd):
>   #fsck sd0e
>   ** /dev/rsd0e
>   ** File system is already clean
>   ** Last Mounted on /diske
>   ** Phase 1 - Check Blocks and Sizes
>   ** Phase 2 - Check Pathnames
>   ** Phase 3 - Check Connectivity
>   ** Phase 4 - Check Reference Counts
>   ** Phase 5 - Check Cyl groups
>   FREE BLK COUNT(S) WRONG IN SUPERBLK
>   SALVAGE? [Fyn?]
>   3 files, 1272410 used, 6983197 free (13 frags, 872898 blocks, 0.0% fragmentation)
> 
>   ***** FILE SYSTEM WAS MODIFIED *****

Can't reproduce...
# fsck -f /dev/rwd1c
** /dev/rwd1c
** File system is already clean
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
3 files, 1573258 used, 489445 free (13 frags, 61179 blocks, 0.0% fragmentation)
# uname -mrsv
OpenBSD 6.1 GENERIC#19 amd64
# file -s /dev/rwd1c 
/dev/rwd1c: Unix Fast File system [v1] (little-endian), last mounted on , last written at Mon Jun 12 06:03:57 2017, clean flag 1, number of blocks 2097152, number of data blocks 2062703, number of cylinder groups 21, block size 16384, fragment size 2048, minimum percentage of free blocks 5, rotational delay 0ms, disk rotational speed 60rps, TIME optimization

That's after
# mkdir /mnt/a; dd if=/dev/zero of=/mnt/a/foo bs=1M count=3072
on the Linux side, with
root@kvm1:~# uname -msrv
Linux 4.12.0-rc1+ #112 SMP Fri Jun 9 17:16:00 EDT 2017 x86_64
there.  -o loop mount on Linux (image living on 9p), direct -hdb ../9p/ufs on OpenBSD side
of things (both in KVM on the same host)...

Reproducer would be nice - ideally on the level of "device is this large, newfs was with
such and such options, series of operations done on the Linux side after mounting that
sucker".

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-12  6:14         ` Al Viro
@ 2017-06-13  0:54           ` Richard Narron
  2017-06-13  1:43             ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Narron @ 2017-06-13  0:54 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Mon, 12 Jun 2017, Al Viro wrote:

> On Sun, Jun 11, 2017 at 12:47:40PM -0700, Richard Narron wrote:
>> 1) Creating an OpenBSD 6.1 (44bsd) disk and then on Linux copying a large >
>> 2GB file and creating a directory, there are errors on OpenBSD
>> with the fsck:
>>
>
> Can't reproduce...
> # fsck -f /dev/rwd1c
> ** /dev/rwd1c
> ** File system is already clean
> ** Last Mounted on
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 3 files, 1573258 used, 489445 free (13 frags, 61179 blocks, 0.0% fragmentation)
> # uname -mrsv
> OpenBSD 6.1 GENERIC#19 amd64
> # file -s /dev/rwd1c
> /dev/rwd1c: Unix Fast File system [v1] (little-endian), last mounted on , last written at Mon Jun 12 06:03:57 2017, clean flag 1, number of blocks 2097152, number of data blocks 2062703, number of cylinder groups 21, block size 16384, fragment size 2048, minimum percentage of free blocks 5, rotational delay 0ms, disk rotational speed 60rps, TIME optimization
>
> That's after
> # mkdir /mnt/a; dd if=/dev/zero of=/mnt/a/foo bs=1M count=3072
> on the Linux side, with
> root@kvm1:~# uname -msrv
> Linux 4.12.0-rc1+ #112 SMP Fri Jun 9 17:16:00 EDT 2017 x86_64
> there.  -o loop mount on Linux (image living on 9p), direct -hdb ../9p/ufs on OpenBSD side
> of things (both in KVM on the same host)...

Earlier today I could not reproduce the OpenBSD 6.1 ufs1 fsck error after 
Linux 4.12-rc5 copy of my >2GB file using "cp".

But later today I get the error when I copy using your "dd" method...

In any case I always get a ufs1 fsck error after the Linux rm and rmdir.

Here is the error after rm of the files created by the "dd" method:

#/usr/bin/uname -mrsv
OpenBSD 6.1 GENERIC.MP#6 amd64
#/sbin/fsck -n -f /dev/sd0e
** /dev/rsd0e (NO WRITE)
** File system is already clean
** Last Mounted on /diske
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? no

1 files, 1 used, 6682349 free (13 frags, 835292 blocks, 0.0% 
fragmentation)
#/sbin/fsck -f /dev/sd0e
** /dev/rsd0e
** File system is already clean
** Last Mounted on /diske
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [Fyn?]
1 files, 1 used, 8255606 free (14 frags, 1031949 blocks, 0.0% 
fragmentation)

***** FILE SYSTEM WAS MODIFIED *****
#/sbin/mount /dev/sd0e
#/bin/df /diske
Filesystem  512-blocks      Used     Avail Capacity  Mounted on
/dev/sd0e     33022428         4  31371304     0%    /diske
#/sbin/mount /dev/sd0e
#/bin/df /diske
Filesystem  512-blocks      Used     Avail Capacity  Mounted on
/dev/sd0e     33022428         4  31371304     0%    /diske

OpenBSD 6.1 is not very stable for me.

I will test FreeBSD and NetBSD ufs1 to see if they have a problem...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-13  0:54           ` Richard Narron
@ 2017-06-13  1:43             ` Al Viro
  2017-06-13 21:56               ` Richard Narron
  0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2017-06-13  1:43 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Mon, Jun 12, 2017 at 05:54:06PM -0700, Richard Narron wrote:

> Earlier today I could not reproduce the OpenBSD 6.1 ufs1 fsck error after
> Linux 4.12-rc5 copy of my >2GB file using "cp".
> 
> But later today I get the error when I copy using your "dd" method...
> 
> In any case I always get a ufs1 fsck error after the Linux rm and rmdir.

Interesting...  Could you put together an image (starting with zeroing the
device before newfs, and ideally with dd from /dev/zero to create files)
that would
	a) pass fsck on OpenBSD
	b) after rm on Linux fail the same
then convert it to qcow2 and publish?  Or just compress it - all free and
data blocks would contain only zeroes, so any kind of compression (gzip,
bzip2, whatever) would reduce the size to something more managable...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-13  1:43             ` Al Viro
@ 2017-06-13 21:56               ` Richard Narron
  2017-06-14  7:11                 ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Narron @ 2017-06-13 21:56 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Tue, 13 Jun 2017, Al Viro wrote:

> On Mon, Jun 12, 2017 at 05:54:06PM -0700, Richard Narron wrote:
>
>> Earlier today I could not reproduce the OpenBSD 6.1 ufs1 fsck error after
>> Linux 4.12-rc5 copy of my >2GB file using "cp".
>>
>> But later today I get the error when I copy using your "dd" method...
>>
>> In any case I always get a ufs1 fsck error after the Linux rm and rmdir.
>
> Interesting...  Could you put together an image (starting with zeroing the
> device before newfs, and ideally with dd from /dev/zero to create files)
> that would
> 	a) pass fsck on OpenBSD
> 	b) after rm on Linux fail the same
> then convert it to qcow2 and publish?  Or just compress it - all free and
> data blocks would contain only zeroes, so any kind of compression (gzip,
> bzip2, whatever) would reduce the size to something more managable...

I created a gzip and sent you an email with the link to a UFS1 OpenBSD 
filesytem image.

I finished simple testing of UFS1 with FreeBSD and NetBSD and found no 
problems except for the differences between "available" blocks in df 
commands.

And testing UFS2 was fine with all 3 systems, FreeBSD, OpenBSD and 
NetBSD.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-13 21:56               ` Richard Narron
@ 2017-06-14  7:11                 ` Al Viro
  2017-06-14 20:33                   ` Richard Narron
  2017-06-15  8:00                   ` Al Viro
  0 siblings, 2 replies; 19+ messages in thread
From: Al Viro @ 2017-06-14  7:11 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Tue, Jun 13, 2017 at 02:56:23PM -0700, Richard Narron wrote:
> On Tue, 13 Jun 2017, Al Viro wrote:
> 
> > On Mon, Jun 12, 2017 at 05:54:06PM -0700, Richard Narron wrote:
> > 
> > > Earlier today I could not reproduce the OpenBSD 6.1 ufs1 fsck error after
> > > Linux 4.12-rc5 copy of my >2GB file using "cp".
> > > 
> > > But later today I get the error when I copy using your "dd" method...
> > > 
> > > In any case I always get a ufs1 fsck error after the Linux rm and rmdir.
> > 
> > Interesting...  Could you put together an image (starting with zeroing the
> > device before newfs, and ideally with dd from /dev/zero to create files)
> > that would
> > 	a) pass fsck on OpenBSD
> > 	b) after rm on Linux fail the same
> > then convert it to qcow2 and publish?  Or just compress it - all free and
> > data blocks would contain only zeroes, so any kind of compression (gzip,
> > bzip2, whatever) would reduce the size to something more managable...
> 
> I created a gzip and sent you an email with the link to a UFS1 OpenBSD
> filesytem image.
> 
> I finished simple testing of UFS1 with FreeBSD and NetBSD and found no
> problems except for the differences between "available" blocks in df
> commands.

AFAICS, what happens is a combination of OpenBSD and FreeBSD acting differently
on when reading UFS1 and Linux "[PATCH] ufs: make fsck -f happy" getting the
logics wrong.  First of all, on UFS1 writing a superblock always duplicates the
values into old locations, UFS_FLAGS_UPDATED or not.  Linux implementation
writes either only to new or only to old locations.  What's more, on the read
side the rules are different between FreeBSD and OpenBSD.  The former does
	if we hadn't set fs_un.fs_u2.fs_maxbsize to block size
		set it so
		read from old locations (and copy them to new ones)
The latter *always* reads from old locations.  It also sets FS_FLAGS_UPDATED
at the same spot (FreeBSD does it a bit upstream) and has an ifdefed out "if
flag is already set, bugger off" logics.

Hell knows...  Using FS_FLAGS_UPDATED as a predicate is wrong, due to OpenBSD
fsck clearing it when it modifies a superblock for any reason.  FWIW, using
fs_maxbsize as an indicator looks like a good idea.  The thing is, it lives
in place where the first two elements of ->opostbl used to be.  In filesystems
with ->s_postblformat equal to UFS_42POSTBLFMT.  Which excludes everything
created by 4.4 newfs; in fact, 4.3-Reno is already too recent for that.
All of those will have zeroes in the entire ->opostbl area.

AFAICS, a conservative approach would be
	* reject UFS_42POSTBLFMT for 44bsd ones - it's almost certainly
*not* one.
	* check if fs_maxbsize is equal to frag size; treat that as
"counts are read from new location and stored both to old and new".
44bsd fs_maxbsize != block size => not converted, just use old locations
for everything.  UFS2 => use new locations for everything, don't bother
with old ones.  IOW, something like this (WARNING: completely untested,
might screw your filesystem) might do.

NOTE: all I have is your image *after* it had counters buggered; I don't
know the exact sequence of operations that fucked it in your case.  One
way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD.  This patch
apparently fixes that, but your reproducer might be something different.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index d9aa2627c9df..eca838a8b43e 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -480,7 +480,7 @@ static void ufs_setup_cstotal(struct super_block *sb)
 	usb3 = ubh_get_usb_third(uspi);
 
 	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
-	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
+	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) ||
 	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
 		/*we have statistic in different place, then usual*/
 		uspi->cs_total.cs_ndir = fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_ndir);
@@ -596,9 +596,7 @@ static void ufs_put_cstotal(struct super_block *sb)
 	usb2 = ubh_get_usb_second(uspi);
 	usb3 = ubh_get_usb_third(uspi);
 
-	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
-	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
-	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
+	if (mtype == UFS_MOUNT_UFSTYPE_UFS2) {
 		/*we have statistic in different place, then usual*/
 		usb2->fs_un.fs_u2.cs_ndir =
 			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
@@ -608,16 +606,26 @@ static void ufs_put_cstotal(struct super_block *sb)
 			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
 		usb3->fs_un1.fs_u2.cs_nffree =
 			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
-	} else {
-		usb1->fs_cstotal.cs_ndir =
-			cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
-		usb1->fs_cstotal.cs_nbfree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
-		usb1->fs_cstotal.cs_nifree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
-		usb1->fs_cstotal.cs_nffree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
+		goto out;
 	}
+
+	if (mtype == UFS_MOUNT_UFSTYPE_44BSD &&
+	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) {
+		/* store stats in both old and new places */
+		usb2->fs_un.fs_u2.cs_ndir =
+			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
+		usb2->fs_un.fs_u2.cs_nbfree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nbfree);
+		usb3->fs_un1.fs_u2.cs_nifree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
+		usb3->fs_un1.fs_u2.cs_nffree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
+	}
+	usb1->fs_cstotal.cs_ndir = cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
+	usb1->fs_cstotal.cs_nbfree = cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
+	usb1->fs_cstotal.cs_nifree = cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
+	usb1->fs_cstotal.cs_nffree = cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
+out:
 	ubh_mark_buffer_dirty(USPI_UBH(uspi));
 	ufs_print_super_stuff(sb, usb1, usb2, usb3);
 	UFSD("EXIT\n");
@@ -997,6 +1005,13 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 		flags |=  UFS_ST_SUN;
 	}
 
+	if ((flags & UFS_ST_MASK) == UFS_ST_44BSD &&
+	    uspi->s_postblformat == UFS_42POSTBLFMT) {
+		if (!silent)
+			pr_err("this is not a 44bsd filesystem");
+		goto failed;
+	}
+
 	/*
 	 * Check ufs magic number
 	 */

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-14  7:11                 ` Al Viro
@ 2017-06-14 20:33                   ` Richard Narron
  2017-06-15  8:00                   ` Al Viro
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Narron @ 2017-06-14 20:33 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, 14 Jun 2017, Al Viro wrote:
...
> AFAICS, a conservative approach would be
> 	* reject UFS_42POSTBLFMT for 44bsd ones - it's almost certainly
> *not* one.
> 	* check if fs_maxbsize is equal to frag size; treat that as
> "counts are read from new location and stored both to old and new".
> 44bsd fs_maxbsize != block size => not converted, just use old locations
> for everything.  UFS2 => use new locations for everything, don't bother
> with old ones.  IOW, something like this (WARNING: completely untested,
> might screw your filesystem) might do.
>
> NOTE: all I have is your image *after* it had counters buggered; I don't
> know the exact sequence of operations that fucked it in your case.  One
> way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
> on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD.  This patch
> apparently fixes that, but your reproducer might be something different.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/ufs/super.c b/fs/ufs/super.c
> index d9aa2627c9df..eca838a8b43e 100644
> --- a/fs/ufs/super.c
> +++ b/fs/ufs/super.c
> @@ -480,7 +480,7 @@ static void ufs_setup_cstotal(struct super_block *sb)
> 	usb3 = ubh_get_usb_third(uspi);
>
> 	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
> -	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
> +	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) ||
> 	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
> 		/*we have statistic in different place, then usual*/
> 		uspi->cs_total.cs_ndir = fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_ndir);
> @@ -596,9 +596,7 @@ static void ufs_put_cstotal(struct super_block *sb)
> 	usb2 = ubh_get_usb_second(uspi);
> 	usb3 = ubh_get_usb_third(uspi);
>
> -	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
> -	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
> -	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
> +	if (mtype == UFS_MOUNT_UFSTYPE_UFS2) {
> 		/*we have statistic in different place, then usual*/
> 		usb2->fs_un.fs_u2.cs_ndir =
> 			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
> @@ -608,16 +606,26 @@ static void ufs_put_cstotal(struct super_block *sb)
> 			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
> 		usb3->fs_un1.fs_u2.cs_nffree =
> 			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
> -	} else {
> -		usb1->fs_cstotal.cs_ndir =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
> -		usb1->fs_cstotal.cs_nbfree =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
> -		usb1->fs_cstotal.cs_nifree =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
> -		usb1->fs_cstotal.cs_nffree =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
> +		goto out;
> 	}
> +
> +	if (mtype == UFS_MOUNT_UFSTYPE_44BSD &&
> +	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) {
> +		/* store stats in both old and new places */
> +		usb2->fs_un.fs_u2.cs_ndir =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
> +		usb2->fs_un.fs_u2.cs_nbfree =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_nbfree);
> +		usb3->fs_un1.fs_u2.cs_nifree =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
> +		usb3->fs_un1.fs_u2.cs_nffree =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
> +	}
> +	usb1->fs_cstotal.cs_ndir = cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
> +	usb1->fs_cstotal.cs_nbfree = cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
> +	usb1->fs_cstotal.cs_nifree = cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
> +	usb1->fs_cstotal.cs_nffree = cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
> +out:
> 	ubh_mark_buffer_dirty(USPI_UBH(uspi));
> 	ufs_print_super_stuff(sb, usb1, usb2, usb3);
> 	UFSD("EXIT\n");
> @@ -997,6 +1005,13 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
> 		flags |=  UFS_ST_SUN;
> 	}
>
> +	if ((flags & UFS_ST_MASK) == UFS_ST_44BSD &&
> +	    uspi->s_postblformat == UFS_42POSTBLFMT) {
> +		if (!silent)
> +			pr_err("this is not a 44bsd filesystem");
> +		goto failed;
> +	}
> +
> 	/*
> 	 * Check ufs magic number
> 	 */
>

Al this patch looks good to me (so far). I tested all 6 combinations of 
ufs1 and ufs2 in FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1

For each combination, I do 5 steps:

   1) BSD: Make a ufs filesystem
      dd if=/dev/zero to the BSD subpartition
      make (newfs) a ufs (1 or 2) filesystem on the BSD subpartiton

   2) Linux: Create a subdirectory and make a large file
      mkdir a
      dd if=/dev/zero bs=1M count=3072

   3) BSD: Check a ufs filesystem
      fsck -f

   4) Linux: Remove the large file and the subdirectory
      rm
      rmdir

   5) BSD; check a ufs filesystem
      fsck -f

Tested-By: Richard Narron <comet.berkeley@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-14  7:11                 ` Al Viro
  2017-06-14 20:33                   ` Richard Narron
@ 2017-06-15  8:00                   ` Al Viro
  2017-06-16 14:29                     ` Richard Narron
  1 sibling, 1 reply; 19+ messages in thread
From: Al Viro @ 2017-06-15  8:00 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, Jun 14, 2017 at 08:11:33AM +0100, Al Viro wrote:
> NOTE: all I have is your image *after* it had counters buggered; I don't
> know the exact sequence of operations that fucked it in your case.  One
> way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
> on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD.  This patch
> apparently fixes that, but your reproducer might be something different.

FWIW, it seems to work here.  Said that, *BSD fsck_ffs is not worth much -
play a bit with redundancy in UFS superblock (starting with fragment
and block sizes, their ratio, logarithms, bitmasks, etc.) and you can
screw at least 10.3 into the ground when mounting an image that passes
their fsck.  Sure, anyone who mounts untrusted images is a cretin who
deserves everything they get, fsck or no fsck, but... no complaints from
fsck is not a reliable indicator of image being in good condition and
that's PITA for testing.

Another pile of fun: "reserve ->s_minfree percents of total" logics had
been broken.
	* using hardwired 5% is wrong - especially for ufs2, where it's
not even the default
	* ufs_freespace() returns u64; testing for <= 0 is not doing the
right thing
	* no capability checks before we need them, TYVM...
	* ufs2 needs 64bit uspi->s_dsize (and ->s_size, while we are at it).
64bit variants were even calculated - and never used.
	* while we are at it, doing "multiply the total data frags by
s_minfree and divide by 100" every time we allocate a block is bloody
dumb - that should be calculated once.

We really need to get the sodding tail unpacking moved up from the place
where it's buried - turns out that my doubts about that code managing to
avoid deadlocks had been correct.  Long-term we need to move that thing
to iomap-based ->write_iter() and do unpacking there and in truncate().
For now I've slapped together something that is easier to backport -
avoiding ->truncate_mutex when possible and not holding ->s_lock over
ufs_change_blocknr().

Another bug in the same area: ufs_get_locked_page() doesn't guarantee
that buffer_heads are attached (race with vmscan trying to evict the
page in question can end with buffer_heads freed and page left alive
and uptodate).  Callers do expect buffer_heads to be there, so we either
need to do create_empty_buffers() in those callers or in ufs_get_locked_page();
I went for the latter for now.

Off-by-one in ufs_truncate_blocks(): the logics when deciding whether
we need to do anything with direct blocks is broken when new size is
within the last direct block.  It's better to find the path to the
last byte _not_ to be removed and use that instead of the path to the
beginning of the first block to be freed.

I've pushed fixes for those into vfs.git#ufs-fixes; they do need more
testing before I send a pull request, though.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-15  8:00                   ` Al Viro
@ 2017-06-16 14:29                     ` Richard Narron
  2017-06-17  2:15                       ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Narron @ 2017-06-16 14:29 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, 15 Jun 2017, Al Viro wrote:

> On Wed, Jun 14, 2017 at 08:11:33AM +0100, Al Viro wrote:
> FWIW, it seems to work here.  Said that, *BSD fsck_ffs is not worth much -
> play a bit with redundancy in UFS superblock (starting with fragment
> and block sizes, their ratio, logarithms, bitmasks, etc.) and you can
> screw at least 10.3 into the ground when mounting an image that passes
> their fsck.  Sure, anyone who mounts untrusted images is a cretin who
> deserves everything they get, fsck or no fsck, but... no complaints from
> fsck is not a reliable indicator of image being in good condition and
> that's PITA for testing.
>
> Another pile of fun: "reserve ->s_minfree percents of total" logics had
> been broken.
> 	* using hardwired 5% is wrong - especially for ufs2, where it's
> not even the default
> 	* ufs_freespace() returns u64; testing for <= 0 is not doing the
> right thing
> 	* no capability checks before we need them, TYVM...
> 	* ufs2 needs 64bit uspi->s_dsize (and ->s_size, while we are at it).
> 64bit variants were even calculated - and never used.
> 	* while we are at it, doing "multiply the total data frags by
> s_minfree and divide by 100" every time we allocate a block is bloody
> dumb - that should be calculated once.
>
> We really need to get the sodding tail unpacking moved up from the place
> where it's buried - turns out that my doubts about that code managing to
> avoid deadlocks had been correct.  Long-term we need to move that thing
> to iomap-based ->write_iter() and do unpacking there and in truncate().
> For now I've slapped together something that is easier to backport -
> avoiding ->truncate_mutex when possible and not holding ->s_lock over
> ufs_change_blocknr().
>
> Another bug in the same area: ufs_get_locked_page() doesn't guarantee
> that buffer_heads are attached (race with vmscan trying to evict the
> page in question can end with buffer_heads freed and page left alive
> and uptodate).  Callers do expect buffer_heads to be there, so we either
> need to do create_empty_buffers() in those callers or in ufs_get_locked_page();
> I went for the latter for now.
>
> Off-by-one in ufs_truncate_blocks(): the logics when deciding whether
> we need to do anything with direct blocks is broken when new size is
> within the last direct block.  It's better to find the path to the
> last byte _not_ to be removed and use that instead of the path to the
> beginning of the first block to be freed.
>
> I've pushed fixes for those into vfs.git#ufs-fixes; they do need more
> testing before I send a pull request, though.

The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
They seem to work fine with the simple testing that I do.

I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2 
filesystems, 44bsd (ufs1) and ufs2.
I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm 
large file, rmdir and BSD fsck in any of the 6 combinations.

Doing a "df" on BSD and Linux now match on the counts including the 
"Available" counts.

It might be worth testing with ufs filesystems using softdep and/or 
journaling.  Should the Linux mount command reject such filesystems?

Now that ufs write access is working more or less, we're dangerous.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-16 14:29                     ` Richard Narron
@ 2017-06-17  2:15                       ` Al Viro
  2017-06-18  1:09                         ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2017-06-17  2:15 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Fri, Jun 16, 2017 at 07:29:00AM -0700, Richard Narron wrote:

> The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
> They seem to work fine with the simple testing that I do.
> 
> I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2
> filesystems, 44bsd (ufs1) and ufs2.
> I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm
> large file, rmdir and BSD fsck in any of the 6 combinations.

FWIW, with xfstests I see the following:
	* _require_metadata_journaling needs to be told not to bother with
our ufs
	* generic/{409,410,411} lack $MOUNT_OPTIONS in several invocations
of mount -t $FSTYP; for ufs it's obviosuly a problem.  Trivial bug in tests,
fixing it makes them pass.
	* generic/258 (timestamp wraparound) fails; fs is left intact
	* generic/426 (fhandle stuff) fails and buggers the filesystem
Everything else passes with no fs corruption that could be detected by
fsck.ufs.  Note that crash simulation tests are *not* run - no point
even trying.  I tried several of those with -o sync; they seem to pass,
but there's free blocks, etc. stats summaries being slightly incorrect.
Inevitable, AFAICS.

> Doing a "df" on BSD and Linux now match on the counts including the
> "Available" counts.
> 
> It might be worth testing with ufs filesystems using softdep and/or
> journaling.  Should the Linux mount command reject such filesystems?

Depends...  Do you mean rejecting mount of fs with pending log replay?

> Now that ufs write access is working more or less, we're dangerous.

*snort*
It's marked experimental for a very good reason.  The stuff in #ufs-fixes
deals with fairly obvious bugs and with those the thing is reasonably
well-behaved on xfstests, but it needs a lot more testing (and probably
somebody doing journalling side of things as well) before it can be
considered fully safe for sharing with *BSD.

And I'm still not entirely convinced that mixing tail-unpacking, truncate
and mmap works correctly.  FWIW, switch to iomap-based approach would
probably deal with all of that nicely, but that's well beyond "obvious
bugfixes"; definitely not this cycle fodder.

I'm not even sure if #ufs-fixes is OK for this stage in the cycle - it
is local to fs/ufs, doesn't affect anything else and fixes fairly
obvious bugs, but we are nearly at -rc6.  Pull request for that would
be as below, but it's really up to Linus (and Evgeniy, if he is not
completely MIA) whether to pull it now or wait until the next window.
Linus, what do you think of that?  I'm honestly not sure...

As for my immediate plans, I'll look into the two failing tests,
but any further active work on ufs will have to wait for the next
cycle.  It had been a fun couple of weeks, but I have more than
enough other stuff to deal with.  And I would still very much prefer
for somebody to adopt that puppy.

----
Fix assorted ufs bugs; a couple of deadlocks, fs corruption in truncate(),
oopsen on tail unpacking and truncate when racing with vmscan, mild fs
corruption (free blocks stats summary buggered, *BSD fsck would
complain and fix), several instances of broken logics around reserved
blocks (starting with "check almost never triggers when it should" and
then there are issues with sufficiently large UFS2).

The following changes since commit 67a70017fa0a152657bc7e337e69bb9c9f5549bf:

  ufs: we need to sync inode before freeing it (2017-06-10 12:02:28 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git ufs-fixes

for you to fetch changes up to a8fad984833832d5ca11a9ed64ddc55646da30e3:

  ufs_truncate_blocks(): fix the case when size is in the last direct block (2017-06-15 03:57:46 -0400)

----------------------------------------------------------------
Al Viro (8):
      ufs: fix logics in "ufs: make fsck -f happy"
      ufs: make ufs_freespace() return signed
      ufs: fix reserved blocks check
      ufs: fix s_size/s_dsize users
      ufs_get_locked_page(): make sure we have buffer_heads
      ufs: avoid grabbing ->truncate_mutex if possible
      ufs: more deadlock prevention on tail unpacking
      ufs_truncate_blocks(): fix the case when size is in the last direct block

 fs/ufs/balloc.c | 22 ++++++++++++--------
 fs/ufs/inode.c  | 47 ++++++++++++++++++++++++++++--------------
 fs/ufs/super.c  | 64 +++++++++++++++++++++++++++++++++++----------------------
 fs/ufs/ufs_fs.h |  7 +++----
 fs/ufs/util.c   | 17 ++++++++-------
 fs/ufs/util.h   |  9 ++------
 6 files changed, 98 insertions(+), 68 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-17  2:15                       ` Al Viro
@ 2017-06-18  1:09                         ` Al Viro
  2017-06-18 20:45                           ` Richard Narron
  2017-06-20  5:17                           ` [git pull] " Al Viro
  0 siblings, 2 replies; 19+ messages in thread
From: Al Viro @ 2017-06-18  1:09 UTC (permalink / raw)
  To: Richard Narron; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Sat, Jun 17, 2017 at 03:15:48AM +0100, Al Viro wrote:
> On Fri, Jun 16, 2017 at 07:29:00AM -0700, Richard Narron wrote:
> 
> > The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
> > They seem to work fine with the simple testing that I do.
> > 
> > I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2
> > filesystems, 44bsd (ufs1) and ufs2.
> > I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm
> > large file, rmdir and BSD fsck in any of the 6 combinations.
> 
> FWIW, with xfstests I see the following:
> 	* _require_metadata_journaling needs to be told not to bother with
> our ufs
> 	* generic/{409,410,411} lack $MOUNT_OPTIONS in several invocations
> of mount -t $FSTYP; for ufs it's obviosuly a problem.  Trivial bug in tests,
> fixing it makes them pass.
> 	* generic/258 (timestamp wraparound) fails; fs is left intact

Trivially fixed (cast to (signed) in ufs1_read_inode(), similar to what
other filesystems with 32bit timestamps are doing); ufs2 has no problem
at all)

> 	* generic/426 (fhandle stuff) fails and buggers the filesystem
> Everything else passes with no fs corruption that could be detected by
> fsck.ufs.

Also trivially fixed - it's a self-inflicted wound.  Just have zero nlink in
ufs{1,2}_read_inode() fail with -ESTALE instead of triggering ufs_error().

> As for my immediate plans, I'll look into the two failing tests,
> but any further active work on ufs will have to wait for the next
> cycle.  It had been a fun couple of weeks, but I have more than
> enough other stuff to deal with.  And I would still very much prefer
> for somebody to adopt that puppy.

Another piece of fun spotted: the logics for switching between two allocation
policies when relocating a packed tail that can't be expanded in place had
been b0rken since typo in 2.4.14.7 - switch back from OPTTIME to OPTSPACE
had been screwed by this:
-               usb1->fs_optim = SWAB32(UFS_OPTSPACE);
+               usb1->fs_optim = cpu_to_fs32(sb, UFS_OPTTIME);

And fragmentation levels for switching back and force really ought to be
calculated at mount time.  Another (minor) issue is mentioned in this
commit message from Kirck McKusick back in 1995:
        The threshold for switching from time-space and space-time is too small
        when minfree is 5%...so make it stay at space in this case.
Not that minfree at 5% had been frequently seen - default has never been that
low (back in 4.2BSD it was 10%, these days it's 8%)

Resulting kernel passes xfstests clean and now I'm definitely done with UFS for
this cycle.  Linus, in case you want to pull that sucker, pull request would
be as below:

The following changes since commit a8fad984833832d5ca11a9ed64ddc55646da30e3:

  ufs_truncate_blocks(): fix the case when size is in the last direct block (2017-06-15 03:57:46 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git ufs-fixes

for you to fetch changes up to 77e9ce327d9b607cd6e57c0f4524a654dc59c4b1:

  ufs: fix the logics for tail relocation (2017-06-17 17:22:42 -0400)

----------------------------------------------------------------
Al Viro (3):
      fix signedness of timestamps on ufs1
      ufs_iget(): fail with -ESTALE on deleted inode
      ufs: fix the logics for tail relocation

 fs/ufs/balloc.c | 22 ++++++----------------
 fs/ufs/inode.c  | 27 +++++++++++----------------
 fs/ufs/super.c  |  9 +++++++++
 fs/ufs/ufs_fs.h |  2 ++
 4 files changed, 28 insertions(+), 32 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [git pull] first batch of ufs fixes
  2017-06-18  1:09                         ` Al Viro
@ 2017-06-18 20:45                           ` Richard Narron
  2017-06-20  5:17                           ` [git pull] " Al Viro
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Narron @ 2017-06-18 20:45 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Sun, 18 Jun 2017, Al Viro wrote:

> On Sat, Jun 17, 2017 at 03:15:48AM +0100, Al Viro wrote:
>> On Fri, Jun 16, 2017 at 07:29:00AM -0700, Richard Narron wrote:
>>
>>> The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
>>> They seem to work fine with the simple testing that I do.
>>>
>>> I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2
>>> filesystems, 44bsd (ufs1) and ufs2.
>>> I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm
>>> large file, rmdir and BSD fsck in any of the 6 combinations.
>>
>> FWIW, with xfstests I see the following:
>> 	* _require_metadata_journaling needs to be told not to bother with
>> our ufs
>> 	* generic/{409,410,411} lack $MOUNT_OPTIONS in several invocations
>> of mount -t $FSTYP; for ufs it's obviosuly a problem.  Trivial bug in tests,
>> fixing it makes them pass.
>> 	* generic/258 (timestamp wraparound) fails; fs is left intact
>
> Trivially fixed (cast to (signed) in ufs1_read_inode(), similar to what
> other filesystems with 32bit timestamps are doing); ufs2 has no problem
> at all)
>
>> 	* generic/426 (fhandle stuff) fails and buggers the filesystem
>> Everything else passes with no fs corruption that could be detected by
>> fsck.ufs.
>
> Also trivially fixed - it's a self-inflicted wound.  Just have zero nlink in
> ufs{1,2}_read_inode() fail with -ESTALE instead of triggering ufs_error().
>
>> As for my immediate plans, I'll look into the two failing tests,
>> but any further active work on ufs will have to wait for the next
>> cycle.  It had been a fun couple of weeks, but I have more than
>> enough other stuff to deal with.  And I would still very much prefer
>> for somebody to adopt that puppy.
>
> Another piece of fun spotted: the logics for switching between two allocation
> policies when relocating a packed tail that can't be expanded in place had
> been b0rken since typo in 2.4.14.7 - switch back from OPTTIME to OPTSPACE
> had been screwed by this:
> -               usb1->fs_optim = SWAB32(UFS_OPTSPACE);
> +               usb1->fs_optim = cpu_to_fs32(sb, UFS_OPTTIME);
>
> And fragmentation levels for switching back and force really ought to be
> calculated at mount time.  Another (minor) issue is mentioned in this
> commit message from Kirck McKusick back in 1995:
>        The threshold for switching from time-space and space-time is too small
>        when minfree is 5%...so make it stay at space in this case.
> Not that minfree at 5% had been frequently seen - default has never been that
> low (back in 4.2BSD it was 10%, these days it's 8%)
>
> Resulting kernel passes xfstests clean and now I'm definitely done with UFS for
> this cycle.  Linus, in case you want to pull that sucker, pull request would
> be as below:
>
> The following changes since commit a8fad984833832d5ca11a9ed64ddc55646da30e3:
>
>  ufs_truncate_blocks(): fix the case when size is in the last direct block (2017-06-15 03:57:46 -0400)
>
> are available in the git repository at:
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git ufs-fixes
>
> for you to fetch changes up to 77e9ce327d9b607cd6e57c0f4524a654dc59c4b1:
>
>  ufs: fix the logics for tail relocation (2017-06-17 17:22:42 -0400)
>
> ----------------------------------------------------------------
> Al Viro (3):
>      fix signedness of timestamps on ufs1
>      ufs_iget(): fail with -ESTALE on deleted inode
>      ufs: fix the logics for tail relocation
>
> fs/ufs/balloc.c | 22 ++++++----------------
> fs/ufs/inode.c  | 27 +++++++++++----------------
> fs/ufs/super.c  |  9 +++++++++
> fs/ufs/ufs_fs.h |  2 ++
> 4 files changed, 28 insertions(+), 32 deletions(-)
>

I just tested these 3 patches along with the earlier 8 patches against 
Linux 4.12-rc5 and they look fine. All 6 test cases look good.

The ufs code is much better now than it was before all these patches.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [git pull] ufs fixes
  2017-06-18  1:09                         ` Al Viro
  2017-06-18 20:45                           ` Richard Narron
@ 2017-06-20  5:17                           ` Al Viro
  1 sibling, 0 replies; 19+ messages in thread
From: Al Viro @ 2017-06-20  5:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

	More UFS fixes, unfortunately including build regression fix for the
64bit s_dsize commit.  Fixed in this pile:
	* trivial bug in signedness of 32bit timestamps on ufs1
	* ESTALE instead of ufs_error() when doing open-by-fhandle on
something deleted
	* build regression on 32bit in ufs_new_fragments() - calculating
that many percents of u64 pulls libgcc stuff on some of those.  Mea culpa.
	* fix hysteresis loop broken by typo in 2.4.14.7 (right next to
the location of previous bug).
	* fix the insane limits of said hysteresis loop on filesystems with
very low percentage of reserved blocks.  If it's 5% or less, just use the
OPTSPACE policy.
	* calculate those limits once and mount time.

I can separate and send just the build regression part if you wish.  I would
rather avoid that, though - mul_64_32_div() called again and again, to
calculate the same value... ugh.

That tree does pass xfstests clean (both ufs1 and ufs2) and it _does_ survive
cross-builds.  Again, my apologies for missing that, especially since I have
noticed a related percentage-of-64bit issue in earlier patches (when dealing
with amount of reserved blocks).  Self-LART applied...

The following changes since commit a8fad984833832d5ca11a9ed64ddc55646da30e3:

  ufs_truncate_blocks(): fix the case when size is in the last direct block (2017-06-15 03:57:46 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git ufs-fixes

for you to fetch changes up to 77e9ce327d9b607cd6e57c0f4524a654dc59c4b1:

  ufs: fix the logics for tail relocation (2017-06-17 17:22:42 -0400)

----------------------------------------------------------------
Al Viro (3):
      fix signedness of timestamps on ufs1
      ufs_iget(): fail with -ESTALE on deleted inode
      ufs: fix the logics for tail relocation

 fs/ufs/balloc.c | 22 ++++++----------------
 fs/ufs/inode.c  | 27 +++++++++++----------------
 fs/ufs/super.c  |  9 +++++++++
 fs/ufs/ufs_fs.h |  2 ++
 4 files changed, 28 insertions(+), 32 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-06-20  5:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-09 21:38 [git pull] first batch of ufs fixes Al Viro
2017-06-10 13:03 ` Richard Narron
2017-06-10 16:07   ` Al Viro
2017-06-10 18:08     ` Al Viro
2017-06-10 18:12       ` Linus Torvalds
2017-06-11 19:47       ` Richard Narron
2017-06-11 21:30         ` Al Viro
2017-06-12  6:14         ` Al Viro
2017-06-13  0:54           ` Richard Narron
2017-06-13  1:43             ` Al Viro
2017-06-13 21:56               ` Richard Narron
2017-06-14  7:11                 ` Al Viro
2017-06-14 20:33                   ` Richard Narron
2017-06-15  8:00                   ` Al Viro
2017-06-16 14:29                     ` Richard Narron
2017-06-17  2:15                       ` Al Viro
2017-06-18  1:09                         ` Al Viro
2017-06-18 20:45                           ` Richard Narron
2017-06-20  5:17                           ` [git pull] " Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).