linux-kernel.vger.kernel.org archive mirror
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
       [not found]         ` <18384.50909.866848.966192@notabene.brown>
@ 2008-03-07  5:55           ` Adam Schrotenboer
  2008-03-12 17:55             ` Adam Schrotenboer
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Schrotenboer @ 2008-03-07  5:55 UTC (permalink / raw)
  To: Neil Brown
  Cc: Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel, jesper.juhl

[-- Attachment #1: Type: text/plain, Size: 2260 bytes --]

Neil Brown wrote:
> On Wednesday March 5, adam@m2000.com wrote:
>   
>> Neil Brown wrote:
>>     
>>> On Wednesday March 5, adam@m2000.com wrote:
>>>   
>>> Where :-)
>>>
>>> An excerpt from the logs to show actual error messages could be
>>> helpful.
>>>
>>> Thanks,
>>> NeilBrown
>>>       
>> sorry, must have missed the paste
>>
>> Feb 28 17:27:58 koi kernel: nfs_update_inode: inode 3221350147 mode 
>> changed, 0040755 to 0100644
>> Feb 28 21:57:22 koi kernel: nfs_update_inode: inode 2149056279 mode 
>> changed, 0040755 to 0100755
>> Mar  5 01:43:21 koi kernel: nfs_update_inode: inode 3222473680 mode 
>> changed, 0040755 to 0100644
>> Mar  5 08:52:19 koi kernel: nfs_update_inode: inode 3222473568 mode 
>> changed, 0120777 to 0100644
>> Mar  5 15:00:01 koi kernel: nfs_update_inode: inode 3222473569 mode 
>> changed, 0120777 to 0100644
>> Mar  5 15:00:31 koi kernel: nfs_update_inode: inode 3222473674 mode 
>> changed, 0120777 to 0100644
>> Mar  5 15:00:31 koi kernel: nfs_update_inode: inode 3222473675 mode 
>> changed, 0120777 to 0100644
>> Mar  5 15:00:31 koi kernel: nfs_update_inode: inode 3222473672 mode 
>> changed, 0040755 to 0100644
>> Mar  5 15:00:31 koi kernel: nfs_update_inode: inode 3222473673 mode 
>> changed, 0120777 to 0100644
>>
>>
>>     
> Hmmm...
> Directories and symlinks "changing" to regular files.
>
> 32bit inode numbers with the high bit set.
>
> Random times, but a burst at 3pm one afternoon.
>
> The fact that it happens so rarely makes it very hard to collect more
> data....
>
>
> What happens when a user experiences an error? 
Well, sometimes it shows up as a read/write error (that is the error 
code returned by the application).
SVN tends to be more verbose (although I have no examples).
>  What sort of file is
> the error reported against?  
To my knowledge it is not limited to certain classes of files, 
although it does seem to happen most often when lots of files are being 
created.
> Does the file remain inaccessible or does
> it start working again soon?  
No data, maybe Thomas knows something.
> Is the same file accessible from another
> client?
>   
No data, esp as it never happens to me personally.
> NeilBrown
>   
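
The modes in the log lines above are octal st_mode values, and the
leading digits encode the file type. A minimal sketch for decoding them
follows. The macro and function names are made up for illustration; the
constants are the conventional Linux file-type bit values.

```c
/* Conventional Linux file-type bits (same values as S_IFMT and friends). */
#define TOY_IFMT  0170000u	/* mask covering the file-type bits */
#define TOY_IFDIR 0040000u	/* directory */
#define TOY_IFREG 0100000u	/* regular file */
#define TOY_IFLNK 0120000u	/* symbolic link */

/* Name the file type encoded in a mode such as 0040755. */
static const char *mode_type_name(unsigned int mode)
{
	switch (mode & TOY_IFMT) {
	case TOY_IFDIR:
		return "directory";
	case TOY_IFREG:
		return "regular file";
	case TOY_IFLNK:
		return "symlink";
	default:
		return "other";
	}
}

/* Return only the permission bits (the low 12 bits) of the mode. */
static unsigned int mode_perms(unsigned int mode)
{
	return mode & 07777u;
}
```

With this, 0040755 decodes as a directory with permissions 0755,
0120777 as a symlink, and 0100644 as a regular file, which matches
Neil's reading of the log.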




* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
  2008-03-07  5:55           ` [opensuse] nfs_update_inode: inode X mode changed, Y to Z Adam Schrotenboer
@ 2008-03-12 17:55             ` Adam Schrotenboer
  2008-03-12 18:08               ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Schrotenboer @ 2008-03-12 17:55 UTC (permalink / raw)
  To: Neil Brown
  Cc: Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel,
	jesper.juhl, Fred Revenu

[-- Attachment #1: Type: text/plain, Size: 7557 bytes --]


Another excerpt from last night.

Mar 12 00:53:16 wanda kernel: nfs: server 192.168.1.27 not responding, 
still trying
Mar 12 00:53:18 wanda kernel: nfs: server 192.168.1.27 OK
Mar 12 00:54:29 wanda kernel: nfs_update_inode: inode 1075049148 mode 
changed, 0040755 to 0100644
Mar 12 00:54:49 wanda kernel: nfs_update_inode: inode 1611920610 mode 
changed, 0100644 to 0040755
Mar 12 00:54:49 wanda kernel: nfs_update_inode: inode 1611920612 mode 
changed, 0040755 to 0100644
Mar 12 00:54:50 wanda kernel: nfs_update_inode: inode 2685631887 mode 
changed, 0100644 to 0040755
Mar 12 00:54:50 wanda kernel: nfs_update_inode: inode 3222506021 mode 
changed, 0100644 to 0040755
Mar 12 00:54:55 wanda kernel: nfs_update_inode: inode 1372131 mode 
changed, 0100644 to 0040755
Mar 12 00:55:25 wanda kernel: nfs_update_inode: inode 3759413120 mode 
changed, 0040755 to 0100644
Mar 12 00:56:05 wanda kernel: nfs_update_inode: inode 1075051607 mode 
changed, 0100644 to 0040755
Mar 12 00:56:05 wanda kernel: nfs_update_inode: inode 1075051608 mode 
changed, 0040755 to 0100644
Mar 12 00:56:05 wanda kernel: nfs_update_inode: inode 1075051612 mode 
changed, 0100644 to 0040755
Mar 12 00:56:05 wanda kernel: nfs_update_inode: inode 1075051613 mode 
changed, 0040755 to 0100644
Mar 12 00:56:06 wanda kernel: nfs_update_inode: inode 1075051616 mode 
changed, 0100644 to 0040755
Mar 12 00:56:06 wanda kernel: nfs_update_inode: inode 1075051617 mode 
changed, 0040755 to 0100644
Mar 12 00:56:09 wanda kernel: nfs_update_inode: inode 1075051624 mode 
changed, 0100644 to 0040755
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926434 mode 
changed, 0040755 to 0100644
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926438 mode 
changed, 0040755 to 0100644
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926442 mode 
changed, 0040755 to 0100644
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926446 mode 
changed, 0040755 to 0100644
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926450 mode 
changed, 0040755 to 0100644
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926454 mode 
changed, 0040755 to 0100644
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926458 mode 
changed, 0040755 to 0100644
Mar 12 01:00:33 wanda kernel: nfs_update_inode: inode 1611926462 mode 
changed, 0040755 to 0100644
Mar 12 01:00:34 wanda kernel: nfs_update_inode: inode 1611926463 mode 
changed, 0100644 to 0040755
Mar 12 09:36:46 wanda kernel: nfs_update_inode: inode 538186966 mode 
changed, 0100644 to 0040755
Mar 12 09:36:46 wanda kernel: nfs_update_inode: inode 538186967 mode 
changed, 0040755 to 0100644
Mar 12 09:36:46 wanda kernel: nfs_update_inode: inode 538186970 mode 
changed, 0100644 to 0040755
Mar 12 09:36:46 wanda kernel: nfs_update_inode: inode 538186972 mode 
changed, 0040755 to 0100644
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186974 mode 
changed, 0100644 to 0040755
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186977 mode 
changed, 0040755 to 0100644
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186979 mode 
changed, 0100644 to 0040755
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186982 mode 
changed, 0040755 to 0100644
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186984 mode 
changed, 0100644 to 0040755
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186987 mode 
changed, 0040755 to 0100644
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186989 mode 
changed, 0100644 to 0040755
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186992 mode 
changed, 0040755 to 0100644
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186994 mode 
changed, 0100644 to 0040755
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186997 mode 
changed, 0040755 to 0100644
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538186999 mode 
changed, 0100644 to 0040755
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538187002 mode 
changed, 0040755 to 0100644
Mar 12 09:36:47 wanda kernel: nfs_update_inode: inode 538187004 mode 
changed, 0100644 to 0040755

It keeps going...

Another machine, different times.

Mar 12 03:35:19 minnow kernel: nfs_update_inode: inode 2685628653 mode 
changed, 0100644 to 0040755
Mar 12 03:38:01 minnow kernel: nfs_update_inode: inode 2685628658 mode 
changed, 0100644 to 0040755
Mar 12 03:38:04 minnow kernel: nfs_update_inode: inode 2685628707 mode 
changed, 0100644 to 0040755
Mar 12 03:38:04 minnow kernel: nfs_update_inode: inode 2685628715 mode 
changed, 0100644 to 0040755
Mar 12 03:38:04 minnow kernel: nfs_update_inode: inode 2685628719 mode 
changed, 0100644 to 0040755
Mar 12 03:38:04 minnow kernel: nfs_update_inode: inode 2685628731 mode 
changed, 0100644 to 0040755
Mar 12 03:38:05 minnow kernel: nfs_update_inode: inode 3222503513 mode 
changed, 0100644 to 0040755
Mar 12 03:38:05 minnow kernel: nfs_update_inode: inode 2685628818 mode 
changed, 0100644 to 0040755
Mar 12 03:38:06 minnow kernel: nfs_update_inode: inode 2685628842 mode 
changed, 0100644 to 0040755
Mar 12 03:38:06 minnow kernel: nfs_update_inode: inode 2685628846 mode 
changed, 0100644 to 0040755
Mar 12 03:38:06 minnow kernel: nfs_update_inode: inode 2685628851 mode 
changed, 0100644 to 0040755
Mar 12 03:38:06 minnow kernel: nfs_update_inode: inode 2685628856 mode 
changed, 0100644 to 0040755
Mar 12 03:38:07 minnow kernel: nfs_update_inode: inode 3222503585 mode 
changed, 0100644 to 0040755
Mar 12 03:38:09 minnow kernel: nfs_update_inode: inode 2685628923 mode 
changed, 0100644 to 0040755
Mar 12 03:38:09 minnow kernel: nfs_update_inode: inode 2685628927 mode 
changed, 0100644 to 0040755
Mar 12 04:00:48 minnow kernel: nfs_update_inode: inode 2149158555 mode 
changed, 0100644 to 0040755
Mar 12 04:00:51 minnow kernel: nfs_update_inode: inode 1611926234 mode 
changed, 0040755 to 0100644
Mar 12 04:01:04 minnow kernel: nfs_update_inode: inode 1611926235 mode 
changed, 0100644 to 0040755
Mar 12 04:01:04 minnow kernel: nfs_update_inode: inode 1611926238 mode 
changed, 0040755 to 0100644
Mar 12 04:01:04 minnow kernel: nfs_update_inode: inode 1611926240 mode 
changed, 0100644 to 0040755
Mar 12 04:01:04 minnow kernel: nfs_update_inode: inode 1611926243 mode 
changed, 0040755 to 0100644

It has occurred on other machines as well. The only usage pattern 
I have noticed is that it happened mostly while running Mentor's Precision, 
but I doubt that is relevant.

This is the line from /proc/mounts on Minnow:

192.168.1.27:/mnt/storage0/users /home/users nfs 
rw,v3,rsize=32768,wsize=32768,hard,lock,proto=tcp,addr=192.168.1.27 0 0

192.168.1.27 is our machine dolphin, which is running OpenSuSE 10.2 
x86_64 on an Intel(R) Xeon(R) CPU E5310 @ 1.60GHz with 4G of RAM. The 
exported volume is an XFS file system on a Dell PERC5/i:

adam@dolphin:~$ df -h /mnt/storage0/
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             2.8T  2.0T  807G  72% /mnt/storage0

From /etc/fstab:

/dev/disk/by-label/storage0 /mnt/storage0 xfs rw,usrquota,grpquota 0 0

It is not running with inode64, but probably should be. That hasn't become a 
problem yet. OTOH, I understand that NFS v3 doesn't support 64-bit inode 
numbers.
FYI, almost all of our NFS clients are x86_64 Linux (except 2).

I have CC:d Fred Revenu, who was the person who brought it to my 
attention this morning that it had happened again last night.



* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
  2008-03-12 17:55             ` Adam Schrotenboer
@ 2008-03-12 18:08               ` Trond Myklebust
  2008-03-12 18:16                 ` Adam Schrotenboer
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2008-03-12 18:08 UTC (permalink / raw)
  To: Adam Schrotenboer
  Cc: Neil Brown, linux-kernel, linux-nfs, Thomas Daniel, jesper.juhl,
	Fred Revenu


On Wed, 2008-03-12 at 10:55 -0700, Adam Schrotenboer wrote:
> Another excerpt from last night.
> 
> Mar 12 00:53:16 wanda kernel: nfs: server 192.168.1.27 not responding, 
> still trying
> Mar 12 00:53:18 wanda kernel: nfs: server 192.168.1.27 OK
> Mar 12 00:54:29 wanda kernel: nfs_update_inode: inode 1075049148 mode 
> changed, 0040755 to 0100644

Hang on. That does not look like an XID collision problem...

That code path basically means that the fileids/inode numbers match,
which would not be the case if we were talking about an XID collision
causing the reply cache to replay an old request...

That message is rather symptomatic of a filehandle reuse problem. In other
words an NFS filehandle appears to be reused to label a regular file
after it has been used for a directory with the same fileid/inode
number. That is a definite server bug.
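
To illustrate the condition being described, here is a minimal sketch
of a check that fires only when the fileid matches but the file-type
bits differ. This is not the kernel's nfs_update_inode code; the struct
and function names are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TYPE_MASK 0170000u	/* file-type bits of a mode value */

/* What the client remembers about an inode (hypothetical layout). */
struct toy_cached_inode {
	uint64_t fileid;
	uint32_t mode;
};

/* Attributes just returned by the server for the same filehandle. */
struct toy_fresh_attr {
	uint64_t fileid;
	uint32_t mode;
};

/*
 * True only when the fileid matches but the type bits differ, i.e. the
 * same filehandle and inode number now describe a different kind of object.
 * A fileid mismatch is deliberately not reported here; that would be a
 * different failure.
 */
static bool toy_type_changed(const struct toy_cached_inode *cached,
			     const struct toy_fresh_attr *fresh)
{
	if (cached->fileid != fresh->fileid)
		return false;
	return (cached->mode & TOY_TYPE_MASK) != (fresh->mode & TOY_TYPE_MASK);
}
```

A permission-only change such as 0100644 to 0100755 does not trip this
check, while 0040755 to 0100644 on the same fileid does.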

Are you sure that you are using the regular kernel nfs server?

-- 
Trond Myklebust
NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
  2008-03-12 18:08               ` Trond Myklebust
@ 2008-03-12 18:16                 ` Adam Schrotenboer
  0 siblings, 0 replies; 10+ messages in thread
From: Adam Schrotenboer @ 2008-03-12 18:16 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Neil Brown, linux-kernel, linux-nfs, Thomas Daniel, jesper.juhl,
	Fred Revenu

[-- Attachment #1: Type: text/plain, Size: 1556 bytes --]

Trond Myklebust wrote:
> On Wed, 2008-03-12 at 10:55 -0700, Adam Schrotenboer wrote:
>   
>> Another excerpt from last night.
>>
>> Mar 12 00:53:16 wanda kernel: nfs: server 192.168.1.27 not responding, 
>> still trying
>> Mar 12 00:53:18 wanda kernel: nfs: server 192.168.1.27 OK
>> Mar 12 00:54:29 wanda kernel: nfs_update_inode: inode 1075049148 mode 
>> changed, 0040755 to 0100644
>>     
>
> Hang on. That does not look like an XID collision problem...
>
> That code path basically means that the fileids/inode numbers match,
> which would not be the case if we were talking about an XID collision
> causing the reply cache to replay an old request...
>
> That message is rather symptomatic of a filehandle reuse problem. In other
> words an NFS filehandle appears to be reused to label a regular file
> after it has been used for a directory with the same fileid/inode
> number. That is a definite server bug.
>
> Are you sure that you are using the regular kernel nfs server?
>
>   
Quite. OpenSuSE 10.2 stock unmodified kernel.

dolphin:~ # uname -a
Linux dolphin 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 
x86_64 x86_64 x86_64 GNU/Linux

dolphin:~ # top -bn1 |grep nfsd


  PID  PPID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3916     1 root      15   0     0    0    0 S    0  0.0   3:04.42 nfsd
 3942     1 root      15   0     0    0    0 S    0  0.0   3:11.88 nfsd
(there are quite a few more of these threads than this (32), but they all 
look the same in top or ps)




* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
  2008-03-26  5:02                       ` David Chinner
@ 2008-04-17 19:37                         ` Adam Schrotenboer
  0 siblings, 0 replies; 10+ messages in thread
From: Adam Schrotenboer @ 2008-04-17 19:37 UTC (permalink / raw)
  To: David Chinner
  Cc: Josef 'Jeff' Sipek, NeilBrown, J. Bruce Fields, xfs,
	Jesper Juhl, Trond Myklebust, linux-kernel, linux-nfs,
	Thomas Daniel, Frederic Revenu, Jeff Doan

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

David Chinner wrote:
> On Wed, Mar 26, 2008 at 02:37:38PM +1100, David Chinner wrote:
>   
>> Given this state of affairs (i.e. HSM using ikeep), I guess we can do
>> anything we want for the noikeep case. I'll cook up a patch that does
>> something similar to ext3 generation numbers for the initial seeding....
>>     
>
> Patch below for comments. It passes xfsqa, but there's no userspace
> support for it yet. 2.6.26 is the likely target for this change.
>   
2.6.26 merge window begins now. Has this been pushed yet? Is it in 
linux-next tree ?



* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
  2008-03-26  3:37                     ` David Chinner
@ 2008-03-26  5:02                       ` David Chinner
  2008-04-17 19:37                         ` Adam Schrotenboer
  0 siblings, 1 reply; 10+ messages in thread
From: David Chinner @ 2008-03-26  5:02 UTC (permalink / raw)
  To: David Chinner
  Cc: Josef 'Jeff' Sipek, NeilBrown, J. Bruce Fields, xfs,
	Adam Schrotenboer, Jesper Juhl, Trond Myklebust, linux-kernel,
	linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Wed, Mar 26, 2008 at 02:37:38PM +1100, David Chinner wrote:
> Given this state of affairs (i.e. HSM using ikeep), I guess we can do
> anything we want for the noikeep case. I'll cook up a patch that does
> something similar to ext3 generation numbers for the initial seeding....

Patch below for comments. It passes xfsqa, but there's no userspace
support for it yet. 2.6.26 is the likely target for this change.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---

Don't initialise new inode generation numbers to zero

When we allocate new inode chunks, we initialise the generation
numbers to zero. This works fine until we delete a chunk and then
reallocate it, resulting in the same inode numbers but with a
reset generation count. This can result in inode/generation
pairs of different inodes occurring relatively close together.

Given that the inode/gen pair makes up the "unique" portion of
an NFS filehandle on XFS, this can result in file handles cached
on clients being seen on the wire from the server but referring to
a different file. This causes .... issues for NFS clients.

Hence we need a unique generation number initialisation for
each inode to prevent reuse of a small portion of the generation
number space. Make this initialiser per-allocation group so
that it is not a single point of contention in the filesystem,
and increment it on every allocation within an AG to reduce the
chance that a generation number is reused for a given inode number
if the inode chunk is deleted and reallocated immediately
afterwards.

It is safe to add the agi_newinogen field to the AGI without
using a feature bit. If an older kernel is used, it simply
will not update the field on allocation. If the kernel is
updated and the field has garbage in it, then it's like having a
random seed to the generation number....

Signed-off-by: Dave Chinner <dgc@sgi.com>
---
 fs/xfs/xfs_ag.h     |    4 +++-
 fs/xfs/xfs_ialloc.c |   30 ++++++++++++++++++++++--------
 2 files changed, 25 insertions(+), 9 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h	2008-01-18 18:30:06.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h	2008-03-26 13:03:41.122918236 +1100
@@ -121,6 +121,7 @@ typedef struct xfs_agi {
 	 * still being referenced.
 	 */
 	__be32		agi_unlinked[XFS_AGI_UNLINKED_BUCKETS];
+	__be32		agi_newinogen;	/* inode cluster generation */
 } xfs_agi_t;
 
 #define	XFS_AGI_MAGICNUM	0x00000001
@@ -134,7 +135,8 @@ typedef struct xfs_agi {
 #define	XFS_AGI_NEWINO		0x00000100
 #define	XFS_AGI_DIRINO		0x00000200
 #define	XFS_AGI_UNLINKED	0x00000400
-#define	XFS_AGI_NUM_BITS	11
+#define	XFS_AGI_NEWINOGEN	0x00000800
+#define	XFS_AGI_NUM_BITS	12
 #define	XFS_AGI_ALL_BITS	((1 << XFS_AGI_NUM_BITS) - 1)
 
 /* disk block (xfs_daddr_t) in the AG */
Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c	2008-03-25 15:41:27.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c	2008-03-26 14:29:47.998554368 +1100
@@ -309,6 +309,8 @@ xfs_ialloc_ag_alloc(
 			free = XFS_MAKE_IPTR(args.mp, fbuf, i);
 			free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
 			free->di_core.di_version = version;
+			free->di_core.di_gen = agi->agi_newinogen;
+			be32_add_cpu(&agi->agi_newinogen, 1);
 			free->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			xfs_ialloc_log_di(tp, fbuf, i,
 				XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
@@ -347,7 +349,8 @@ xfs_ialloc_ag_alloc(
 	 * Log allocation group header fields
 	 */
 	xfs_ialloc_log_agi(tp, agbp,
-		XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO);
+		XFS_AGI_COUNT | XFS_AGI_FREECOUNT |
+		XFS_AGI_NEWINO | XFS_AGI_NEWINOGEN);
 	/*
 	 * Modify/log superblock values for inode count and inode free count.
 	 */
@@ -896,11 +899,12 @@ nextag:
 	ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset);
 	XFS_INOBT_CLR_FREE(&rec, offset);
 	rec.ir_freecount--;
+	be32_add_cpu(&agi->agi_newinogen, 1);
 	if ((error = xfs_inobt_update(cur, rec.ir_startino, rec.ir_freecount,
 			rec.ir_free)))
 		goto error0;
 	be32_add(&agi->agi_freecount, -1);
-	xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT);
+	xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT | XFS_AGI_NEWINOGEN);
 	down_read(&mp->m_peraglock);
 	mp->m_perag[tagno].pagi_freecount--;
 	up_read(&mp->m_peraglock);
@@ -1320,6 +1324,11 @@ xfs_ialloc_compute_maxlevels(
 
 /*
  * Log specified fields for the ag hdr (inode section)
+ *
+ * We don't log the unlinked inode fields through here; they
+ * get logged directly to the buffer. Hence we have a discontinuity
+ * in the fields we are logging and we need two calls to map all
+ * the dirtied parts of the agi....
  */
 void
 xfs_ialloc_log_agi(
@@ -1342,22 +1351,27 @@ xfs_ialloc_log_agi(
 		offsetof(xfs_agi_t, agi_newino),
 		offsetof(xfs_agi_t, agi_dirino),
 		offsetof(xfs_agi_t, agi_unlinked),
+		offsetof(xfs_agi_t, agi_newinogen),
 		sizeof(xfs_agi_t)
 	};
+	int			log_newino = fields & XFS_AGI_NEWINOGEN;
+
 #ifdef DEBUG
 	xfs_agi_t		*agi;	/* allocation group header */
 
 	agi = XFS_BUF_TO_AGI(bp);
 	ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC);
 #endif
-	/*
-	 * Compute byte offsets for the first and last fields.
-	 */
+	fields &= ~XFS_AGI_NEWINOGEN;
+
+	/* Compute byte offsets for the first and last fields.  */
 	xfs_btree_offsets(fields, offsets, XFS_AGI_NUM_BITS, &first, &last);
-	/*
-	 * Log the allocation group inode header buffer.
-	 */
 	xfs_trans_log_buf(tp, bp, first, last);
+	if (log_newino) {
+		xfs_btree_offsets(XFS_AGI_NEWINOGEN, offsets, XFS_AGI_NUM_BITS,
+					&first, &last);
+		xfs_trans_log_buf(tp, bp, first, last);
+	}
 }
 
 /*


* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
  2008-03-25 22:13                   ` Josef 'Jeff' Sipek
  2008-03-25 23:09                     ` NeilBrown
@ 2008-03-26  3:37                     ` David Chinner
  2008-03-26  5:02                       ` David Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: David Chinner @ 2008-03-26  3:37 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek
  Cc: NeilBrown, J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
	Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel,
	Frederic Revenu, Jeff Doan

On Tue, Mar 25, 2008 at 06:13:21PM -0400, Josef 'Jeff' Sipek wrote:
> On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
> ...
> > However you still need to do something about the generation number.  It
> > must be set to something.
.....
> > Even better would be to store that 'next generation number' in the
> > superblock so there would be even less risk of the 'random' generation
> > producing repeats.
> > This is what ext3 does.  It doesn't dynamically allocate inodes,
> > but it doesn't want to pay the cost of reading an old inode from
> > storage just to see what the generation number is.  So it has
> > a number in the superblock which is incremented on each inode allocation
> > and is used as the generation number.
>  
> Something tells me that the SGI folks might not be all too happy with the
> in-sb number...
.....
> Perhaps a per-ag variable would be better,

/me goes back to the bug from last year about stable inode/gen numbers
for a HSM.

dgc> Right, except the last thing we want is yet more global state needing to
dgc> be updated in inode allocation. The best way to do this is a max generation
dgc> number per AG (held in the AGI) so that it can be updated at the same time
dgc> inodes are freed and not cause additional serialisation.

Which was soundly rejected by the HSM folk because it wraps at 4 billion
inode create/unlink cycles in an AG rather than per inode. The only thing
they were happy with was the old behaviour and so they now mount their
filesystems with ikeep. At that point the issue was dropped on the floor;
the NFS side of things apparently wasn't causing any problems so we didn't
consider it urgent to fix....

Given this state of affairs (i.e. HSM using ikeep), I guess we can do
anything we want for the noikeep case. I'll cook up a patch that does
something similar to ext3 generation numbers for the initial seeding....
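
As a rough illustration of that idea, a per-AG counter can seed the
generation of every inode in a newly allocated chunk, so that freeing
and reallocating the same chunk does not hand out the same generation
numbers again. This is only a sketch under assumptions: the names and
the chunk size constant below are invented for the example and this is
not the XFS implementation.

```c
#include <stdint.h>

#define TOY_CHUNK_INODES 64	/* illustrative chunk size */

/* Hypothetical per-allocation-group state holding the next seed value. */
struct toy_ag {
	uint32_t next_gen;
};

/*
 * Seed the generation numbers of a freshly allocated chunk from the
 * per-AG counter, advancing the counter once per inode. Unsigned
 * arithmetic wraps around silently after 2^32 increments.
 */
static void toy_seed_chunk(struct toy_ag *ag, uint32_t gen[TOY_CHUNK_INODES])
{
	for (int i = 0; i < TOY_CHUNK_INODES; i++)
		gen[i] = ag->next_gen++;
}
```

Seeding the same chunk twice in a row gives every slot a different
generation the second time, which is the property a zero-initialised
chunk lacks. The wrap after 2^32 increments per AG is the limitation
the HSM discussion above refers to.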

> but I remember reading that parallelizing updates
> to some inode count variable (I forget which) in the superblock
> \cite{dchinner-ols2006} led to a rather big improvement.

That was for in memory counters not on disk, and the problem really was
free block counts rather than free inode counts. Yes, I converted the
inode counters at the same time, but that wasn't the limiting factor.
Updates to the on disk superblock, OTOH, are a limiting factor and
that is what the lazy superblock counter modifications solve....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
  2008-03-25 22:13                   ` Josef 'Jeff' Sipek
@ 2008-03-25 23:09                     ` NeilBrown
  2008-03-26  3:37                     ` David Chinner
  1 sibling, 0 replies; 10+ messages in thread
From: NeilBrown @ 2008-03-25 23:09 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek
  Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
	Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel,
	Frederic Revenu, Jeff Doan

On Wed, March 26, 2008 9:13 am, Josef 'Jeff' Sipek wrote:
> On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
> ...
>> However you still need to do something about the generation number.  It
>> must be set to something.
>
> Right.
>
>> When you allocate an inode that doesn't currently exist on the device,
>> you obviously cannot increment the old value and use that.
>
> Makes sense.
>
>> However you can do a lot better than always using 0.
>
> I looked at the code (xfs_ialloc.c:xfs_ialloc_ag_alloc)
>
> 	/*
> 	 * Set initial values for the inodes in this buffer.
> 	 */
> 	xfs_biozero(fbuf, 0, ninodes << args.mp->m_sb.sb_inodelog);
> 	for (i = 0; i < ninodes; i++) {
> 		free = XFS_MAKE_IPTR(args.mp, fbuf, i);
> 		free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
> 		free->di_core.di_version = version;
> 		free->di_next_unlinked = cpu_to_be32(NULLAGINO);
> 		xfs_ialloc_log_di(tp, fbuf, i,
> 			XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
> 	}
>
> xfs_biozero(...) turns into a memset(buf, 0, len), and since the loop that
> follows doesn't change the generation number, it'll stay 0.
>
>> The simplest would be to generate a 'random' number (get_random_bytes).
>> Slightly better would be to generate a random number at boot time
>> and use that, incrementing it each time it is used to set the
>> generation number for an inode.
>
> I'm not familiar enough with NFS, do you want something that's monotonically
> increasing or do you just test for inequality?  If it is inequality, why not
> just use something like the jiffies - that should be unique enough.
>

What we need is for the "filehandle" to be stable and unique.
By 'stable' I mean that every time I get the filehandle for a particular
file, I get the same string of bytes.
By 'unique' I mean that if I get two filehandles for two different
files, they must differ in at least one bit.
If a file is deleted and the inode is re-used for a new file, then the
old and new files are different and must have different file handles.

The filehandle is traditionally generated from the inode number and
a generation number, but the filesystem can actually do whatever it
likes.  xfs does it with xfs_fs_encode_fh().

Certainly you could initialise the i_generation to jiffies in
xfs_ialloc_ag_alloc.  That would be a suitable fix.  get_random_bytes
might be better, but the difference probably wouldn't be noticeable.
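
The uniqueness requirement can be shown with a toy filehandle made of
just an inode number and a generation number. This is a simplified
sketch, not the layout produced by xfs_fs_encode_fh(); the struct and
helper names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy filehandle: only the parts that are supposed to make it unique. */
struct toy_fh {
	uint64_t ino;
	uint32_t gen;
};

/* Build a toy filehandle from an inode number and a generation. */
static struct toy_fh toy_make_fh(uint64_t ino, uint32_t gen)
{
	struct toy_fh fh = { ino, gen };
	return fh;
}

/* Two handles name the same file only if both fields are identical. */
static bool toy_fh_equal(const struct toy_fh *a, const struct toy_fh *b)
{
	return a->ino == b->ino && a->gen == b->gen;
}
```

If a deleted directory and a newly created regular file both end up as
(ino, 0), the two handles compare equal and a client holding the old
handle cannot tell them apart. Any different generation for the new
file, whether from jiffies, get_random_bytes or a counter, keeps the
handles distinct.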

NeilBrown



* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
       [not found]                 ` <34178.192.168.1.70.1206481102.squirrel@neil.brown.name>
@ 2008-03-25 22:13                   ` Josef 'Jeff' Sipek
  2008-03-25 23:09                     ` NeilBrown
  2008-03-26  3:37                     ` David Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Josef 'Jeff' Sipek @ 2008-03-25 22:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
	Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel,
	Frederic Revenu, Jeff Doan

On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
...
> However you still need to do something about the generation number.  It
> must be set to something.
 
Right.

> When you allocate an inode that doesn't currently exist on the device,
> you obviously cannot increment the old value and use that.

Makes sense.

> However you can do a lot better than always using 0.

I looked at the code (xfs_ialloc.c:xfs_ialloc_ag_alloc)

	/*
	 * Set initial values for the inodes in this buffer.
	 */
	xfs_biozero(fbuf, 0, ninodes << args.mp->m_sb.sb_inodelog);
	for (i = 0; i < ninodes; i++) {
		free = XFS_MAKE_IPTR(args.mp, fbuf, i);
		free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
		free->di_core.di_version = version;
		free->di_next_unlinked = cpu_to_be32(NULLAGINO);
		xfs_ialloc_log_di(tp, fbuf, i,
			XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
	}

xfs_biozero(...) turns into a memset(buf, 0, len), and since the loop that
follows doesn't change the generation number, it'll stay 0.
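
The same effect can be reproduced in miniature outside the kernel. The
sketch below is not the XFS code: the struct, the constants and the
function name are all invented for illustration, and only the pattern
(zero the buffer, then assign some fields but not the generation) is
taken from the excerpt above.

```c
#include <stdint.h>
#include <string.h>

#define TOY_MAGIC     0x494eu		/* illustrative magic value */
#define TOY_NULLAGINO 0xffffffffu	/* illustrative "no next inode" marker */

/* Toy stand-in for an on-disk inode; the field set is hypothetical. */
struct toy_dinode {
	uint16_t magic;
	uint8_t  version;
	uint32_t gen;
	uint32_t next_unlinked;
};

/*
 * Zero the whole buffer, then fill in a few fields per inode. The gen
 * field is never assigned in the loop, so it keeps the 0 written by
 * memset() no matter what the buffer held before.
 */
static void toy_init_chunk(struct toy_dinode *buf, size_t n, uint8_t version)
{
	memset(buf, 0, n * sizeof(*buf));
	for (size_t i = 0; i < n; i++) {
		buf[i].magic = TOY_MAGIC;
		buf[i].version = version;
		buf[i].next_unlinked = TOY_NULLAGINO;
	}
}
```

Even if the buffer previously held inodes with non-zero generations,
every inode comes out of this routine with a generation of 0.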

> The simplest would be to generate a 'random' number (get_random_bytes).
> Slightly better would be to generate a random number at boot time
> and use that, incrementing it each time it is used to set the
> generation number for an inode.

I'm not familiar enough with NFS, do you want something that's monotonically
increasing or do you just test for inequality?  If it is inequality, why not
just use something like the jiffies - that should be unique enough.

> Even better would be to store that 'next generation number' in the
> superblock so there would be even less risk of the 'random' generation
> producing repeats.
> This is what ext3 does.  It doesn't dynamically allocate inodes,
> but it doesn't want to pay the cost of reading an old inode from
> storage just to see what the generation number is.  So it has
> a number in the superblock which is incremented on each inode allocation
> and is used as the generation number.
 
Something tells me that the SGI folks might not be all too happy with the
in-sb number...XFS tries to be as parallel as possible, and this would cause
the counter variable to bounce around their NUMA systems.  Perhaps a per-ag
variable would be better, but I remember reading that parallelizing updates
to some inode count variable (I forget which) in the superblock
\cite{dchinner-ols2006} led to a rather big improvement.  It's almost
morning down under, so I guess we'll get their comments on this soon.

Josef 'Jeff' Sipek.

-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
		- Brian W. Kernighan 


* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
       [not found]   ` <3AD71C5E-B45A-4BDB-8C94-73D62256BEBF@astro.wisc.edu>
@ 2008-03-12 18:06     ` Adam Schrotenboer
  0 siblings, 0 replies; 10+ messages in thread
From: Adam Schrotenboer @ 2008-03-12 18:06 UTC (permalink / raw)
  To: Stephan Jansen
  Cc: opensuse, Thomas Daniel, Trond Myklebust, linux-kernel,
	linux-nfs, jesper.juhl, Fred Revenu, Neil Brown


Stephan Jansen wrote:
>
> Hi,
>
> We've just run into a similar thing.  We have an OpenSuSE 10.3 NFS server
> exporting a filesystem to about 40 machines, most of which are also
> running OpenSuSE 10.3.  The client machines are running a distributed
> data processing pipeline.  The clients create, move and delete directories
> and files on the server.  We've seen a few instances where the programs on
> the clients die with a "no such file or directory" error but no errors in
> syslog.  Finally last night many of the clients gave these errors in syslog
> at about the same time as the client programs died with "no such file or
> directory" errors:
>
> Mar 11 20:57:53 glimpse5 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:57:54 glimpse6 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:57:49 tsingtao kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:57:16 glimpse12 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:59:04 cecilia kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:59:11 glimpse28 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:58:20 glimpse27 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:59:12 glimpse18 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:58:22 glimpse19 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:59:25 glimpse20 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:58:35 glimpse21 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:59:27 glimpse22 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:59:27 glimpse23 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
> Mar 11 20:59:39 glimpse24 kernel: nfs_update_inode: inode 402653248 mode changed, 0100644 to 0040755
>
> I notice that the offending inodes are all the same.  This all worked without
> problems when the NFS server was SuSE 10.0.  The exported filesystem is XFS.
>
> Anyone have any ideas what's going on?
I personally don't, but I have a thread running with some of the NFS
maintainers, and I have now added this mail to it.
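
For anyone reading the log lines above: the octal mode carries the file type in its top bits, so 0100644 is a regular file with permissions 0644 and 0040755 is a directory with permissions 0755. A small decoder sketch follows; the DEMO_* constants are spelled out here rather than pulled from a header, and they match the S_IF* values Linux uses.

```c
#include <assert.h>

/* File-type bits as used by Linux (same values as S_IFMT and friends). */
#define DEMO_IFMT  0170000UL
#define DEMO_IFREG 0100000UL
#define DEMO_IFDIR 0040000UL
#define DEMO_IFLNK 0120000UL

/* Return an ls-style type character for the mode from a log line. */
static char demo_mode_type(unsigned long mode)
{
    assert(mode <= 0177777UL);   /* the logged modes are 16-bit values */
    switch (mode & DEMO_IFMT) {
    case DEMO_IFREG: return '-';   /* regular file */
    case DEMO_IFDIR: return 'd';   /* directory */
    case DEMO_IFLNK: return 'l';   /* symlink */
    default:         return '?';   /* other types not handled in this sketch */
    }
}

/* Return just the permission bits (including setuid/setgid/sticky). */
static unsigned long demo_mode_perms(unsigned long mode)
{
    return mode & 07777UL;
}
```

So every client above saw inode 402653248 go from a regular file to a directory, which fits an inode number being reused for a different object while the clients still held the old handle.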



end of thread, other threads:[~2008-04-17 20:11 UTC | newest]

Thread overview: 10+ messages
     [not found] <47CF0829.4020502@m2000.com>
     [not found] ` <1204752463.5035.34.camel@heimdal.trondhjem.org>
     [not found]   ` <47CF157B.1010908@m2000.com>
     [not found]     ` <18383.24847.381754.517731@notabene.brown>
     [not found]       ` <47CF62C5.7000908@m2000.com>
     [not found]         ` <18384.50909.866848.966192@notabene.brown>
2008-03-07  5:55           ` [opensuse] nfs_update_inode: inode X mode changed, Y to Z Adam Schrotenboer
2008-03-12 17:55             ` Adam Schrotenboer
2008-03-12 18:08               ` Trond Myklebust
2008-03-12 18:16                 ` Adam Schrotenboer
     [not found] <47C5EC81.6080004@m2000.com>
     [not found] ` <47C71EC6.3050702@m2000.com>
     [not found]   ` <3AD71C5E-B45A-4BDB-8C94-73D62256BEBF@astro.wisc.edu>
2008-03-12 18:06     ` Adam Schrotenboer
     [not found] <9a8748490803121513w285cd45rb6b26a3d842cac1b@mail.gmail.com>
     [not found] ` <20080312221511.GC31632@fieldses.org>
     [not found]   ` <9a8748490803121516u36395872i70cc88b0439adc74@mail.gmail.com>
     [not found]     ` <18394.1501.991087.80264@notabene.brown>
     [not found]       ` <47DAEFD0.9020407@m2000.com>
     [not found]         ` <47E92F8E.7030504@m2000.com>
     [not found]           ` <20080325190943.GF2237@fieldses.org>
     [not found]             ` <32953.192.168.1.70.1206477121.squirrel@neil.brown.name>
     [not found]               ` <20080325212425.GA20257@josefsipek.net>
     [not found]                 ` <34178.192.168.1.70.1206481102.squirrel@neil.brown.name>
2008-03-25 22:13                   ` Josef 'Jeff' Sipek
2008-03-25 23:09                     ` NeilBrown
2008-03-26  3:37                     ` David Chinner
2008-03-26  5:02                       ` David Chinner
2008-04-17 19:37                         ` Adam Schrotenboer
