* 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Justin Piszcz @ 2011-07-27 13:54 UTC
  To: linux-kernel

Hi,

Kernel 2.6.30 on client.
Kernel 2.6.28 on server.

p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a
readdir loop.  Please contact your server vendor.  Offending cookie: 10272

In the past I used NFS to push -> imagery -> NFS server.
Now I've flipped it so I am storing the images locally and viewing them 
remotely, what causes this?

Justin.



* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: J. Bruce Fields @ 2011-07-27 16:07 UTC
  To: Justin Piszcz; +Cc: linux-kernel, linux-nfs

On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote:
> Hi,
> 
> Kernel 2.6.30 on client.
> Kernel 2.6.28 on server.
> 
> p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a
> readdir loop.  Please contact your server vendor.  Offending cookie: 10272

What filesystem on the server are you exporting?

> In the past I used NFS to push -> imagery -> NFS server.
> Now I've flipped it so I am storing the images locally and viewing
> them remotely, what causes this?

Sorry, I don't understand what you mean.

--b.


* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Justin Piszcz @ 2011-07-27 16:28 UTC
  To: J. Bruce Fields; +Cc: linux-kernel, linux-nfs, xfs



On Wed, 27 Jul 2011, J. Bruce Fields wrote:

> On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote:
>> Hi,
>>
>> Kernel 2.6.30 on client.
>> Kernel 2.6.28 on server.
>>
>> p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a
>> readdir loop.  Please contact your server vendor.  Offending cookie: 10272
>
> What filesystem on the server are you exporting?

Hi,

xfs.
/dev/sda1 on / type xfs (rw,noatime)

Nothing special, thoughts?

Justin.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Bryan Schumaker @ 2011-07-27 16:40 UTC
  To: Justin Piszcz; +Cc: J. Bruce Fields, linux-kernel, linux-nfs, xfs

On 07/27/2011 12:28 PM, Justin Piszcz wrote:
> 
> 
> On Wed, 27 Jul 2011, J. Bruce Fields wrote:
> 
>> On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote:
>>> Hi,
>>>
>>> Kernel 2.6.30 on client.
>>> Kernel 2.6.28 on server.
>>>
>>> p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a
>>> readdir loop.  Please contact your server vendor.  Offending cookie: 10272
>>
>> What filesystem on the server are you exporting?
> 
> Hi,
> 
> xfs.
> /dev/sda1 on / type xfs (rw,noatime)
> 
> Nothing special, thoughts?

Are there a lot of files in the directory you're exporting?  It looks like cookie 10272 is mapped to multiple files. When the client tries to resume reading from this cookie, xfs will reply from the first matching file and cause the client to enter a loop.

- Bryan

> 
> Justin.
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Ruediger Meier @ 2011-07-27 17:00 UTC
  To: Bryan Schumaker
  Cc: Justin Piszcz, J. Bruce Fields, linux-kernel, linux-nfs, xfs

On Wednesday 27 July 2011, Bryan Schumaker wrote:
> On 07/27/2011 12:28 PM, Justin Piszcz wrote:
> > On Wed, 27 Jul 2011, J. Bruce Fields wrote:
> >>
> >> What filesystem on the server are you exporting?
> >
> > xfs.
> > /dev/sda1 on / type xfs (rw,noatime)
> >
> > Nothing special, thoughts?
>
> Are there a lot of files in the directory you're exporting?  It looks
> like cookie 10272 is mapped to multiple files.

I thought xfs is immune to readdir loops!?
Is your export directory really located directly within / on /dev/sda1?

cu,
Rudi


* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Bryan Schumaker @ 2011-07-27 17:09 UTC
  To: Ruediger Meier
  Cc: Justin Piszcz, J. Bruce Fields, linux-kernel, linux-nfs, xfs

On 07/27/2011 01:00 PM, Ruediger Meier wrote:
> On Wednesday 27 July 2011, Bryan Schumaker wrote:
>> On 07/27/2011 12:28 PM, Justin Piszcz wrote:
>>> On Wed, 27 Jul 2011, J. Bruce Fields wrote:
>>>>
>>>> What filesystem on the server are you exporting?
>>>
>>> xfs.
>>> /dev/sda1 on / type xfs (rw,noatime)
>>>
>>> Nothing special, thoughts?
>>
>> Are there a lot of files in the directory you're exporting?  It looks
>> like cookie 10272 is mapped to multiple files.
> 
> I thought xfs is immune to readdir loops!?

I guess that depends on how it generates the cookie... I want to try out the ext4 patches that were posted earlier today.  I'll double-check xfs while I'm at it.

- Bryan

> Is your export directory really located directly within / on /dev/sda1?
> 
> cu,
> Rudi



* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Justin Piszcz @ 2011-07-27 17:15 UTC
  To: Bryan Schumaker; +Cc: J. Bruce Fields, linux-kernel, linux-nfs, xfs



On Wed, 27 Jul 2011, Bryan Schumaker wrote:

> On 07/27/2011 12:28 PM, Justin Piszcz wrote:
>>
>>
>> On Wed, 27 Jul 2011, J. Bruce Fields wrote:
>>
>>> On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote:
>>>> Hi,
>>>>
>>>> Kernel 2.6.30 on client.
>>>> Kernel 2.6.28 on server.
>>>>
>>>> p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a
>>>> readdir loop.  Please contact your server vendor.  Offending cookie: 10272
>>>
>>> What filesystem on the server are you exporting?
>>
>> Hi,
>>
>> xfs.
>> /dev/sda1 on / type xfs (rw,noatime)
>>
>> Nothing special, thoughts?
>
> Are there a lot of files in the directory you're exporting?  It looks like cookie 10272 is mapped to multiple files. When the client tries to resume reading from this cookie, xfs will reply from the first matching file and cause the client to enter a loop.

Should I be using a different filesystem?

user@atom:/d1$ cd /d1/motion/cam1
user@atom:/d1/motion/cam1$ ls|wc
    5198    5198  140346
user@atom:/d1/motion/cam1$

Justin.




* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Justin Piszcz @ 2011-07-27 17:17 UTC
  To: Ruediger Meier
  Cc: Bryan Schumaker, J. Bruce Fields, linux-kernel, linux-nfs, xfs



On Wed, 27 Jul 2011, Ruediger Meier wrote:

> On Wednesday 27 July 2011, Bryan Schumaker wrote:
>> On 07/27/2011 12:28 PM, Justin Piszcz wrote:
>>> On Wed, 27 Jul 2011, J. Bruce Fields wrote:
>>>>
>>>> What filesystem on the server are you exporting?
>>>
>>> xfs.
>>> /dev/sda1 on / type xfs (rw,noatime)
>>>
>>> Nothing special, thoughts?
>>
>> Are there a lot of files in the directory you're exporting?  It looks
>> like cookie 10272 is mapped to multiple files.
>
> I thought xfs is immune to readdir loops!?
> Is your export directory really located directly within / on /dev/sda1?

Hi,

I was sharing out a directory on the NFS server:
/d1         192.168.0.0/24(async,rw,no_root_squash,no_subtree_check,fsid=1)

Should I share out / instead?
Is this a known problem?

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              30G   13G   18G  43% /
tmpfs                 2.0G  8.0K  2.0G   1% /lib/init/rw
udev                   10M  192K  9.9M   2% /dev
tmpfs                 2.0G     0  2.0G   0% /dev/shm
$

Justin.




* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: J. Bruce Fields @ 2011-07-27 17:45 UTC
  To: Justin Piszcz
  Cc: Ruediger Meier, Bryan Schumaker, linux-kernel, linux-nfs, xfs

On Wed, Jul 27, 2011 at 01:17:35PM -0400, Justin Piszcz wrote:
> 
> 
> On Wed, 27 Jul 2011, Ruediger Meier wrote:
> 
> >On Wednesday 27 July 2011, Bryan Schumaker wrote:
> >>On 07/27/2011 12:28 PM, Justin Piszcz wrote:
> >>>On Wed, 27 Jul 2011, J. Bruce Fields wrote:
> >>>>
> >>>>What filesystem on the server are you exporting?
> >>>
> >>>xfs.
> >>>/dev/sda1 on / type xfs (rw,noatime)
> >>>
> >>>Nothing special, thoughts?
> >>
> >>Are there a lot of files in the directory you're exporting?  It looks
> >>like cookie 10272 is mapped to multiple files.
> >
> >I thought xfs is immune to readdir loops!?
> >Is your export directory really located directly within / on /dev/sda1?
> 
> Hi,
> 
> I was sharing out a directory on the NFS server:
> /d1         192.168.0.0/24(async,rw,no_root_squash,no_subtree_check,fsid=1)
> 
> Should I share out / instead?

You can do that if you want, but note that anyone malicious on that
network can get access to / by guessing filehandles.  (Safer would be to
mount a separate partition at /d1.)

But in any case that's got nothing to do with readdir cookie problems.

--b.

> Is this a known problem?
> 
> $ df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1              30G   13G   18G  43% /
> tmpfs                 2.0G  8.0K  2.0G   1% /lib/init/rw
> udev                   10M  192K  9.9M   2% /dev
> tmpfs                 2.0G     0  2.0G   0% /dev/shm
> $
> 
> Justin.
> 
> 


* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Christoph Hellwig @ 2011-07-27 18:11 UTC
  To: Justin Piszcz; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs

[-- Attachment #1: Type: text/plain, Size: 222 bytes --]

Justin,

can you please run the attached test program on the affected directory
on the server, and see if you see duplicates in the d_off column.  Unless
you have privacy concerns I would also love to see the full output.


[-- Attachment #2: getdents.c --]
[-- Type: text/plain, Size: 1150 bytes --]

#define _GNU_SOURCE

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

#define handle_error(msg) \
	do { perror(msg); exit(EXIT_FAILURE); } while (0)

/* Record layout returned by the raw getdents64 system call. */
struct linux_dirent64 {
	unsigned long long d_ino;	/* inode number */
	long long d_off;		/* cookie for resuming after this entry */
	unsigned short d_reclen;	/* length of this record */
	unsigned char d_type;		/* file type (DT_REG, DT_DIR, ...) */
	char d_name[];			/* NUL-terminated file name */
};

#define BUF_SIZE 131072

int main(int argc, char *argv[])
{
	/* static: keeps 128 KiB off the stack; aligned for in-place access */
	static char buf[BUF_SIZE] __attribute__((aligned(8)));
	struct linux_dirent64 *d;
	long nread, bpos;
	int fd;

	fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
	if (fd == -1)
		handle_error("open");

	for (;;) {
		nread = syscall(SYS_getdents64, fd, buf, BUF_SIZE);
		if (nread == -1)
			handle_error("getdents64");

		if (nread == 0)		/* end of directory */
			break;

		printf("--------------- nread=%ld ---------------\n", nread);
		printf("         i-node#  type  d_reclen       d_off  d_name\n");
		for (bpos = 0; bpos < nread;) {
			d = (struct linux_dirent64 *)(buf + bpos);
			/* duplicate d_off values are what we are looking for */
			printf("%16llu  %4u  %8u  %10lld  %s\n",
			       d->d_ino, (unsigned)d->d_type,
			       (unsigned)d->d_reclen, d->d_off, d->d_name);
			bpos += d->d_reclen;
		}
	}

	close(fd);
	exit(EXIT_SUCCESS);
}


* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Bryan Schumaker @ 2011-07-27 18:28 UTC
  To: Ruediger Meier
  Cc: Justin Piszcz, J. Bruce Fields, linux-kernel, linux-nfs, xfs

On 07/27/2011 01:00 PM, Ruediger Meier wrote:
> On Wednesday 27 July 2011, Bryan Schumaker wrote:
>> On 07/27/2011 12:28 PM, Justin Piszcz wrote:
>>> On Wed, 27 Jul 2011, J. Bruce Fields wrote:
>>>>
>>>> What filesystem on the server are you exporting?
>>>
>>> xfs.
>>> /dev/sda1 on / type xfs (rw,noatime)
>>>
>>> Nothing special, thoughts?
>>
>> Are there a lot of files in the directory you're exporting?  It looks
>> like cookie 10272 is mapped to multiple files.
> 
> I thought xfs is immune to readdir loops!?

I can ls a directory with 500,000 files over nfs4.  That's usually enough to cause the readdir loop in ext4, so I guess this is a different problem.

> Is your export directory really located directly within / on /dev/sda1?
> 
> cu,
> Rudi



* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Justin Piszcz @ 2011-07-27 19:35 UTC
  To: Christoph Hellwig; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs



On Wed, 27 Jul 2011, Christoph Hellwig wrote:

> Justin,
>
> can you please run the attached test program on the affected directory
> on the server, and see if you see duplicates in the d_off column.  Unless
> you have privacy concerns I would also love to see the full output.
>
>

Hi,

Done:

atom:/d1/motion/cam1# /root/getdents  > /tmp/cam1-out.txt
atom:/d1/motion/cam1# cd ../cam2
atom:/d1/motion/cam2# /root/getdents  > /tmp/cam2-out.txt
atom:/d1/motion/cam2# cd ../cam3
atom:/d1/motion/cam3# /root/getdents  > /tmp/cam3-out.txt
atom:/d1/motion/cam3#

Files:
http://home.comcast.net/~jpiszcz/20110727/cam1-out.txt
http://home.comcast.net/~jpiszcz/20110727/cam2-out.txt
http://home.comcast.net/~jpiszcz/20110727/cam3-out.txt

Currently I do not see any dupes, however I have a script that moves 
images out of the directory once an hour:
0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1

I'll disable that for now and see whether this recurs. If it does, I'll gather
additional output and send it out. Thanks.

Justin.



* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Christoph Hellwig @ 2011-07-27 19:39 UTC
  To: Justin Piszcz
  Cc: Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> Currently I do not see any dupes, however I have a script that moves
> images out of the directory once an hour:
> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1

Do you keep adding files to the directory while you move files out?
What's the rate of additions/removals to the directory?

If we add files to the directory while removing others we could easily
re-use the same offset for a different file.



* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
From: Justin Piszcz @ 2011-07-27 19:44 UTC
  To: Christoph Hellwig; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs



On Wed, 27 Jul 2011, Christoph Hellwig wrote:

> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
>> Currently I do not see any dupes, however I have a script that moves
>> images out of the directory once an hour:
>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
>
> Do you keep adding files to the directory while you move files out?
Yes, otherwise there are too many files in the directory and viewers, e.g.,
each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
it around 5,000 pictures or less.

> What's the rate of additions/removals to the directory?
Additions it depends, around 5,000 over a 12hr period, 416/hr, current:

atom:/d1/motion# find cam1|wc
    5215    5215  166853
atom:/d1/motion# find cam2|wc
    5069    5069  162181
atom:/d1/motion# find cam3|wc
    5594    5594  178981
atom:/d1/motion#

>
> If we add files to the directory while removing others we could easily
> re-use the same offset for a different file.
>

Justin.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 19:44             ` Justin Piszcz
@ 2011-07-27 19:47               ` Christoph Hellwig
  -1 siblings, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2011-07-27 19:47 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> 
> 
> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> 
> >On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> >>Currently I do not see any dupes, however I have a script that moves
> >>images out of the directory once an hour:
> >>0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> >
> >Do you keep adding files to the directory while you move files out?
> Yes, otherwise there are too many files in the directory and viewers, e.g.,
> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> it around 5,000 pictures or less.
> 
> >What's the rate of additions/removals to the directory?
> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> 
> atom:/d1/motion# find cam1|wc
>    5215    5215  166853
> atom:/d1/motion# find cam2|wc
>    5069    5069  162181
> atom:/d1/motion# find cam3|wc
>    5594    5594  178981
> atom:/d1/motion#

This sounds a lot like xfs simply filling up the directory index slots
of files that you just moved out with new files, and nfs falsely
claiming that this is a problem.

Any chance to figure out if the file you hit the printk with was one
that got either recently added or moved when you hit it?  (I can't
follow the nfs code enough to check if it prints the first or second hit
of the same cookie)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 19:47               ` Christoph Hellwig
@ 2011-07-27 19:54                 ` Bryan Schumaker
  -1 siblings, 0 replies; 69+ messages in thread
From: Bryan Schumaker @ 2011-07-27 19:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On 07/27/2011 03:47 PM, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
>>
>>
>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
>>
>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
>>>> Currently I do not see any dupes, however I have a script that moves
>>>> images out of the directory once an hour:
>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
>>>
>>> Do you keep adding files to the directory while you move files out?
>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
>> it around 5,000 pictures or less.
>>
>>> What's the rate of additions/removals to the directory?
>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
>>
>> atom:/d1/motion# find cam1|wc
>>    5215    5215  166853
>> atom:/d1/motion# find cam2|wc
>>    5069    5069  162181
>> atom:/d1/motion# find cam3|wc
>>    5594    5594  178981
>> atom:/d1/motion#
> 
> This sounds a lot like xfs simply filling up the directory index slots
> of files that you just moved out with new files, and nfs falsely
> claiming that this is a problem.
> 
> Any chance to figure out if the file you hit the printk with was one
> that got either recently added or moved when you hit it?  (I can't
> follow the nfs code enough to check if it prints the first or second hit
> of the same cookie)

It should be printing on the second hit of a cookie.

- Bryan
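
Reporting on the second hit is the natural choice for a detector that only remembers cookies, since the first sighting of a cookie looks like any other entry. A minimal sketch, assuming the client sees a flat list of (cookie, name) pairs; `find_cookie_loop` is a hypothetical helper, not the NFS client's actual loop-detection code:

```python
def find_cookie_loop(entries):
    """Scan (cookie, name) pairs in the order the client received them.

    Returns None if all cookies are distinct, otherwise
    (cookie, first_name, second_name) for the first cookie seen twice.
    The report can only happen on the second sighting: the first time a
    cookie shows up it looks like any other entry.
    """
    seen = {}  # cookie -> name of the entry that first carried it
    for cookie, name in entries:
        if cookie in seen:
            return cookie, seen[cookie], name
        seen[cookie] = name
    return None


listing = [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg"), (2, "d.jpg")]
print(find_cookie_loop(listing))  # (2, 'b.jpg', 'd.jpg')
```

Keeping the first name alongside each cookie means a report can name both entries involved, not just the directory.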


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 19:47               ` Christoph Hellwig
@ 2011-07-27 19:57                 ` Justin Piszcz
  -1 siblings, 0 replies; 69+ messages in thread
From: Justin Piszcz @ 2011-07-27 19:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs



On Wed, 27 Jul 2011, Christoph Hellwig wrote:

> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
>>
>>
>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
>>
>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
>>>> Currently I do not see any dupes, however I have a script that moves
>>>> images out of the directory once an hour:
>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
>>>
>>> Do you keep adding files to the directory while you move files out?
>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
>> it around 5,000 pictures or less.
>>
>>> What's the rate of additions/removals to the directory?
>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
>>
>> atom:/d1/motion# find cam1|wc
>>    5215    5215  166853
>> atom:/d1/motion# find cam2|wc
>>    5069    5069  162181
>> atom:/d1/motion# find cam3|wc
>>    5594    5594  178981
>> atom:/d1/motion#
>
> This sounds a lot like xfs simply filling up the directory index slots
> of files that you just moved out with new files, and nfs falsely
> claiming that this is a problem.
>
> Any chance to figure out if the file you hit the printk with was one
> that got either recently added or moved when you hit it?  (I can't
> follow the nfs code enough to check if it prints the first or second hit
> of the same cookie)
>

It seems to happen across all directories; these are from the past 24 hours.

[41901.041923] NFS: directory motion/cam2 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 14368
[41901.275284] NFS: directory motion/cam3 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 17435
[45497.265250] NFS: directory motion/cam1 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 14488
[45498.832696] NFS: directory motion/cam1 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 16416
[45507.812712] NFS: directory motion/cam2 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 14778
[45508.458785] NFS: directory motion/cam2 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 14778
[92223.918892] NFS: directory motion/cam2 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 10272
[99413.259688] NFS: directory motion/cam1 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 10272
[113791.004006] NFS: directory motion/cam1 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 6848

Interestingly, I have two machines that perform this function, both running
XFS, and it only affects the client running 2.6.38:

$ df -h
2.6.38 - Has a kernel driver that was removed in 2.6.39 (rt2870sta) which
works really well.
atomw:/d1              30G   13G   18G  43% /nfs/atomw/d1

2.6.39:
d630w:/d1              75G  2.6G   72G   4% /nfs/d630w/d1

However, to rule out any kernel issues, I'll try 3.0 and see whether the problem recurs with a newer version, since it is _NOT_ happening with 2.6.39 (the setup is similar on both). There is one difference, however:

d630 => 32-bit installation (core2duo t7500)
atomw => 64-bit atom

Justin.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 19:54                 ` Bryan Schumaker
@ 2011-07-27 20:02                   ` Christoph Hellwig
  -1 siblings, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2011-07-27 20:02 UTC (permalink / raw)
  To: Bryan Schumaker
  Cc: Christoph Hellwig, Justin Piszcz, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wed, Jul 27, 2011 at 03:54:49PM -0400, Bryan Schumaker wrote:
> > Any chance to figure out if the file you hit the printk with was one
> > that got either recently added or moved when you hit it?  (I can't
> > follow the nfs code enough to check if it prints the first or second hit
> > of the same cookie)
> 
> It should be printing on the second hit of a cookie.

But looking closer at it, it only prints the directory name and not
the name of any entry with a matching cookie, making it pretty useless
for debugging any problem.  (And it makes my previous question to Justin
look stupid...)


But so far I still stick to my previous theory that this sounds like
a directory offset getting reused.  How is cache invalidation for
the array supposed to work?  And maybe more importantly, given that he
can only reproduce it with a .38 client, did any bugs get fixed in that
code recently that might lead to issues with the cache invalidation?


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:02                   ` Christoph Hellwig
@ 2011-07-27 20:05                     ` Christoph Hellwig
  -1 siblings, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2011-07-27 20:05 UTC (permalink / raw)
  To: Bryan Schumaker
  Cc: Christoph Hellwig, Justin Piszcz, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wed, Jul 27, 2011 at 04:02:40PM -0400, Christoph Hellwig wrote:
> But looking closer at it, it only prints the directory name and not
> the name of any entry with a matching cookie, making it pretty useless
> for debugging any problem.  (And it makes my previous question to Justin
> look stupid...)
> 
> 
> But so far I still stick to my previous theory that this sounds like
> a directory offset getting reused.  How is cache invalidation for
> the array supposed to work?  And maybe more importantly, given that he
> can only reproduce it with a .38 client, did any bugs get fixed in that
> code recently that might lead to issues with the cache invalidation?

Actually we won't even need cache invalidation bugs; see
nfsd_buffered_readdir(): we might do multiple vfs_readdir calls to
fill a single nfs reply, and between these calls the directory contents
might have been completely replaced.  In the worst (pathological) case you
might get a second readdir returning exactly the same offsets, but pointing
to completely different inodes.
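
The race described above can be modelled in a few lines. This is a hypothetical sketch, not the code of nfsd_buffered_readdir(): `read_chunk` stands in for one vfs_readdir call, `fill_reply` for assembling one READDIR reply from several such calls, and `between_calls` for a concurrent writer. Each chunk is self-consistent and the offsets in the reply keep increasing, yet the reply mixes two different versions of the directory:

```python
def read_chunk(directory, start, count):
    """Stand-in for one readdir call: up to `count` entries with offset >= start.

    `directory` maps offset -> (name, inode); entries come back as
    (offset, name, inode) tuples in offset order.
    """
    offsets = sorted(o for o in directory if o >= start)[:count]
    return [(o, *directory[o]) for o in offsets]


def fill_reply(get_directory, chunk_size, max_entries, between_calls=None):
    """Build one reply from several chunked reads of the current directory state."""
    reply, start = [], 0
    while len(reply) < max_entries:
        want = min(chunk_size, max_entries - len(reply))
        chunk = read_chunk(get_directory(), start, want)
        if not chunk:
            break
        reply.extend(chunk)
        start = chunk[-1][0] + 1   # resume after the last offset returned
        if between_calls is not None:
            between_calls()        # a concurrent writer may run here
    return reply


state = {"dir": {o: (f"old{o}.jpg", 100 + o) for o in range(1, 7)}}

def replace_contents():
    # Hypothetical writer: the directory is emptied and refilled, and the new
    # files happen to land on exactly the same offsets as the old ones.
    state["dir"] = {o: (f"new{o}.jpg", 200 + o) for o in range(1, 7)}

reply = fill_reply(lambda: state["dir"], chunk_size=3, max_entries=6,
                   between_calls=replace_contents)
for entry in reply:
    print(entry)
# Offsets 1-3 come from the old contents (inodes 101-103),
# offsets 4-6 from the new contents (inodes 204-206).
```

Nothing in the offsets alone reveals that the reply straddles two directory states.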


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:02                   ` Christoph Hellwig
@ 2011-07-27 20:26                     ` Rüdiger Meier
  -1 siblings, 0 replies; 69+ messages in thread
From: Rüdiger Meier @ 2011-07-27 20:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wednesday 27 July 2011, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 03:54:49PM -0400, Bryan Schumaker wrote:
> > It should be printing on the second hit of a cookie.
>
> But looking closer at it, it only prints the directory name and not
> the name of any entry with a matching cookie, making it pretty
> useless for debugging any problem.  (And it makes my previous
> question to Justin look stupid...)
>
>
> But so far I still stick to my previous theory that this sounds like
> a directory offset getting reused.  How is cache invalidation for
> the array supposed to work?  And maybe more importantly, given that
> he can only reproduce it with a .38 client, did any bugs get fixed in
> that code recently that might lead to issues with the cache
> invalidation?

When I started this thread
http://comments.gmane.org/gmane.linux.nfs/40863
I had the feeling that the readdir cache changes in 2.6.37 had
something to do with these loop problems.

After that thread I accepted that it's a general problem with
ext4/dirindex and nfs, but seeing it again on xfs with just 5000 files
I have doubts again.

cu,
Rudi

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 19:47               ` Christoph Hellwig
@ 2011-07-27 20:37                 ` Trond Myklebust
  -1 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:37 UTC (permalink / raw)
  To: Christoph Hellwig, Bryan Schumaker
  Cc: Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote: 
> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> > 
> > 
> > On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> > 
> > >On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> > >>Currently I do not see any dupes, however I have a script that moves
> > >>images out of the directory once an hour:
> > >>0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> > >
> > >Do you keep adding files to the directory while you move files out?
> > Yes, otherwise there are too many files in the directory and viewers, e.g.,
> > each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> > it around 5,000 pictures or less.
> > 
> > >What's the rate of additions/removals to the directory?
> > Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> > 
> > atom:/d1/motion# find cam1|wc
> >    5215    5215  166853
> > atom:/d1/motion# find cam2|wc
> >    5069    5069  162181
> > atom:/d1/motion# find cam3|wc
> >    5594    5594  178981
> > atom:/d1/motion#
> 
> This sounds a lot like xfs simply filling up the directory index slots
> of files that you just moved out with new files, and nfs falsely
> claiming that this is a problem.

Yep. There is an existing bugzilla report for this bug at

   https://bugzilla.kernel.org/show_bug.cgi?id=38572

I have a preliminary patch there that attempts to turn off the loop
detection when the directory is seen to change; however, that patch still
appears to have a bug in it, and I haven't had time to figure out what
is wrong yet.

Can you perhaps take a look, Bryan?

Cheers
  Trond
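
The idea behind the preliminary patch, as described above, can be sketched as follows. This illustrates the idea only and is not the patch from the bugzilla entry. It assumes each page of readdir results arrives tagged with some directory change attribute (for example the directory's mtime or an NFSv4 change attribute), and `detect_loop_tolerating_changes` is a hypothetical name:

```python
def detect_loop_tolerating_changes(pages):
    """Return the first repeated cookie seen while the directory was unchanged.

    `pages` is an iterable of (change_attr, [(cookie, name), ...]) in arrival
    order.  When a page's change attribute differs from the previous page's,
    the directory was modified in between, so the cookies remembered so far
    are forgotten instead of a repeat being reported as a server loop.
    """
    seen = set()
    last_attr = None
    for attr, entries in pages:
        if last_attr is not None and attr != last_attr:
            seen.clear()  # directory changed: a reused cookie is expected
        last_attr = attr
        for cookie, _name in entries:
            if cookie in seen:
                return cookie
            seen.add(cookie)
    return None


changed = [(1, [(10, "a.jpg"), (20, "b.jpg")]), (2, [(10, "c.jpg"), (30, "d.jpg")])]
stable = [(1, [(10, "a.jpg"), (20, "b.jpg")]), (1, [(10, "c.jpg"), (30, "d.jpg")])]
print(detect_loop_tolerating_changes(changed))  # None
print(detect_loop_tolerating_changes(stable))   # 10
```

The trade-off is that a server which genuinely loops while the directory keeps changing would go unreported, since every change resets the detector.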
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:26                     ` Rüdiger Meier
@ 2011-07-27 20:47                       ` Christoph Hellwig
  -1 siblings, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2011-07-27 20:47 UTC (permalink / raw)
  To: Rüdiger Meier
  Cc: Christoph Hellwig, Bryan Schumaker, Justin Piszcz,
	J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, Jul 27, 2011 at 10:26:55PM +0200, Rüdiger Meier wrote:
> When I started this thread
> http://comments.gmane.org/gmane.linux.nfs/40863
> I had the feeling that the readdir cache changes in 2.6.37 have
> something to do with these loop problems.
> 
> After that thread I accepted that it's a general problem with
> ext4/dirindex and nfs, but seeing it again on xfs with just 5000 files,
> I'm in doubt again.

Two separate issues.  For one thing the nfs code simply doesn't seem
to handle changing directories very well, and (issue one and a half) the Linux
NFS server might even send incoherent readdir output in a single
protocol reply.

Issue two is that the ext3/4 hashed directory format is too simple (not
to say dumb) to provide a proper 32-bit linear value for the dirent
d_off field.  It's not a complex task, and the first relatively simple
generation of xfs btree directories couldn't handle it either.  The
v2 directory format handles it fine, but at the cost of a much more
complex codebase.
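One consequence of using a filename hash instead of a linear position as the d_off cookie can be shown with a toy model: once there are more names than possible cookie values, two entries must share a cookie (pigeonhole principle), so a cookie alone cannot say where to resume. This is an illustrative sketch under stated assumptions, not the ext3/4 code: it uses FNV-1a rather than the hash functions ext3/4 actually use, the file names are made up, and the cookie is shrunk to 8 bits so that 300 names are guaranteed to collide. With 32-bit cookies the same argument holds, only at a much larger scale.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for a filename hash: 32-bit FNV-1a (NOT the ext3/4 hash). */
static uint32_t toy_hash(const char *name)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    for (; *name; name++) {
        h ^= (unsigned char)*name;
        h *= 16777619u;                  /* FNV-1a prime, wraps mod 2^32 */
    }
    return h;
}

/* Use the low `bits` bits of the hash as a hypothetical readdir cookie. */
static uint32_t toy_cookie(const char *name, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return toy_hash(name) & ((1u << bits) - 1u);
}

/* Try names "img-0000.jpg" .. for n files; if two distinct names share a
 * cookie, store their indices in *a < *b and return 1, else return 0. */
int toy_find_collision(int n, unsigned bits, int *a, int *b)
{
    char name[32];
    for (int i = 0; i < n; i++) {
        snprintf(name, sizeof name, "img-%04d.jpg", i);
        uint32_t ci = toy_cookie(name, bits);
        for (int j = i + 1; j < n; j++) {
            snprintf(name, sizeof name, "img-%04d.jpg", j);
            if (toy_cookie(name, bits) == ci) {
                *a = i;
                *b = j;
                return 1;
            }
        }
    }
    return 0;
}
```

For example, `toy_find_collision(300, 8, &a, &b)` must return 1, because 300 distinct names cannot fit into 256 distinct 8-bit cookies; a linear, position-based cookie has no such ambiguity.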


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:37                 ` Trond Myklebust
  (?)
@ 2011-07-27 20:54                   ` Trond Myklebust
  -1 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote: 
> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote: 
> > On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> > > 
> > > 
> > > On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> > > 
> > > >On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> > > >>Currently I do not see any dupes, however I have a script that moves
> > > >>images out of the directory once an hour:
> > > >>0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> > > >
> > > >Do you keep adding files to the directory while you move files out?
> > > Yes, otherwise there are too many files in the directory and viewers, e.g.,
> > > each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> > > it around 5,000 pictures or less.
> > > 
> > > >What's the rate of additions/removals to the directory?
> > > Additions vary: around 5,000 over a 12hr period (~416/hr). Current counts:
> > > 
> > > atom:/d1/motion# find cam1|wc
> > >    5215    5215  166853
> > > atom:/d1/motion# find cam2|wc
> > >    5069    5069  162181
> > > atom:/d1/motion# find cam3|wc
> > >    5594    5594  178981
> > > atom:/d1/motion#
> > 
> > This sounds a lot like xfs simply filling up the directory index slots
> > of files that you just moved out with new files, and nfs falsely
> > claiming that this is a problem.
> 
> Yep. There is an existing bugzilla report for this bug at
> 
>    https://bugzilla.kernel.org/show_bug.cgi?id=38572
> 
> I have a preliminary patch there that attempts to turn off the loop
> detection when the directory is seen to change, however that patch still
> appears to have a bug in it, and I haven't had time to figure out what
> is wrong yet.
> 
> Can you perhaps take a look, Bryan?

Actually, Justin, can you test the following slight variant on the patch
in the bugzilla?

8<--------------------------------------------------------- 
From 13cf7def9f2d802c3ea300833ba7f88705279cb1 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Wed, 27 Jul 2011 16:51:56 -0400
Subject: [PATCH] NFS: Fix spurious readdir cookie loop messages

If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.

Reported-by: Petr Vandrovec <petr@vandrovec.name>
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/dir.c           |   25 ++++++++++++++++---------
 include/linux/nfs_fs.h |    1 +
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 57f578e..73993b9 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -134,7 +134,7 @@ const struct inode_operations nfs4_dir_inode_operations = {
 
 #endif /* CONFIG_NFS_V4 */
 
-static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
+static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
 {
 	struct nfs_open_dir_context *ctx;
 	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
@@ -143,9 +143,10 @@ static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *
 		ctx->dir_cookie = 0;
 		ctx->dup_cookie = 0;
 		ctx->cred = get_rpccred(cred);
-	} else
-		ctx = ERR_PTR(-ENOMEM);
-	return ctx;
+		ctx->cache_change_attribute = nfs_save_change_attribute(dir);
+		return ctx;
+	}
+	return  ERR_PTR(-ENOMEM);
 }
 
 static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
@@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
 	cred = rpc_lookup_cred();
 	if (IS_ERR(cred))
 		return PTR_ERR(cred);
-	ctx = alloc_nfs_open_dir_context(cred);
+	ctx = alloc_nfs_open_dir_context(inode, cred);
 	if (IS_ERR(ctx)) {
 		res = PTR_ERR(ctx);
 		goto out;
@@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 {
 	loff_t diff = desc->file->f_pos - desc->current_index;
 	unsigned int index;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	if (diff < 0)
 		goto out_eof;
@@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 	index = (unsigned int)diff;
 	*desc->dir_cookie = array->array[index].cookie;
 	desc->cache_entry_index = index;
-	ctx->duped = 0;
 	return 0;
 out_eof:
 	desc->eof = 1;
@@ -349,12 +348,18 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 	int i;
 	loff_t new_pos;
 	int status = -EAGAIN;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	for (i = 0; i < array->size; i++) {
 		if (array->array[i].cookie == *desc->dir_cookie) {
+			struct inode *dir = desc->file->f_path.dentry->d_inode;
+			struct nfs_open_dir_context *ctx = desc->file->private_data;
+
 			new_pos = desc->current_index + i;
-			if (new_pos < desc->file->f_pos) {
+			if (!nfs_verify_change_attribute(dir, ctx->cache_change_attribute)
+			    || (NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATTR)) {
+				ctx->cache_change_attribute = nfs_save_change_attribute(dir);
+				ctx->duped = 0;
+			} else if (new_pos < desc->file->f_pos) {
 				ctx->dup_cookie = *desc->dir_cookie;
 				ctx->duped = 1;
 			}
@@ -805,6 +810,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	struct page	*page = NULL;
 	int		status;
 	struct inode *inode = desc->file->f_path.dentry->d_inode;
+	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
 			(unsigned long long)*desc->dir_cookie);
@@ -818,6 +824,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	desc->page_index = 0;
 	desc->last_cookie = *desc->dir_cookie;
 	desc->page = page;
+	ctx->duped = 0;
 
 	status = nfs_readdir_xdr_to_array(desc, page, inode);
 	if (status < 0)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 8b579be..f45d712 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -99,6 +99,7 @@ struct nfs_open_context {
 
 struct nfs_open_dir_context {
 	struct rpc_cred *cred;
+	unsigned long cache_change_attribute;
 	__u64 dir_cookie;
 	__u64 dup_cookie;
 	int duped;
-- 
1.7.6



-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:54                   ` Trond Myklebust
  (?)
@ 2011-07-27 20:56                     ` Trond Myklebust
  -1 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote: 
> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote: 
> > On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote: 
> > > On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> > > > 
> > > > 
> > > > On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> > > > 
> > > > >On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> > > > >>Currently I do not see any dupes, however I have a script that moves
> > > > >>images out of the directory once an hour:
> > > > >>0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> > > > >
> > > > >Do you keep adding files to the directory while you move files out?
> > > > Yes, otherwise there are too many files in the directory and viewers, e.g.,
> > > > each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> > > > it around 5,000 pictures or less.
> > > > 
> > > > >What's the rate of additions/removals to the directory?
> > > > Additions vary: around 5,000 over a 12hr period (~416/hr). Current counts:
> > > > 
> > > > atom:/d1/motion# find cam1|wc
> > > >    5215    5215  166853
> > > > atom:/d1/motion# find cam2|wc
> > > >    5069    5069  162181
> > > > atom:/d1/motion# find cam3|wc
> > > >    5594    5594  178981
> > > > atom:/d1/motion#
> > > 
> > > This sounds a lot like xfs simply filling up the directory index slots
> > > of files that you just moved out with new files, and nfs falsely
> > > claiming that this is a problem.
> > 
> > Yep. There is an existing bugzilla report for this bug at
> > 
> >    https://bugzilla.kernel.org/show_bug.cgi?id=38572
> > 
> > I have a preliminary patch there that attempts to turn off the loop
> > detection when the directory is seen to change, however that patch still
> > appears to have a bug in it, and I haven't had time to figure out what
> > is wrong yet.
> > 
> > Can you perhaps take a look, Bryan?
> 
> Actually, Justin, can you test the following slight variant on the patch
> in the bugzilla?

Doh! This one will actually compile....

> 8<--------------------------------------------------------- 
From f6720ef169b706f2d85a89d82cc1f725632ac671 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Wed, 27 Jul 2011 16:55:16 -0400
Subject: [PATCH] NFS: Fix spurious readdir cookie loop messages

If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.

Reported-by: Petr Vandrovec <petr@vandrovec.name>
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/dir.c           |   25 ++++++++++++++++---------
 include/linux/nfs_fs.h |    1 +
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 57f578e..188d5ae 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -134,7 +134,7 @@ const struct inode_operations nfs4_dir_inode_operations = {
 
 #endif /* CONFIG_NFS_V4 */
 
-static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
+static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
 {
 	struct nfs_open_dir_context *ctx;
 	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
@@ -143,9 +143,10 @@ static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *
 		ctx->dir_cookie = 0;
 		ctx->dup_cookie = 0;
 		ctx->cred = get_rpccred(cred);
-	} else
-		ctx = ERR_PTR(-ENOMEM);
-	return ctx;
+		ctx->cache_change_attribute = nfs_save_change_attribute(dir);
+		return ctx;
+	}
+	return  ERR_PTR(-ENOMEM);
 }
 
 static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
@@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
 	cred = rpc_lookup_cred();
 	if (IS_ERR(cred))
 		return PTR_ERR(cred);
-	ctx = alloc_nfs_open_dir_context(cred);
+	ctx = alloc_nfs_open_dir_context(inode, cred);
 	if (IS_ERR(ctx)) {
 		res = PTR_ERR(ctx);
 		goto out;
@@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 {
 	loff_t diff = desc->file->f_pos - desc->current_index;
 	unsigned int index;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	if (diff < 0)
 		goto out_eof;
@@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 	index = (unsigned int)diff;
 	*desc->dir_cookie = array->array[index].cookie;
 	desc->cache_entry_index = index;
-	ctx->duped = 0;
 	return 0;
 out_eof:
 	desc->eof = 1;
@@ -349,12 +348,18 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 	int i;
 	loff_t new_pos;
 	int status = -EAGAIN;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	for (i = 0; i < array->size; i++) {
 		if (array->array[i].cookie == *desc->dir_cookie) {
+			struct inode *dir = desc->file->f_path.dentry->d_inode;
+			struct nfs_open_dir_context *ctx = desc->file->private_data;
+
 			new_pos = desc->current_index + i;
-			if (new_pos < desc->file->f_pos) {
+			if (!nfs_verify_change_attribute(dir, ctx->cache_change_attribute)
+			    || (NFS_I(dir)->cache_validity & NFS_INO_INVALID_ATTR)) {
+				ctx->cache_change_attribute = nfs_save_change_attribute(dir);
+				ctx->duped = 0;
+			} else if (new_pos < desc->file->f_pos) {
 				ctx->dup_cookie = *desc->dir_cookie;
 				ctx->duped = 1;
 			}
@@ -805,6 +810,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	struct page	*page = NULL;
 	int		status;
 	struct inode *inode = desc->file->f_path.dentry->d_inode;
+	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
 			(unsigned long long)*desc->dir_cookie);
@@ -818,6 +824,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	desc->page_index = 0;
 	desc->last_cookie = *desc->dir_cookie;
 	desc->page = page;
+	ctx->duped = 0;
 
 	status = nfs_readdir_xdr_to_array(desc, page, inode);
 	if (status < 0)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 8b579be..f45d712 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -99,6 +99,7 @@ struct nfs_open_context {
 
 struct nfs_open_dir_context {
 	struct rpc_cred *cred;
+	unsigned long cache_change_attribute;
 	__u64 dir_cookie;
 	__u64 dup_cookie;
 	int duped;
-- 
1.7.6



-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
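The decision this patch adds to `nfs_readdir_search_for_cookie()` can be restated as a small user-space model, which may make the logic easier to test in isolation. This is a hypothetical sketch, not kernel code: `toy_dir_ctx` only mirrors the three `nfs_open_dir_context` fields the patch touches, `nfs_verify_change_attribute()` is modelled as a plain equality test, the directory's current change attribute and the `NFS_INO_INVALID_ATTR` state are passed in as arguments instead of being read from the inode, and the later loop report that consumes `duped`/`dup_cookie` is not modelled.

```c
#include <assert.h>

/* Hypothetical mirror of the nfs_open_dir_context fields the patch uses. */
struct toy_dir_ctx {
    unsigned long cache_change_attribute; /* snapshot from opendir or last reset */
    unsigned long long dup_cookie;        /* cookie that went backwards */
    int duped;                            /* 1 = possible loop noted */
};

/*
 * Called when the resume cookie is found at stream index new_pos.
 * cur_change_attr: the directory's current change attribute (assumed input).
 * attrs_invalid:   stands in for NFS_INO_INVALID_ATTR being set.
 */
void toy_on_cookie_found(struct toy_dir_ctx *ctx,
                         unsigned long cur_change_attr, int attrs_invalid,
                         long long new_pos, long long f_pos,
                         unsigned long long cookie)
{
    assert(ctx != 0 && new_pos >= 0 && f_pos >= 0);
    if (ctx->cache_change_attribute != cur_change_attr || attrs_invalid) {
        /* Directory changed (or may have): a shrinking f_pos is expected,
         * so take a fresh snapshot and drop the loop suspicion. */
        ctx->cache_change_attribute = cur_change_attr;
        ctx->duped = 0;
    } else if (new_pos < f_pos) {
        /* Unchanged directory, yet the position moved backwards. */
        ctx->dup_cookie = cookie;
        ctx->duped = 1;
    }
}
```

If the directory is unchanged and the cookie turns up behind `f_pos`, `duped` is set as before; if the change attribute differs, or the attributes are marked invalid, the suspicion is cleared instead, which is the "turn off the loop detection and let the NFS client try to recover" behaviour the commit message describes.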


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
@ 2011-07-27 20:56                     ` Trond Myklebust
  0 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote: 
> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote: 
> > On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote: 
> > > On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> > > > 
> > > > 
> > > > On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> > > > 
> > > > >On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> > > > >>Currently I do not see any dupes, however I have a script that moves
> > > > >>images out of the directory once an hour:
> > > > >>0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> > > > >
> > > > >Do you keep adding files to the directory while you move files out?
> > > > Yes, otherwise there are too many files in the directory and viewers, e.g.,
> > > > each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> > > > it around 5,000 pictures or less.
> > > > 
> > > > >What's the rate of additions/removals to the directory?
> > > > Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> > > > 
> > > > atom:/d1/motion# find cam1|wc
> > > >    5215    5215  166853
> > > > atom:/d1/motion# find cam2|wc
> > > >    5069    5069  162181
> > > > atom:/d1/motion# find cam3|wc
> > > >    5594    5594  178981
> > > > atom:/d1/motion#
> > > 
> > > This sounds a lot like xfs simply filling up the directory index slots
> > > of files that you just moved out with new files, and nfs falsely
> > > claiming that this is a problem.
> > 
> > Yep. There is an existing bugzilla report for this bug at
> > 
> >    https://bugzilla.kernel.org/show_bug.cgi?id=38572
> > 
> > I have a preliminary patch there that attempts to turn off the loop
> > detection when the directory is seen to change, however that patch still
> > appears to have a bug in it, and I haven't had time to figure out what
> > is wrong yet.
> > 
> > Can you perhaps take a look, Bryan?
> 
> Actually, Justin, can you test the following slight variant on the patch
> in the bugzilla?

Doh! This one will actually compile....

> 8<--------------------------------------------------------- 
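Christoph's explanation can be illustrated with a toy user-space model. The names are hypothetical and this is not kernel code. The model assumes the client resumes a listing by looking up its saved cookie in a freshly fetched list of cookies and compares the resulting index with its current position, roughly what `nfs_readdir_search_for_cookie()` does with `f_pos`. It models only the removal half of the workload. When the hourly script moves out entries that sit before the reader's position, the same cookie legitimately turns up at a lower index. A rule that treats any backward jump as a loop then fires even though the server did nothing wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of `cookie` in `listing`, or -1 if it is absent. */
static long find_cookie(const uint64_t *listing, size_t n, uint64_t cookie)
{
    assert(listing != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (listing[i] == cookie)
            return (long)i;
    return -1;
}

/*
 * Naive rule (hypothetical): if the resume cookie is found at a position
 * earlier than the reader's current position, assume the server looped.
 */
static bool naive_loop_check(const uint64_t *listing, size_t n,
                             uint64_t resume_cookie, long f_pos)
{
    long new_pos = find_cookie(listing, n, resume_cookie);
    return new_pos >= 0 && new_pos < f_pos;
}
```

With `before = {10, 20, 30, 40, 50, 60}` and the reader at position 3 resuming from cookie 40, no loop is reported. After 10 and 20 are moved out, cookie 40 sits at index 1 and the naive check reports a loop.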

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:47                       ` Christoph Hellwig
@ 2011-07-27 21:21                         ` Rüdiger Meier
  -1 siblings, 0 replies; 69+ messages in thread
From: Rüdiger Meier @ 2011-07-27 21:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wednesday 27 July 2011, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 10:26:55PM +0200, Rüdiger Meier wrote:
> > At the time I've started this thread
> > http://comments.gmane.org/gmane.linux.nfs/40863
> > I had the feeling that the readdir cache changings in 2.6.37 have
> > something to do with these loop problems.
> >
> > After that thread I've accepted that's a general problem with
> > ext4/dirindex and nfs but seeing it again on xfs with just 5000
> > files I'm in doubt again.
>
> Two separate issues. [...]

Yup, I didn't mean to say that I doubt the general
ext4/dirindex problem, but I am still not convinced of the complete
innocence of the readdir cache.

I guess I ran into both issues at that time. I remember that I
couldn't easily create such a "broken" directory from scratch, but my
users managed to produce dozens of them, often with only about 30000
files. It seemed to matter that the directories had grown in a
natural way.

However, I have had no problems since then with xfs, or with ext4
without dirindex. But I still have the feeling that upgrading to
2.6.37 was also part of the problem.

cu,
Rudi


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:56                     ` Trond Myklebust
  (?)
@ 2011-07-27 21:24                       ` Justin Piszcz
  -1 siblings, 0 replies; 69+ messages in thread
From: Justin Piszcz @ 2011-07-27 21:24 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, Bryan Schumaker, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs



On Wed, 27 Jul 2011, Trond Myklebust wrote:

> On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
>> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
>>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
>>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
>>>>>
>>>>>
>>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
>>>>>
>>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
>>>>>>> Currently I do not see any dupes, however I have a script that moves
>>>>>>> images out of the directory once an hour:
>>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
>>>>>>
>>>>>> Do you keep adding files to the directory while you move files out?
>>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
>>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
>>>>> it around 5,000 pictures or less.
>>>>>
>>>>>> What's the rate of additions/removals to the directory?
>>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
>>>>>
>>>>> atom:/d1/motion# find cam1|wc
>>>>>    5215    5215  166853
>>>>> atom:/d1/motion# find cam2|wc
>>>>>    5069    5069  162181
>>>>> atom:/d1/motion# find cam3|wc
>>>>>    5594    5594  178981
>>>>> atom:/d1/motion#
>>>>
>>>> This sounds a lot like xfs simply filling up the directory index slots
>>>> of files that you just moved out with new files, and nfs falsely
>>>> claiming that this is a problem.
>>>
>>> Yep. There is an existing bugzilla report for this bug at
>>>
>>>    https://bugzilla.kernel.org/show_bug.cgi?id=38572
>>>
>>> I have a preliminary patch there that attempts to turn off the loop
>>> detection when the directory is seen to change, however that patch still
>>> appears to have a bug in it, and I haven't had time to figure out what
>>> is wrong yet.
>>>
>>> Can you perhaps take a look, Bryan?
>>
>> Actually, Justin, can you test the following slight variant on the patch
>> in the bugzilla?
>
> Doh! This one will actually compile....

Hi,

Should I try 3.0 first or retry 2.6.38 w/ this patch?

Justin.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 21:24                       ` Justin Piszcz
  (?)
@ 2011-07-27 22:44                         ` Justin Piszcz
  -1 siblings, 0 replies; 69+ messages in thread
From: Justin Piszcz @ 2011-07-27 22:44 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, Bryan Schumaker, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs



On Wed, 27 Jul 2011, Justin Piszcz wrote:

> 
> 
> On Wed, 27 Jul 2011, Trond Myklebust wrote:
> 
> > On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
> >> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
> >>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
> >>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> >>>>>
> >>>>>
> >>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> >>>>>
> >>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> >>>>>>> Currently I do not see any dupes, however I have a script that moves
> >>>>>>> images out of the directory once an hour:
> >>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> >>>>>>
> >>>>>> Do you keep adding files to the directory while you move files out?
> >>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
> >>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> >>>>> it around 5,000 pictures or less.
> >>>>>
> >>>>>> What's the rate of additions/removals to the directory?
> >>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> >>>>>
> >>>>> atom:/d1/motion# find cam1|wc
> >>>>>    5215    5215  166853
> >>>>> atom:/d1/motion# find cam2|wc
> >>>>>    5069    5069  162181
> >>>>> atom:/d1/motion# find cam3|wc
> >>>>>    5594    5594  178981
> >>>>> atom:/d1/motion#
> >>>>
> >>>> This sounds a lot like xfs simply filling up the directory index slots
> >>>> of files that you just moved out with new files, and nfs falsely
> >>>> claiming that this is a problem.
> >>>
> >>> Yep. There is an existing bugzilla report for this bug at
> >>>
> >>>    https://bugzilla.kernel.org/show_bug.cgi?id=38572
> >>>
> >>> I have a preliminary patch there that attempts to turn off the loop
> >>> detection when the directory is seen to change, however that patch still
> >>> appears to have a bug in it, and I haven't had time to figure out what
> >>> is wrong yet.
> >>>
> >>> Can you perhaps take a look, Bryan?
> >>
> >> Actually, Justin, can you test the following slight variant on the patch
> >> in the bugzilla?
> >
> > Doh! This one will actually compile....
> 
> Hi,
> 
> Should I try 3.0 first or retry 2.6.38 w/ this patch?
> 
> Justin.
> 
>

I'll give 3.0 a go first.


Justin.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 22:44                         ` Justin Piszcz
  (?)
@ 2011-07-28 20:48                           ` Trond Myklebust
  -1 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2011-07-28 20:48 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Christoph Hellwig, Bryan Schumaker, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Wed, 2011-07-27 at 18:44 -0400, Justin Piszcz wrote: 
> 
> On Wed, 27 Jul 2011, Justin Piszcz wrote:
> 
> > 
> > 
> > On Wed, 27 Jul 2011, Trond Myklebust wrote:
> > 
> > > On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
> > >> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
> > >>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
> > >>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> > >>>>>
> > >>>>>
> > >>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> > >>>>>
> > >>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> > >>>>>>> Currently I do not see any dupes, however I have a script that moves
> > >>>>>>> images out of the directory once an hour:
> > >>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> > >>>>>>
> > >>>>>> Do you keep adding files to the directory while you move files out?
> > >>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
> > >>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> > >>>>> it around 5,000 pictures or less.
> > >>>>>
> > >>>>>> What's the rate of additions/removals to the directory?
> > >>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> > >>>>>
> > >>>>> atom:/d1/motion# find cam1|wc
> > >>>>>    5215    5215  166853
> > >>>>> atom:/d1/motion# find cam2|wc
> > >>>>>    5069    5069  162181
> > >>>>> atom:/d1/motion# find cam3|wc
> > >>>>>    5594    5594  178981
> > >>>>> atom:/d1/motion#
> > >>>>
> > >>>> This sounds a lot like xfs simply filling up the directory index slots
> > >>>> of files that you just moved out with new files, and nfs falsely
> > >>>> claiming that this is a problem.
> > >>>
> > >>> Yep. There is an existing bugzilla report for this bug at
> > >>>
> > >>>    https://bugzilla.kernel.org/show_bug.cgi?id=38572
> > >>>
> > >>> I have a preliminary patch there that attempts to turn off the loop
> > >>> detection when the directory is seen to change, however that patch still
> > >>> appears to have a bug in it, and I haven't had time to figure out what
> > >>> is wrong yet.
> > >>>
> > >>> Can you perhaps take a look, Bryan?
> > >>
> > >> Actually, Justin, can you test the following slight variant on the patch
> > >> in the bugzilla?
> > >
> > > Doh! This one will actually compile....
> > 
> > Hi,
> > 
> > Should I try 3.0 first or retry 2.6.38 w/ this patch?
> > 
> > Justin.
> > 
> >
> 
> I'll give 3.0 a go first.

I had Bryan do some more tests, which revealed a couple more issues. The
attached patch should fix those, and has resisted everything we've
thrown at it so far. It should apply to 2.6.39 and newer.

Cheers
  Trond
8<----------------------------------------------------------------------- 
>From 75c0387540737a6663338d4ec0538bd6fb724173 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu, 28 Jul 2011 16:34:33 -0400
Subject: [PATCH v3] NFS: Fix spurious readdir cookie loop messages

If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.

The patch also fixes a second loop detection bug by ensuring
that after turning on the ctx->duped flag, we read at least one new
cookie into ctx->dir_cookie before attempting to match with
ctx->dup_cookie.

Reported-by: Petr Vandrovec <petr@vandrovec.name>
Cc: stable@kernel.org [2.6.39+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/dir.c           |   56 ++++++++++++++++++++++++++++-------------------
 include/linux/nfs_fs.h |    3 +-
 2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 57f578e..d23108b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -134,18 +134,19 @@ const struct inode_operations nfs4_dir_inode_operations = {
 
 #endif /* CONFIG_NFS_V4 */
 
-static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
+static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
 {
 	struct nfs_open_dir_context *ctx;
 	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 	if (ctx != NULL) {
 		ctx->duped = 0;
+		ctx->attr_gencount = NFS_I(dir)->attr_gencount;
 		ctx->dir_cookie = 0;
 		ctx->dup_cookie = 0;
 		ctx->cred = get_rpccred(cred);
-	} else
-		ctx = ERR_PTR(-ENOMEM);
-	return ctx;
+		return ctx;
+	}
+	return ERR_PTR(-ENOMEM);
 }
 
 static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
@@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
 	cred = rpc_lookup_cred();
 	if (IS_ERR(cred))
 		return PTR_ERR(cred);
-	ctx = alloc_nfs_open_dir_context(cred);
+	ctx = alloc_nfs_open_dir_context(inode, cred);
 	if (IS_ERR(ctx)) {
 		res = PTR_ERR(ctx);
 		goto out;
@@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 {
 	loff_t diff = desc->file->f_pos - desc->current_index;
 	unsigned int index;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	if (diff < 0)
 		goto out_eof;
@@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 	index = (unsigned int)diff;
 	*desc->dir_cookie = array->array[index].cookie;
 	desc->cache_entry_index = index;
-	ctx->duped = 0;
 	return 0;
 out_eof:
 	desc->eof = 1;
@@ -349,14 +348,33 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 	int i;
 	loff_t new_pos;
 	int status = -EAGAIN;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	for (i = 0; i < array->size; i++) {
 		if (array->array[i].cookie == *desc->dir_cookie) {
+			struct nfs_inode *nfsi = NFS_I(desc->file->f_path.dentry->d_inode);
+			struct nfs_open_dir_context *ctx = desc->file->private_data;
+
 			new_pos = desc->current_index + i;
-			if (new_pos < desc->file->f_pos) {
+			if (ctx->attr_gencount != nfsi->attr_gencount
+			    || (nfsi->cache_validity & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA))) {
+				ctx->duped = 0;
+				ctx->attr_gencount = nfsi->attr_gencount;
+			} else if (new_pos < desc->file->f_pos) {
+				if (ctx->duped > 0
+				    && ctx->dup_cookie == *desc->dir_cookie) {
+					if (printk_ratelimit()) {
+						pr_notice("NFS: directory %s/%s contains a readdir loop.  "
+								"Please contact your server vendor.  "
+								"Offending cookie: %llu\n",
+								desc->file->f_dentry->d_parent->d_name.name,
+								desc->file->f_dentry->d_name.name,
+								*desc->dir_cookie);
+					}
+					status = -ELOOP;
+					goto out;
+				}
 				ctx->dup_cookie = *desc->dir_cookie;
-				ctx->duped = 1;
+				ctx->duped = -1;
 			}
 			desc->file->f_pos = new_pos;
 			desc->cache_entry_index = i;
@@ -368,6 +386,7 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 		if (*desc->dir_cookie == array->last_cookie)
 			desc->eof = 1;
 	}
+out:
 	return status;
 }
 
@@ -740,19 +759,6 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
 	struct nfs_cache_array *array = NULL;
 	struct nfs_open_dir_context *ctx = file->private_data;
 
-	if (ctx->duped != 0 && ctx->dup_cookie == *desc->dir_cookie) {
-		if (printk_ratelimit()) {
-			pr_notice("NFS: directory %s/%s contains a readdir loop.  "
-				"Please contact your server vendor.  "
-				"Offending cookie: %llu\n",
-				file->f_dentry->d_parent->d_name.name,
-				file->f_dentry->d_name.name,
-				*desc->dir_cookie);
-		}
-		res = -ELOOP;
-		goto out;
-	}
-
 	array = nfs_readdir_get_array(desc->page);
 	if (IS_ERR(array)) {
 		res = PTR_ERR(array);
@@ -774,6 +780,8 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
 			*desc->dir_cookie = array->array[i+1].cookie;
 		else
 			*desc->dir_cookie = array->last_cookie;
+		if (ctx->duped != 0)
+			ctx->duped = 1;
 	}
 	if (array->eof_index >= 0)
 		desc->eof = 1;
@@ -805,6 +813,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	struct page	*page = NULL;
 	int		status;
 	struct inode *inode = desc->file->f_path.dentry->d_inode;
+	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
 			(unsigned long long)*desc->dir_cookie);
@@ -818,6 +827,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	desc->page_index = 0;
 	desc->last_cookie = *desc->dir_cookie;
 	desc->page = page;
+	ctx->duped = 0;
 
 	status = nfs_readdir_xdr_to_array(desc, page, inode);
 	if (status < 0)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 8b579be..b96fb99 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -99,9 +99,10 @@ struct nfs_open_context {
 
 struct nfs_open_dir_context {
 	struct rpc_cred *cred;
+	unsigned long attr_gencount;
 	__u64 dir_cookie;
 	__u64 dup_cookie;
-	int duped;
+	signed char duped;
 };
 
 /*
-- 
1.7.6


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
@ 2011-07-28 20:48                           ` Trond Myklebust
  0 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2011-07-28 20:48 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs, Christoph Hellwig,
	Bryan Schumaker

On Wed, 2011-07-27 at 18:44 -0400, Justin Piszcz wrote: 
> 
> On Wed, 27 Jul 2011, Justin Piszcz wrote:
> 
> > 
> > 
> > On Wed, 27 Jul 2011, Trond Myklebust wrote:
> > 
> > > On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
> > >> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
> > >>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
> > >>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> > >>>>>
> > >>>>>
> > >>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> > >>>>>
> > >>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> > >>>>>>> Currently I do not see any dupes, however I have a script that moves
> > >>>>>>> images out of the directory once an hour:
> > >>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> > >>>>>>
> > >>>>>> Do you keep adding files to the directory while you move files out?
> > >>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
> > >>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> > >>>>> it around 5,000 pictures or less.
> > >>>>>
> > >>>>>> What's the rate of additions/removals to the directory?
> > >>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> > >>>>>
> > >>>>> atom:/d1/motion# find cam1|wc
> > >>>>>    5215    5215  166853
> > >>>>> atom:/d1/motion# find cam2|wc
> > >>>>>    5069    5069  162181
> > >>>>> atom:/d1/motion# find cam3|wc
> > >>>>>    5594    5594  178981
> > >>>>> atom:/d1/motion#
> > >>>>
> > >>>> This sounds a lot like xfs simply filling up the directory index slots
> > >>>> of files that you just moved out with new files, and nfs falsely
> > >>>> claiming that this is a problem.
> > >>>
> > >>> Yep. There is an existing bugzilla report for this bug at
> > >>>
> > >>>    https://bugzilla.kernel.org/show_bug.cgi?id=38572
> > >>>
> > >>> I have a preliminary patch there that attempts to turn off the loop
> > >>> detection when the directory is seen to change, however that patch still
> > >>> appears to have a bug in it, and I haven't had time to figure out what
> > >>> is wrong yet.
> > >>>
> > >>> Can you perhaps take a look, Bryan?
> > >>
> > >> Actually, Justin, can you test the following slight variant on the patch
> > >> in the bugzilla?
> > >
> > > Doh! This one will actually compile....
> > 
> > Hi,
> > 
> > Should I try 3.0 first or retry 2.6.38 w/ this patch?
> > 
> > Justin.
> > 
> >
> 
> I'll give 3.0 a go first.

I had Bryan do some more tests, which revealed a couple more issues. The
attached patch should fix those, and has resisted everything we've
thrown at it so far. It should apply to 2.6.39 and newer.

Cheers
  Trond
8<----------------------------------------------------------------------- 
From 75c0387540737a6663338d4ec0538bd6fb724173 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu, 28 Jul 2011 16:34:33 -0400
Subject: [PATCH v3] NFS: Fix spurious readdir cookie loop messages

If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.
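A minimal userspace sketch of the "turn off the loop detection when the directory changes" rule described above. It assumes only what the diff shows: a per-open snapshot of attr_gencount plus a cache_validity test. The names struct dir_state, struct open_dir, open_dir_snapshot() and dir_may_have_changed() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields the patch reads from struct nfs_inode. */
struct dir_state {
	unsigned long attr_gencount; /* changes whenever attributes are updated */
	bool cache_invalid;          /* models NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA */
};

/* Hypothetical stand-in for struct nfs_open_dir_context. */
struct open_dir {
	unsigned long attr_gencount; /* snapshot of dir_state.attr_gencount */
	signed char duped;           /* loop-detection state, 0 = off */
};

/* Models alloc_nfs_open_dir_context(): take the snapshot at open time. */
static struct open_dir open_dir_snapshot(const struct dir_state *dir)
{
	struct open_dir od = { .attr_gencount = dir->attr_gencount, .duped = 0 };
	return od;
}

/*
 * Models the new test in nfs_readdir_search_for_cookie(): if the directory
 * may have changed since the snapshot, disarm loop detection, refresh the
 * snapshot and return true, because f_pos may legitimately move backwards.
 */
static bool dir_may_have_changed(struct open_dir *od, const struct dir_state *dir)
{
	assert(od && dir);
	if (od->attr_gencount != dir->attr_gencount || dir->cache_invalid) {
		od->duped = 0;
		od->attr_gencount = dir->attr_gencount;
		return true;
	}
	return false;
}
```

Because the snapshot is refreshed each time a change is noticed, a backward jump is only treated as suspicious while the directory looks stable to the client.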

The patch also fixes a second loop detection bug by ensuring
that after turning on the ctx->duped flag, we read at least one new
cookie into ctx->dir_cookie before attempting to match with
ctx->dup_cookie.
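The second fix turns ctx->duped into a three-state value, which is why nfs_fs.h switches it to signed char. The following userspace model is a sketch of that logic as read from the diff. The names struct loop_ctx, on_cookie_found() and on_entry_emitted() are hypothetical, and the reset to 0 in uncached_readdir() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the v3 loop-detection fields. duped is:
 *    0  detection off
 *   -1  a backward jump recorded dup_cookie, but no entry emitted since
 *    1  at least one entry emitted since then; a repeat now counts as a loop
 */
struct loop_ctx {
	uint64_t dup_cookie;
	signed char duped;
};

enum lookup_result { LOOKUP_OK, LOOKUP_LOOP };

/*
 * Models the cookie match in nfs_readdir_search_for_cookie(). new_pos is
 * where the cookie was found, f_pos the current position, dir_changed the
 * result of the attr_gencount/cache_validity test.
 */
static enum lookup_result on_cookie_found(struct loop_ctx *ctx, uint64_t cookie,
					  long long new_pos, long long f_pos,
					  bool dir_changed)
{
	assert(ctx->duped >= -1 && ctx->duped <= 1);
	if (dir_changed) {
		ctx->duped = 0;                 /* jumps are expected, stop checking */
	} else if (new_pos < f_pos) {
		if (ctx->duped > 0 && ctx->dup_cookie == cookie)
			return LOOKUP_LOOP;     /* the kernel returns -ELOOP here */
		ctx->dup_cookie = cookie;
		ctx->duped = -1;                /* armed, but may not fire yet */
	}
	return LOOKUP_OK;
}

/* Models the lines added to nfs_do_filldir() after each entry advances the cookie. */
static void on_entry_emitted(struct loop_ctx *ctx)
{
	if (ctx->duped != 0)
		ctx->duped = 1;
}
```

As far as the diff shows, the old check in nfs_do_filldir() ran while *desc->dir_cookie still equalled the dup_cookie just recorded, so it could match immediately. The -1 state defers the check until at least one entry has been passed to filldir.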

Reported-by: Petr Vandrovec <petr@vandrovec.name>
Cc: stable@kernel.org [2.6.39+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/dir.c           |   56 ++++++++++++++++++++++++++++-------------------
 include/linux/nfs_fs.h |    3 +-
 2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 57f578e..d23108b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -134,18 +134,19 @@ const struct inode_operations nfs4_dir_inode_operations = {
 
 #endif /* CONFIG_NFS_V4 */
 
-static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
+static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
 {
 	struct nfs_open_dir_context *ctx;
 	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 	if (ctx != NULL) {
 		ctx->duped = 0;
+		ctx->attr_gencount = NFS_I(dir)->attr_gencount;
 		ctx->dir_cookie = 0;
 		ctx->dup_cookie = 0;
 		ctx->cred = get_rpccred(cred);
-	} else
-		ctx = ERR_PTR(-ENOMEM);
-	return ctx;
+		return ctx;
+	}
+	return  ERR_PTR(-ENOMEM);
 }
 
 static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
@@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
 	cred = rpc_lookup_cred();
 	if (IS_ERR(cred))
 		return PTR_ERR(cred);
-	ctx = alloc_nfs_open_dir_context(cred);
+	ctx = alloc_nfs_open_dir_context(inode, cred);
 	if (IS_ERR(ctx)) {
 		res = PTR_ERR(ctx);
 		goto out;
@@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 {
 	loff_t diff = desc->file->f_pos - desc->current_index;
 	unsigned int index;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	if (diff < 0)
 		goto out_eof;
@@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
 	index = (unsigned int)diff;
 	*desc->dir_cookie = array->array[index].cookie;
 	desc->cache_entry_index = index;
-	ctx->duped = 0;
 	return 0;
 out_eof:
 	desc->eof = 1;
@@ -349,14 +348,33 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 	int i;
 	loff_t new_pos;
 	int status = -EAGAIN;
-	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	for (i = 0; i < array->size; i++) {
 		if (array->array[i].cookie == *desc->dir_cookie) {
+			struct nfs_inode *nfsi = NFS_I(desc->file->f_path.dentry->d_inode);
+			struct nfs_open_dir_context *ctx = desc->file->private_data;
+
 			new_pos = desc->current_index + i;
-			if (new_pos < desc->file->f_pos) {
+			if (ctx->attr_gencount != nfsi->attr_gencount
+			    || (nfsi->cache_validity & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA))) {
+				ctx->duped = 0;
+				ctx->attr_gencount = nfsi->attr_gencount;
+			} else if (new_pos < desc->file->f_pos) {
+				if (ctx->duped > 0
+				    && ctx->dup_cookie == *desc->dir_cookie) {
+					if (printk_ratelimit()) {
+						pr_notice("NFS: directory %s/%s contains a readdir loop."
+								"Please contact your server vendor.  "
+								"Offending cookie: %llu\n",
+								desc->file->f_dentry->d_parent->d_name.name,
+								desc->file->f_dentry->d_name.name,
+								*desc->dir_cookie);
+					}
+					status = -ELOOP;
+					goto out;
+				}
 				ctx->dup_cookie = *desc->dir_cookie;
-				ctx->duped = 1;
+				ctx->duped = -1;
 			}
 			desc->file->f_pos = new_pos;
 			desc->cache_entry_index = i;
@@ -368,6 +386,7 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 		if (*desc->dir_cookie == array->last_cookie)
 			desc->eof = 1;
 	}
+out:
 	return status;
 }
 
@@ -740,19 +759,6 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
 	struct nfs_cache_array *array = NULL;
 	struct nfs_open_dir_context *ctx = file->private_data;
 
-	if (ctx->duped != 0 && ctx->dup_cookie == *desc->dir_cookie) {
-		if (printk_ratelimit()) {
-			pr_notice("NFS: directory %s/%s contains a readdir loop.  "
-				"Please contact your server vendor.  "
-				"Offending cookie: %llu\n",
-				file->f_dentry->d_parent->d_name.name,
-				file->f_dentry->d_name.name,
-				*desc->dir_cookie);
-		}
-		res = -ELOOP;
-		goto out;
-	}
-
 	array = nfs_readdir_get_array(desc->page);
 	if (IS_ERR(array)) {
 		res = PTR_ERR(array);
@@ -774,6 +780,8 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
 			*desc->dir_cookie = array->array[i+1].cookie;
 		else
 			*desc->dir_cookie = array->last_cookie;
+		if (ctx->duped != 0)
+			ctx->duped = 1;
 	}
 	if (array->eof_index >= 0)
 		desc->eof = 1;
@@ -805,6 +813,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	struct page	*page = NULL;
 	int		status;
 	struct inode *inode = desc->file->f_path.dentry->d_inode;
+	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
 	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
 			(unsigned long long)*desc->dir_cookie);
@@ -818,6 +827,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
 	desc->page_index = 0;
 	desc->last_cookie = *desc->dir_cookie;
 	desc->page = page;
+	ctx->duped = 0;
 
 	status = nfs_readdir_xdr_to_array(desc, page, inode);
 	if (status < 0)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 8b579be..b96fb99 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -99,9 +99,10 @@ struct nfs_open_context {
 
 struct nfs_open_dir_context {
 	struct rpc_cred *cred;
+	unsigned long attr_gencount;
 	__u64 dir_cookie;
 	__u64 dup_cookie;
-	int duped;
+	signed char duped;
 };
 
 /*
-- 
1.7.6


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-28 20:48                           ` Trond Myklebust
@ 2011-07-29 20:52                             ` Bryan Schumaker
  -1 siblings, 0 replies; 69+ messages in thread
From: Bryan Schumaker @ 2011-07-29 20:52 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Justin Piszcz, Christoph Hellwig, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On 07/28/2011 04:48 PM, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 18:44 -0400, Justin Piszcz wrote: 
>>
>> On Wed, 27 Jul 2011, Justin Piszcz wrote:
>>
>>>
>>>
>>> On Wed, 27 Jul 2011, Trond Myklebust wrote:
>>>
>>>> On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
>>>>> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
>>>>>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
>>>>>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
>>>>>>>>>> Currently I do not see any dupes, however I have a script that moves
>>>>>>>>>> images out of the directory once an hour:
>>>>>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
>>>>>>>>>
>>>>>>>>> Do you keep adding files to the directory while you move files out?
>>>>>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
>>>>>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
>>>>>>>> it around 5,000 pictures or less.
>>>>>>>>
>>>>>>>>> What's the rate of additions/removals to the directory?
>>>>>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
>>>>>>>>
>>>>>>>> atom:/d1/motion# find cam1|wc
>>>>>>>>    5215    5215  166853
>>>>>>>> atom:/d1/motion# find cam2|wc
>>>>>>>>    5069    5069  162181
>>>>>>>> atom:/d1/motion# find cam3|wc
>>>>>>>>    5594    5594  178981
>>>>>>>> atom:/d1/motion#
>>>>>>>
>>>>>>> This sounds a lot like xfs simply filling up the directory index slots
>>>>>>> of files that you just moved out with new files, and nfs falsely
>>>>>>> claiming that this is a problem.
>>>>>>
>>>>>> Yep. There is an existing bugzilla report for this bug at
>>>>>>
>>>>>>    https://bugzilla.kernel.org/show_bug.cgi?id=38572
>>>>>>
>>>>>> I have a preliminary patch there that attempts to turn off the loop
>>>>>> detection when the directory is seen to change, however that patch still
>>>>>> appears to have a bug in it, and I haven't had time to figure out what
>>>>>> is wrong yet.
>>>>>>
>>>>>> Can you perhaps take a look, Bryan?
>>>>>
>>>>> Actually, Justin, can you test the following slight variant on the patch
>>>>> in the bugzilla?
>>>>
>>>> Doh! This one will actually compile....
>>>
>>> Hi,
>>>
>>> Should I try 3.0 first or retry 2.6.38 w/ this patch?
>>>
>>> Justin.
>>>
>>>
>>
>> I'll give 3.0 a go first.
> 
> I had Bryan do some more tests, which revealed a couple more issues. The
> attached patch should fix those, and has resisted everything we've
> thrown at it so far. It should apply to 2.6.39 and newer.

This patch still looks good (after testing it a bit more today).

How does this look for printing out more information when a cookie loop is detected?  Is there anything else that should be printed out?  My patch applies on top of Trond's from yesterday.

- Bryan

8<-----------------------------------------------------------------------
From 4d74863dc2bcd4e603a873b3725f0a05afd21f1f Mon Sep 17 00:00:00 2001
From: Bryan Schumaker <bjschuma@netapp.com>
Date: Fri, 29 Jul 2011 11:49:06 -0400
Subject: [PATCH] Additional readdir cookie loop information

Print out the name of the file that triggers the cookie loop message to
make it slightly easier to track down the cause.
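Two details of this message format may be worth checking. First, adjacent C string literals are joined with no separator, so "...contains a readdir loop." followed by "Please contact..." prints as "loop.Please"; the message removed from nfs_do_filldir() had two spaces after "loop.". Second, if the cached entry name is a length-counted string that is not guaranteed to be NUL-terminated (an assumption, not verified here), "%.*s" with the length bounds the read. A userspace sketch with hypothetical names (struct counted_name, format_loop_msg()):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a length-counted name such as struct qstr;
 * the bytes at name are NOT assumed to be NUL-terminated. */
struct counted_name {
	const char *name;
	unsigned int len;
};

/*
 * Formats the loop message into buf. The compiler concatenates the three
 * literals with nothing in between, so the spaces after "loop." and
 * "vendor." are written explicitly. "%.*s" prints at most len bytes.
 */
static int format_loop_msg(char *buf, size_t size, const char *parent,
			   const char *dir, const struct counted_name *file,
			   unsigned long long cookie)
{
	assert(buf && file);
	return snprintf(buf, size,
			"NFS: directory %s/%s contains a readdir loop.  "
			"Please contact your server vendor.  "
			"The file: %.*s has duplicate cookie %llu\n",
			parent, dir, (int)file->len, file->name, cookie);
}
```

With a name stored as the four bytes "img1" and no terminator, the formatted line ends in "The file: img1 has duplicate cookie 10272".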

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
---
 fs/nfs/dir.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d23108b..b238d95 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -365,9 +365,10 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 					if (printk_ratelimit()) {
 						pr_notice("NFS: directory %s/%s contains a readdir loop."
 								"Please contact your server vendor.  "
-								"Offending cookie: %llu\n",
+								"The file: %s has duplicate cookie %llu\n",
 								desc->file->f_dentry->d_parent->d_name.name,
 								desc->file->f_dentry->d_name.name,
+								array->array[i].string.name,
 								*desc->dir_cookie);
 					}
 					status = -ELOOP;
-- 
1.7.6


> 
> Cheers
>   Trond
> 8<----------------------------------------------------------------------- 
> From 75c0387540737a6663338d4ec0538bd6fb724173 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Thu, 28 Jul 2011 16:34:33 -0400
> Subject: [PATCH v3] NFS: Fix spurious readdir cookie loop messages
> 
> If the directory contents change, then we have to accept that the
> file->f_pos value may shrink if we do a 'search-by-cookie'. In that
> case, we should turn off the loop detection and let the NFS client
> try to recover.
> 
> The patch also fixes a second loop detection bug by ensuring
> that after turning on the ctx->duped flag, we read at least one new
> cookie into ctx->dir_cookie before attempting to match with
> ctx->dup_cookie.
> 
> Reported-by: Petr Vandrovec <petr@vandrovec.name>
> Cc: stable@kernel.org [2.6.39+]
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>  fs/nfs/dir.c           |   56 ++++++++++++++++++++++++++++-------------------
>  include/linux/nfs_fs.h |    3 +-
>  2 files changed, 35 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 57f578e..d23108b 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -134,18 +134,19 @@ const struct inode_operations nfs4_dir_inode_operations = {
>  
>  #endif /* CONFIG_NFS_V4 */
>  
> -static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
> +static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
>  {
>  	struct nfs_open_dir_context *ctx;
>  	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>  	if (ctx != NULL) {
>  		ctx->duped = 0;
> +		ctx->attr_gencount = NFS_I(dir)->attr_gencount;
>  		ctx->dir_cookie = 0;
>  		ctx->dup_cookie = 0;
>  		ctx->cred = get_rpccred(cred);
> -	} else
> -		ctx = ERR_PTR(-ENOMEM);
> -	return ctx;
> +		return ctx;
> +	}
> +	return  ERR_PTR(-ENOMEM);
>  }
>  
>  static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
> @@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
>  	cred = rpc_lookup_cred();
>  	if (IS_ERR(cred))
>  		return PTR_ERR(cred);
> -	ctx = alloc_nfs_open_dir_context(cred);
> +	ctx = alloc_nfs_open_dir_context(inode, cred);
>  	if (IS_ERR(ctx)) {
>  		res = PTR_ERR(ctx);
>  		goto out;
> @@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
>  {
>  	loff_t diff = desc->file->f_pos - desc->current_index;
>  	unsigned int index;
> -	struct nfs_open_dir_context *ctx = desc->file->private_data;
>  
>  	if (diff < 0)
>  		goto out_eof;
> @@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
>  	index = (unsigned int)diff;
>  	*desc->dir_cookie = array->array[index].cookie;
>  	desc->cache_entry_index = index;
> -	ctx->duped = 0;
>  	return 0;
>  out_eof:
>  	desc->eof = 1;
> @@ -349,14 +348,33 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
>  	int i;
>  	loff_t new_pos;
>  	int status = -EAGAIN;
> -	struct nfs_open_dir_context *ctx = desc->file->private_data;
>  
>  	for (i = 0; i < array->size; i++) {
>  		if (array->array[i].cookie == *desc->dir_cookie) {
> +			struct nfs_inode *nfsi = NFS_I(desc->file->f_path.dentry->d_inode);
> +			struct nfs_open_dir_context *ctx = desc->file->private_data;
> +
>  			new_pos = desc->current_index + i;
> -			if (new_pos < desc->file->f_pos) {
> +			if (ctx->attr_gencount != nfsi->attr_gencount
> +			    || (nfsi->cache_validity & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA))) {
> +				ctx->duped = 0;
> +				ctx->attr_gencount = nfsi->attr_gencount;
> +			} else if (new_pos < desc->file->f_pos) {
> +				if (ctx->duped > 0
> +				    && ctx->dup_cookie == *desc->dir_cookie) {
> +					if (printk_ratelimit()) {
> +						pr_notice("NFS: directory %s/%s contains a readdir loop."
> +								"Please contact your server vendor.  "
> +								"Offending cookie: %llu\n",
> +								desc->file->f_dentry->d_parent->d_name.name,
> +								desc->file->f_dentry->d_name.name,
> +								*desc->dir_cookie);
> +					}
> +					status = -ELOOP;
> +					goto out;
> +				}
>  				ctx->dup_cookie = *desc->dir_cookie;
> -				ctx->duped = 1;
> +				ctx->duped = -1;
>  			}
>  			desc->file->f_pos = new_pos;
>  			desc->cache_entry_index = i;
> @@ -368,6 +386,7 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
>  		if (*desc->dir_cookie == array->last_cookie)
>  			desc->eof = 1;
>  	}
> +out:
>  	return status;
>  }
>  
> @@ -740,19 +759,6 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	struct nfs_cache_array *array = NULL;
>  	struct nfs_open_dir_context *ctx = file->private_data;
>  
> -	if (ctx->duped != 0 && ctx->dup_cookie == *desc->dir_cookie) {
> -		if (printk_ratelimit()) {
> -			pr_notice("NFS: directory %s/%s contains a readdir loop.  "
> -				"Please contact your server vendor.  "
> -				"Offending cookie: %llu\n",
> -				file->f_dentry->d_parent->d_name.name,
> -				file->f_dentry->d_name.name,
> -				*desc->dir_cookie);
> -		}
> -		res = -ELOOP;
> -		goto out;
> -	}
> -
>  	array = nfs_readdir_get_array(desc->page);
>  	if (IS_ERR(array)) {
>  		res = PTR_ERR(array);
> @@ -774,6 +780,8 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
>  			*desc->dir_cookie = array->array[i+1].cookie;
>  		else
>  			*desc->dir_cookie = array->last_cookie;
> +		if (ctx->duped != 0)
> +			ctx->duped = 1;
>  	}
>  	if (array->eof_index >= 0)
>  		desc->eof = 1;
> @@ -805,6 +813,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	struct page	*page = NULL;
>  	int		status;
>  	struct inode *inode = desc->file->f_path.dentry->d_inode;
> +	struct nfs_open_dir_context *ctx = desc->file->private_data;
>  
>  	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
>  			(unsigned long long)*desc->dir_cookie);
> @@ -818,6 +827,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	desc->page_index = 0;
>  	desc->last_cookie = *desc->dir_cookie;
>  	desc->page = page;
> +	ctx->duped = 0;
>  
>  	status = nfs_readdir_xdr_to_array(desc, page, inode);
>  	if (status < 0)
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 8b579be..b96fb99 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -99,9 +99,10 @@ struct nfs_open_context {
>  
>  struct nfs_open_dir_context {
>  	struct rpc_cred *cred;
> +	unsigned long attr_gencount;
>  	__u64 dir_cookie;
>  	__u64 dup_cookie;
> -	int duped;
> +	signed char duped;
>  };
>  
>  /*


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
@ 2011-07-29 20:52                             ` Bryan Schumaker
  0 siblings, 0 replies; 69+ messages in thread
From: Bryan Schumaker @ 2011-07-29 20:52 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs, Christoph Hellwig,
	Justin Piszcz

On 07/28/2011 04:48 PM, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 18:44 -0400, Justin Piszcz wrote: 
>>
>> On Wed, 27 Jul 2011, Justin Piszcz wrote:
>>
>>>
>>>
>>> On Wed, 27 Jul 2011, Trond Myklebust wrote:
>>>
>>>> On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
>>>>> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
>>>>>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
>>>>>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
>>>>>>>>>> Currently I do not see any dupes, however I have a script that moves
>>>>>>>>>> images out of the directory once an hour:
>>>>>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
>>>>>>>>>
>>>>>>>>> Do you keep adding files to the directory while you move files out?
>>>>>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
>>>>>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
>>>>>>>> it around 5,000 pictures or less.
>>>>>>>>
>>>>>>>>> What's the rate of additions/removals to the directory?
>>>>>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
>>>>>>>>
>>>>>>>> atom:/d1/motion# find cam1|wc
>>>>>>>>    5215    5215  166853
>>>>>>>> atom:/d1/motion# find cam2|wc
>>>>>>>>    5069    5069  162181
>>>>>>>> atom:/d1/motion# find cam3|wc
>>>>>>>>    5594    5594  178981
>>>>>>>> atom:/d1/motion#
>>>>>>>
>>>>>>> This sounds a lot like xfs simply filling up the directory index slots
>>>>>>> of files that you just moved out with new files, and nfs falsely
>>>>>>> claiming that this is a problem.
>>>>>>
>>>>>> Yep. There is an existing bugzilla report for this bug at
>>>>>>
>>>>>>    https://bugzilla.kernel.org/show_bug.cgi?id=38572
>>>>>>
>>>>>> I have a preliminary patch there that attempts to turn off the loop
>>>>>> detection when the directory is seen to change, however that patch still
>>>>>> appears to have a bug in it, and I haven't had time to figure out what
>>>>>> is wrong yet.
>>>>>>
>>>>>> Can you perhaps take a look, Bryan?
>>>>>
>>>>> Actually, Justin, can you test the following slight variant on the patch
>>>>> in the bugzilla?
>>>>
>>>> Doh! This one will actually compile....
>>>
>>> Hi,
>>>
>>> Should I try 3.0 first or retry 2.6.38 w/ this patch?
>>>
>>> Justin.
>>>
>>>
>>
>> I'll give 3.0 a go first.
> 
> I had Bryan do some more tests, which revealed a couple more issues. The
> attached patch should fix those, and has resisted everything we've
> thrown at it so far. It should apply to 2.6.39 and newer.

This patch still looks good (after testing it a bit more today).

How does this look for printing out more information when a cookie loop is detected?  Is there anything else that should be printed out?  My patch applies on top of Trond's from yesterday.

- Bryan

8<-----------------------------------------------------------------------
From 4d74863dc2bcd4e603a873b3725f0a05afd21f1f Mon Sep 17 00:00:00 2001
From: Bryan Schumaker <bjschuma@netapp.com>
Date: Fri, 29 Jul 2011 11:49:06 -0400
Subject: [PATCH] Additional readdir cookie loop information

Print out the name of the file that triggers the cookie loop message to
make it slightly easier to track down the cause.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
---
 fs/nfs/dir.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d23108b..b238d95 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -365,9 +365,10 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 					if (printk_ratelimit()) {
 						pr_notice("NFS: directory %s/%s contains a readdir loop."
 								"Please contact your server vendor.  "
-								"Offending cookie: %llu\n",
+								"The file: %s has duplicate cookie %llu\n",
 								desc->file->f_dentry->d_parent->d_name.name,
 								desc->file->f_dentry->d_name.name,
+								array->array[i].string.name,
 								*desc->dir_cookie);
 					}
 					status = -ELOOP;
-- 
1.7.6


> 
> Cheers
>   Trond
> 8<----------------------------------------------------------------------- 
> From 75c0387540737a6663338d4ec0538bd6fb724173 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Thu, 28 Jul 2011 16:34:33 -0400
> Subject: [PATCH v3] NFS: Fix spurious readdir cookie loop messages
> 
> If the directory contents change, then we have to accept that the
> file->f_pos value may shrink if we do a 'search-by-cookie'. In that
> case, we should turn off the loop detection and let the NFS client
> try to recover.
> 
> The patch also fixes a second loop detection bug by ensuring
> that after turning on the ctx->duped flag, we read at least one new
> cookie into ctx->dir_cookie before attempting to match with
> ctx->dup_cookie.
> 
> Reported-by: Petr Vandrovec <petr@vandrovec.name>
> Cc: stable@kernel.org [2.6.39+]
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>  fs/nfs/dir.c           |   56 ++++++++++++++++++++++++++++-------------------
>  include/linux/nfs_fs.h |    3 +-
>  2 files changed, 35 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 57f578e..d23108b 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -134,18 +134,19 @@ const struct inode_operations nfs4_dir_inode_operations = {
>  
>  #endif /* CONFIG_NFS_V4 */
>  
> -static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
> +static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
>  {
>  	struct nfs_open_dir_context *ctx;
>  	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>  	if (ctx != NULL) {
>  		ctx->duped = 0;
> +		ctx->attr_gencount = NFS_I(dir)->attr_gencount;
>  		ctx->dir_cookie = 0;
>  		ctx->dup_cookie = 0;
>  		ctx->cred = get_rpccred(cred);
> -	} else
> -		ctx = ERR_PTR(-ENOMEM);
> -	return ctx;
> +		return ctx;
> +	}
> +	return  ERR_PTR(-ENOMEM);
>  }
>  
>  static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
> @@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
>  	cred = rpc_lookup_cred();
>  	if (IS_ERR(cred))
>  		return PTR_ERR(cred);
> -	ctx = alloc_nfs_open_dir_context(cred);
> +	ctx = alloc_nfs_open_dir_context(inode, cred);
>  	if (IS_ERR(ctx)) {
>  		res = PTR_ERR(ctx);
>  		goto out;
> @@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
>  {
>  	loff_t diff = desc->file->f_pos - desc->current_index;
>  	unsigned int index;
> -	struct nfs_open_dir_context *ctx = desc->file->private_data;
>  
>  	if (diff < 0)
>  		goto out_eof;
> @@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
>  	index = (unsigned int)diff;
>  	*desc->dir_cookie = array->array[index].cookie;
>  	desc->cache_entry_index = index;
> -	ctx->duped = 0;
>  	return 0;
>  out_eof:
>  	desc->eof = 1;
> @@ -349,14 +348,33 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
>  	int i;
>  	loff_t new_pos;
>  	int status = -EAGAIN;
> -	struct nfs_open_dir_context *ctx = desc->file->private_data;
>  
>  	for (i = 0; i < array->size; i++) {
>  		if (array->array[i].cookie == *desc->dir_cookie) {
> +			struct nfs_inode *nfsi = NFS_I(desc->file->f_path.dentry->d_inode);
> +			struct nfs_open_dir_context *ctx = desc->file->private_data;
> +
>  			new_pos = desc->current_index + i;
> -			if (new_pos < desc->file->f_pos) {
> +			if (ctx->attr_gencount != nfsi->attr_gencount
> +			    || (nfsi->cache_validity & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA))) {
> +				ctx->duped = 0;
> +				ctx->attr_gencount = nfsi->attr_gencount;
> +			} else if (new_pos < desc->file->f_pos) {
> +				if (ctx->duped > 0
> +				    && ctx->dup_cookie == *desc->dir_cookie) {
> +					if (printk_ratelimit()) {
> +					pr_notice("NFS: directory %s/%s contains a readdir loop.  "
> +								"Please contact your server vendor.  "
> +								"Offending cookie: %llu\n",
> +								desc->file->f_dentry->d_parent->d_name.name,
> +								desc->file->f_dentry->d_name.name,
> +								*desc->dir_cookie);
> +					}
> +					status = -ELOOP;
> +					goto out;
> +				}
>  				ctx->dup_cookie = *desc->dir_cookie;
> -				ctx->duped = 1;
> +				ctx->duped = -1;
>  			}
>  			desc->file->f_pos = new_pos;
>  			desc->cache_entry_index = i;
> @@ -368,6 +386,7 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
>  		if (*desc->dir_cookie == array->last_cookie)
>  			desc->eof = 1;
>  	}
> +out:
>  	return status;
>  }
>  
> @@ -740,19 +759,6 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	struct nfs_cache_array *array = NULL;
>  	struct nfs_open_dir_context *ctx = file->private_data;
>  
> -	if (ctx->duped != 0 && ctx->dup_cookie == *desc->dir_cookie) {
> -		if (printk_ratelimit()) {
> -			pr_notice("NFS: directory %s/%s contains a readdir loop.  "
> -				"Please contact your server vendor.  "
> -				"Offending cookie: %llu\n",
> -				file->f_dentry->d_parent->d_name.name,
> -				file->f_dentry->d_name.name,
> -				*desc->dir_cookie);
> -		}
> -		res = -ELOOP;
> -		goto out;
> -	}
> -
>  	array = nfs_readdir_get_array(desc->page);
>  	if (IS_ERR(array)) {
>  		res = PTR_ERR(array);
> @@ -774,6 +780,8 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
>  			*desc->dir_cookie = array->array[i+1].cookie;
>  		else
>  			*desc->dir_cookie = array->last_cookie;
> +		if (ctx->duped != 0)
> +			ctx->duped = 1;
>  	}
>  	if (array->eof_index >= 0)
>  		desc->eof = 1;
> @@ -805,6 +813,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	struct page	*page = NULL;
>  	int		status;
>  	struct inode *inode = desc->file->f_path.dentry->d_inode;
> +	struct nfs_open_dir_context *ctx = desc->file->private_data;
>  
>  	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
>  			(unsigned long long)*desc->dir_cookie);
> @@ -818,6 +827,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	desc->page_index = 0;
>  	desc->last_cookie = *desc->dir_cookie;
>  	desc->page = page;
> +	ctx->duped = 0;
>  
>  	status = nfs_readdir_xdr_to_array(desc, page, inode);
>  	if (status < 0)
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 8b579be..b96fb99 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -99,9 +99,10 @@ struct nfs_open_context {
>  
>  struct nfs_open_dir_context {
>  	struct rpc_cred *cred;
> +	unsigned long attr_gencount;
>  	__u64 dir_cookie;
>  	__u64 dup_cookie;
> -	int duped;
> +	signed char duped;
>  };
>  
>  /*
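
The heart of the quoted patch is a small per-open state machine. attr_gencount is snapshotted when the directory is opened. A later change in the directory's attr_gencount, or NFS_INO_INVALID_ATTR/NFS_INO_INVALID_DATA set in cache_validity, clears the state. The first backwards jump of f_pos to a cookie is only recorded (dup_cookie, duped = -1). Emitting an entry in nfs_do_filldir() promotes a nonzero duped to 1, and uncached_readdir() resets it to 0. Only a repeat of the same backwards jump after that returns -ELOOP. What follows is a rough userspace model of that logic, not the actual fs/nfs/dir.c code. The function names are hypothetical, and loff_t, the inode and the cache validity bits are reduced to plain integers and a bool:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields the patch keeps in nfs_open_dir_context. */
struct dir_ctx {
	unsigned long attr_gencount; /* directory attribute generation last seen */
	uint64_t dup_cookie;         /* cookie at which f_pos last moved backwards */
	signed char duped;           /* 0: no rewind, -1: rewind seen, 1: entries emitted since */
};

/* Models nfs_opendir(): snapshot the directory's generation counter. */
static void ctx_init(struct dir_ctx *ctx, unsigned long dir_gencount)
{
	ctx->attr_gencount = dir_gencount;
	ctx->dup_cookie = 0;
	ctx->duped = 0;
}

/*
 * Models the check in nfs_readdir_search_for_cookie() once the cookie has
 * been found in the page cache at position new_pos. Returns 0 or -ELOOP.
 */
static int ctx_found_cookie(struct dir_ctx *ctx, uint64_t cookie, long new_pos,
			    long *f_pos, unsigned long dir_gencount,
			    bool cache_invalid)
{
	assert(ctx != NULL && f_pos != NULL);
	if (ctx->attr_gencount != dir_gencount || cache_invalid) {
		/* Directory changed: an earlier rewind says nothing about loops. */
		ctx->duped = 0;
		ctx->attr_gencount = dir_gencount;
	} else if (new_pos < *f_pos) {
		/* Same backwards jump again after entries went out: a loop. */
		if (ctx->duped > 0 && ctx->dup_cookie == cookie)
			return -ELOOP;
		/* First backwards jump to this cookie: just remember it. */
		ctx->dup_cookie = cookie;
		ctx->duped = -1;
	}
	*f_pos = new_pos;
	return 0;
}

/* Models the end of nfs_do_filldir(): an entry was passed to userspace. */
static void ctx_entry_emitted(struct dir_ctx *ctx)
{
	if (ctx->duped != 0)
		ctx->duped = 1;
}

/* Models uncached_readdir(): a fresh READDIR from the server clears the state. */
static void ctx_uncached_read(struct dir_ctx *ctx)
{
	ctx->duped = 0;
}
```

With the generation counter unchanged, a first backwards jump to cookie 10272 returns 0, and repeating the same jump after one entry has been emitted returns -ELOOP. Bumping the generation counter in between clears the state instead, which is how the patch avoids the spurious reports seen when a directory such as motion/cam2 is being modified while it is read.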

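The first hunk also restructures alloc_nfs_open_dir_context() so that the success path returns early and the failure path returns ERR_PTR(-ENOMEM), which nfs_opendir() unpacks with IS_ERR()/PTR_ERR(). The following is a minimal userspace sketch of that error-pointer idiom, assuming an LP64 target where long and pointers have the same width. MAX_ERRNO = 4095 matches include/linux/err.h. The helpers here are re-creations, and alloc_ctx and struct open_dir_ctx are hypothetical stand-ins:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's error-pointer helpers (LP64 assumed). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error; /* -ENOMEM becomes an address near the top of memory */
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* The last MAX_ERRNO addresses are never handed out as real objects. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct nfs_open_dir_context. */
struct open_dir_ctx {
	unsigned long attr_gencount;
	signed char duped;
};

/* Same shape as the patched allocator: return early on success,
 * fall through to an encoded -ENOMEM on failure. */
static struct open_dir_ctx *alloc_ctx(unsigned long dir_gencount)
{
	struct open_dir_ctx *ctx = malloc(sizeof(*ctx));

	if (ctx != NULL) {
		ctx->attr_gencount = dir_gencount;
		ctx->duped = 0;
		return ctx;
	}
	return ERR_PTR(-ENOMEM);
}
```

A caller then follows the same pattern as nfs_opendir(): `if (IS_ERR(ctx)) return PTR_ERR(ctx);`. A single return value thus carries either a valid pointer or the errno.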
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-29 20:52                             ` Bryan Schumaker
@ 2011-07-29 20:59                               ` Justin Piszcz
  -1 siblings, 0 replies; 69+ messages in thread
From: Justin Piszcz @ 2011-07-29 20:59 UTC (permalink / raw)
  To: Bryan Schumaker
  Cc: Trond Myklebust, Christoph Hellwig, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs


On Fri, 29 Jul 2011, Bryan Schumaker wrote:

> How does this look for printing out more information when a cookie loop is detected?  Is there anything else that should be printed out?  My patch applies on top of Trond's from yesterday.


Hi,

This fails against 2.6.38:

patching file fs/nfs/dir.c
Hunk #1 FAILED at 134.
Hunk #2 FAILED at 173.
Hunk #3 FAILED at 323.
Hunk #4 FAILED at 336.
Hunk #5 FAILED at 349.
Hunk #6 succeeded at 320 (offset -48 lines).
Hunk #7 FAILED at 741.
Hunk #8 succeeded at 716 (offset -59 lines).
Hunk #9 succeeded at 749 (offset -59 lines).
Hunk #10 succeeded at 763 (offset -59 lines).
6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
patching file include/linux/nfs_fs.h
Hunk #1 FAILED at 99.
1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
atom:/usr/src/linux#

And the 3.0 kernel is broken for my wireless adapter:
http://www.gossamer-threads.com/lists/linux/kernel/1411576

If you can make a combined patch for 2.6.38, I can try it. 2.6.39+ has a
horrible driver (rt2800usb), and one person emailed me off-list stating the
same thing (they stick with the manufacturer's driver or the *sta
one).

Justin.

* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-29 20:59                               ` Justin Piszcz
@ 2011-07-29 22:03                                 ` Trond Myklebust
  -1 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2011-07-29 22:03 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Bryan Schumaker, Christoph Hellwig, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs

On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote: 
> On Fri, 29 Jul 2011, Bryan Schumaker wrote:
> 
> > How does this look for printing out more information when a cookie loop is detected?  Is there anything else that should be printed out?  My patch applies on top of Trond's from yesterday.
> 
> 
> Hi,
> 
> This fails against 2.6.38:
> 
> patching file fs/nfs/dir.c
> Hunk #1 FAILED at 134.
> Hunk #2 FAILED at 173.
> Hunk #3 FAILED at 323.
> Hunk #4 FAILED at 336.
> Hunk #5 FAILED at 349.
> Hunk #6 succeeded at 320 (offset -48 lines).
> Hunk #7 FAILED at 741.
> Hunk #8 succeeded at 716 (offset -59 lines).
> Hunk #9 succeeded at 749 (offset -59 lines).
> Hunk #10 succeeded at 763 (offset -59 lines).
> 6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
> patching file include/linux/nfs_fs.h
> Hunk #1 FAILED at 99.
> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
> atom:/usr/src/linux#
> 
> And the 3.0 kernel is broken for my wireless adapter:
> http://www.gossamer-threads.com/lists/linux/kernel/1411576
> 
> If you can make a combined patch for 2.6.38, I can try it. 2.6.39+ has a
> horrible driver (rt2800usb), and one person emailed me off-list stating the
> same thing (they stick with the manufacturer's driver or the *sta
> one).

I don't understand. The readdir loop detection code was first merged
upstream in 2.6.39. 2.6.38 doesn't report any loops...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-29 22:03                                 ` Trond Myklebust
@ 2011-07-29 22:23                                   ` Justin Piszcz
  -1 siblings, 0 replies; 69+ messages in thread
From: Justin Piszcz @ 2011-07-29 22:23 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Bryan Schumaker, Christoph Hellwig, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs



On Fri, 29 Jul 2011, Trond Myklebust wrote:

> On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote:
>> On Fri, 29 Jul 2011, Bryan Schumaker wrote:
>>
>>> How does this look for printing out more information when a cookie loop is detected?  Is there anything else that should be printed out?  My patch applies on top of Trond's from yesterday.
>>
>>
>> Hi,
>>
>> This fails against 2.6.38:
>>
>> patching file fs/nfs/dir.c
>> Hunk #1 FAILED at 134.
>> Hunk #2 FAILED at 173.
>> Hunk #3 FAILED at 323.
>> Hunk #4 FAILED at 336.
>> Hunk #5 FAILED at 349.
>> Hunk #6 succeeded at 320 (offset -48 lines).
>> Hunk #7 FAILED at 741.
>> Hunk #8 succeeded at 716 (offset -59 lines).
>> Hunk #9 succeeded at 749 (offset -59 lines).
>> Hunk #10 succeeded at 763 (offset -59 lines).
>> 6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
>> patching file include/linux/nfs_fs.h
>> Hunk #1 FAILED at 99.
>> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
>> atom:/usr/src/linux#
>>
>> And the 3.0 kernel is broken for my wireless adapter:
>> http://www.gossamer-threads.com/lists/linux/kernel/1411576
>>
>> If you can make a combined patch for 2.6.38, I can try it. 2.6.39+ has a
>> horrible driver (rt2800usb), and one person emailed me off-list stating the
>> same thing (they stick with the manufacturer's driver or the *sta
>> one).
>
> I don't understand. The readdir loop detection code was first merged
> upstream in 2.6.39. 2.6.38 doesn't report any loops...

Hi,

Sorry, my error: this patch is meant for the client. I've applied it and
will e-mail when the error happens again.

# patch -p1 < /home/jpiszcz/patch1
patching file fs/nfs/dir.c
patching file include/linux/nfs_fs.h

# patch -p1 < /home/jpiszcz/patch2
patching file fs/nfs/dir.c

(recompile->reboot->waiting for next error)

Justin.


* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-29 22:23                                   ` Justin Piszcz
@ 2011-07-30  9:58                                     ` Justin Piszcz
  -1 siblings, 0 replies; 69+ messages in thread
From: Justin Piszcz @ 2011-07-30  9:58 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Bryan Schumaker, Christoph Hellwig, J. Bruce Fields, linux-nfs,
	linux-kernel, xfs



On Fri, 29 Jul 2011, Justin Piszcz wrote:

> 
> 
> On Fri, 29 Jul 2011, Trond Myklebust wrote:
> 
> > On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote:
> >> On Fri, 29 Jul 2011, Bryan Schumaker wrote:
> >>
> >>> How does this look for printing out more information when a cookie loop is detected?  Is there anything else that should be printed out?  My patch applies on top of Trond's from yesterday.
> >>
> >>
> >> Hi,
> >>
> >> This fails against 2.6.38:
> >>
> >> patching file fs/nfs/dir.c
> >> Hunk #1 FAILED at 134.
> >> Hunk #2 FAILED at 173.
> >> Hunk #3 FAILED at 323.
> >> Hunk #4 FAILED at 336.
> >> Hunk #5 FAILED at 349.
> >> Hunk #6 succeeded at 320 (offset -48 lines).
> >> Hunk #7 FAILED at 741.
> >> Hunk #8 succeeded at 716 (offset -59 lines).
> >> Hunk #9 succeeded at 749 (offset -59 lines).
> >> Hunk #10 succeeded at 763 (offset -59 lines).
> >> 6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
> >> patching file include/linux/nfs_fs.h
> >> Hunk #1 FAILED at 99.
> >> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
> >> atom:/usr/src/linux#
> >>
> >> And the 3.0 kernel is broken for my wireless adapter:
> >> http://www.gossamer-threads.com/lists/linux/kernel/1411576
> >>
> >> If you can make a combined patch for 2.6.38, I can try it. 2.6.39+ has a
> >> horrible driver (rt2800usb), and one person emailed me off-list stating the
> >> same thing (they stick with the manufacturer's driver or the *sta
> >> one).
> >
> > I don't understand. The readdir loop detection code was first merged
> > upstream in 2.6.39. 2.6.38 doesn't report any loops...
> 
> Hi,
> 
> Sorry, my error: this patch is meant for the client. I've applied it and
> will e-mail when the error happens again.
> 
> # patch -p1 < /home/jpiszcz/patch1
> patching file fs/nfs/dir.c
> patching file include/linux/nfs_fs.h
> 
> # patch -p1 < /home/jpiszcz/patch2
> patching file fs/nfs/dir.c
> 
> (recompile->reboot->waiting for next error)
> 
> Justin.

So I have been running Linux 2.6.37 through 3.0 (recently) on these new hosts
since January of this year and have never had so much as a kernel oops. With
these patches, there were several kernel lockups/problems, but the NFS readdir
loop message did not show up.

I've gone back to the previous (unpatched) kernel. Is there a less invasive
patch?

http://home.comcast.net/~jpiszcz/20110730/kernel-error.txt

Justin.



Thread overview: 69+ messages
2011-07-27 13:54 2.6.xx: NFS: directory motion/cam2 contains a readdir loop Justin Piszcz
2011-07-27 16:07 ` J. Bruce Fields
2011-07-27 16:28   ` Justin Piszcz
2011-07-27 16:40     ` Bryan Schumaker
2011-07-27 17:00       ` Ruediger Meier
2011-07-27 17:09         ` Bryan Schumaker
2011-07-27 17:17         ` Justin Piszcz
2011-07-27 17:45           ` J. Bruce Fields
2011-07-27 18:28         ` Bryan Schumaker
2011-07-27 17:15       ` Justin Piszcz
2011-07-27 18:11     ` Christoph Hellwig
2011-07-27 19:35       ` Justin Piszcz
2011-07-27 19:39         ` Christoph Hellwig
2011-07-27 19:44           ` Justin Piszcz
2011-07-27 19:47             ` Christoph Hellwig
2011-07-27 19:54               ` Bryan Schumaker
2011-07-27 20:02                 ` Christoph Hellwig
2011-07-27 20:05                   ` Christoph Hellwig
2011-07-27 20:26                   ` Rüdiger Meier
2011-07-27 20:47                     ` Christoph Hellwig
2011-07-27 21:21                       ` Rüdiger Meier
2011-07-27 19:57               ` Justin Piszcz
2011-07-27 20:37               ` Trond Myklebust
2011-07-27 20:54                 ` Trond Myklebust
2011-07-27 20:56                   ` Trond Myklebust
2011-07-27 21:24                     ` Justin Piszcz
2011-07-27 22:44                       ` Justin Piszcz
2011-07-28 20:48                         ` Trond Myklebust
2011-07-29 20:52                           ` Bryan Schumaker
2011-07-29 20:59                             ` Justin Piszcz
2011-07-29 22:03                               ` Trond Myklebust
2011-07-29 22:23                                 ` Justin Piszcz
2011-07-30  9:58                                   ` Justin Piszcz
