linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
@ 2011-12-29  0:08 Konstantinos Skarlatos
  2011-12-29  2:04 ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-29  0:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-cifs, jlayton

I mount via cifs a windows XP share, df gives me correct sizes, but when 
I ls the mount point i get input/output error.
strace: http://pastebin.com/WXf8M1nu

mount --verbose -t cifs -o username=administrator,password=blahblah 
//192.168.0.11/jobs /mnt/backups/montaz/jobs
mount.cifs kernel mount options: 
ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********

df
//192.168.0.11/jobs                                       114464  
105196      9268  92% /mnt/backups/montaz/jobs

ls /mnt/backups/montaz/jobs/
ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
total 0

the fun thing is that i can cd to a lower level directory, and ls works 
fine there! only the mount point has the problem

ls /mnt/backups/montaz/jobs/test
total 44K
drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
......

kernel version 3.2rc7

this seems to be related to :
https://lkml.org/lkml/2011/8/1/427
Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for large dirs

Please CC because i am not subscribed

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-29  0:08 cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs) Konstantinos Skarlatos
@ 2011-12-29  2:04 ` Jeff Layton
  2011-12-29 10:30   ` Konstantinos Skarlatos
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2011-12-29  2:04 UTC (permalink / raw)
  To: Konstantinos Skarlatos; +Cc: linux-kernel, linux-cifs

On Thu, 29 Dec 2011 02:08:57 +0200
Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:

> I mount via cifs a windows XP share, df gives me correct sizes, but when 
> I ls the mount point i get input/output error.
> strace: http://pastebin.com/WXf8M1nu
> 
> mount --verbose -t cifs -o username=administrator,password=blahblah 
> //192.168.0.11/jobs /mnt/backups/montaz/jobs
> mount.cifs kernel mount options: 
> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
> 
> df
> //192.168.0.11/jobs                                       114464  
> 105196      9268  92% /mnt/backups/montaz/jobs
> 
> ls /mnt/backups/montaz/jobs/
> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
> total 0
> 
> the fun thing is that i can cd to a lower level directory, and ls works 
> fine there! only the mount point has the problem
> 
> ls /mnt/backups/montaz/jobs/test
> total 44K
> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
> ......
> 
> kernel version 3.2rc7
> 
> this seems to be related to :
> https://lkml.org/lkml/2011/8/1/427
> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for large dirs
> 

Hmmm, maybe. What makes you think that it's related? What sort of
server are you seeing this against?

What would be most helpful is a capture and/or some cifsFYI info.
Instructions on how to capture that are here:

http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting

Would you be able to provide those?

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-29  2:04 ` Jeff Layton
@ 2011-12-29 10:30   ` Konstantinos Skarlatos
  2011-12-29 13:39     ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-29 10:30 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

On 29/12/2011 4:04 πμ, Jeff Layton wrote:
> On Thu, 29 Dec 2011 02:08:57 +0200
> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>
>> I mount via cifs a windows XP share, df gives me correct sizes, but when
>> I ls the mount point i get input/output error.
>> strace: http://pastebin.com/WXf8M1nu
>>
>> mount --verbose -t cifs -o username=administrator,password=blahblah
>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>> mount.cifs kernel mount options:
>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>
>> df
>> //192.168.0.11/jobs                                       114464
>> 105196      9268  92% /mnt/backups/montaz/jobs
>>
>> ls /mnt/backups/montaz/jobs/
>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
>> total 0
>>
>> the fun thing is that i can cd to a lower level directory, and ls works
>> fine there! only the mount point has the problem
>>
>> ls /mnt/backups/montaz/jobs/test
>> total 44K
>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
>> ......
>>
>> kernel version 3.2rc7
>>
>> this seems to be related to :
>> https://lkml.org/lkml/2011/8/1/427
>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for large dirs
>>
> Hmmm, maybe. What makes you think that it's related? What sort of
> server are you seeing this against?
Windows XP service pack 2 (greek)
>
> What would be most helpful is a capture and/or some cifsFYI info.
> Instructions on how to capture that are here:
>
> http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting
>
> Would you be able to provide those?
http://pastebin.com/XR9H5Kzj
>
> Thanks,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-29 10:30   ` Konstantinos Skarlatos
@ 2011-12-29 13:39     ` Jeff Layton
  2011-12-29 13:54       ` Konstantinos Skarlatos
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2011-12-29 13:39 UTC (permalink / raw)
  To: Konstantinos Skarlatos; +Cc: linux-kernel, linux-cifs

On Thu, 29 Dec 2011 12:30:18 +0200
Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:

> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
> > On Thu, 29 Dec 2011 02:08:57 +0200
> > Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
> >
> >> I mount via cifs a windows XP share, df gives me correct sizes, but when
> >> I ls the mount point i get input/output error.
> >> strace: http://pastebin.com/WXf8M1nu
> >>
> >> mount --verbose -t cifs -o username=administrator,password=blahblah
> >> //192.168.0.11/jobs /mnt/backups/montaz/jobs
> >> mount.cifs kernel mount options:
> >> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
> >>
> >> df
> >> //192.168.0.11/jobs                                       114464
> >> 105196      9268  92% /mnt/backups/montaz/jobs
> >>
> >> ls /mnt/backups/montaz/jobs/
> >> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
> >> total 0
> >>
> >> the fun thing is that i can cd to a lower level directory, and ls works
> >> fine there! only the mount point has the problem
> >>
> >> ls /mnt/backups/montaz/jobs/test
> >> total 44K
> >> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
> >> ......
> >>
> >> kernel version 3.2rc7
> >>
> >> this seems to be related to :
> >> https://lkml.org/lkml/2011/8/1/427
> >> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for large dirs
> >>
> > Hmmm, maybe. What makes you think that it's related? What sort of
> > server are you seeing this against?
> Windows XP service pack 2 (greek)


How many files are in the directory?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-29 13:39     ` Jeff Layton
@ 2011-12-29 13:54       ` Konstantinos Skarlatos
  2011-12-30  9:04         ` Konstantinos Skarlatos
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-29 13:54 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
> On Thu, 29 Dec 2011 12:30:18 +0200
> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>
>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>   wrote:
>>>
>>>> I mount via cifs a windows XP share, df gives me correct sizes, but when
>>>> I ls the mount point i get input/output error.
>>>> strace: http://pastebin.com/WXf8M1nu
>>>>
>>>> mount --verbose -t cifs -o username=administrator,password=blahblah
>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>> mount.cifs kernel mount options:
>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>>>
>>>> df
>>>> //192.168.0.11/jobs                                       114464
>>>> 105196      9268  92% /mnt/backups/montaz/jobs
>>>>
>>>> ls /mnt/backups/montaz/jobs/
>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
>>>> total 0
>>>>
>>>> the fun thing is that i can cd to a lower level directory, and ls works
>>>> fine there! only the mount point has the problem
>>>>
>>>> ls /mnt/backups/montaz/jobs/test
>>>> total 44K
>>>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
>>>> ......
>>>>
>>>> kernel version 3.2rc7
>>>>
>>>> this seems to be related to :
>>>> https://lkml.org/lkml/2011/8/1/427
>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for large dirs
>>>>
>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>> server are you seeing this against?
>> Windows XP service pack 2 (greek)
>
>
> How many files are in the directory?
>
140 folders and 20 files


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-29 13:54       ` Konstantinos Skarlatos
@ 2011-12-30  9:04         ` Konstantinos Skarlatos
  2011-12-30 13:11           ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-30  9:04 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

[-- Attachment #1: Type: text/plain, Size: 1854 bytes --]

On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>> On Thu, 29 Dec 2011 12:30:18 +0200
>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>
>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>   wrote:
>>>>
>>>>> I mount via cifs a windows XP share, df gives me correct sizes, 
>>>>> but when
>>>>> I ls the mount point i get input/output error.
>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>
>>>>> mount --verbose -t cifs -o username=administrator,password=blahblah
>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>> mount.cifs kernel mount options:
>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=******** 
>>>>>
>>>>>
>>>>> df
>>>>> //192.168.0.11/jobs                                       114464
>>>>> 105196      9268  92% /mnt/backups/montaz/jobs
>>>>>
>>>>> ls /mnt/backups/montaz/jobs/
>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
>>>>> total 0
>>>>>
>>>>> the fun thing is that i can cd to a lower level directory, and ls 
>>>>> works
>>>>> fine there! only the mount point has the problem
>>>>>
>>>>> ls /mnt/backups/montaz/jobs/test
>>>>> total 44K
>>>>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
>>>>> ......
>>>>>
>>>>> kernel version 3.2rc7
>>>>>
>>>>> this seems to be related to :
>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for 
>>>>> large dirs
>>>>>
>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>> server are you seeing this against?
>>> Windows XP service pack 2 (greek)
>>
>>
>> How many files are in the directory?
>>
> 140 folders and 20 files
>
Attached is a tcp dump of my session.

[-- Attachment #2: cifs-traffic.zip --]
[-- Type: application/octet-stream, Size: 9900 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-30  9:04         ` Konstantinos Skarlatos
@ 2011-12-30 13:11           ` Jeff Layton
  2011-12-30 18:00             ` Konstantinos Skarlatos
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2011-12-30 13:11 UTC (permalink / raw)
  To: Konstantinos Skarlatos; +Cc: linux-kernel, linux-cifs

On Fri, 30 Dec 2011 11:04:59 +0200
Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:

> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
> > On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
> >> On Thu, 29 Dec 2011 12:30:18 +0200
> >> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
> >>
> >>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
> >>>> On Thu, 29 Dec 2011 02:08:57 +0200
> >>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>   wrote:
> >>>>
> >>>>> I mount via cifs a windows XP share, df gives me correct sizes, 
> >>>>> but when
> >>>>> I ls the mount point i get input/output error.
> >>>>> strace: http://pastebin.com/WXf8M1nu
> >>>>>
> >>>>> mount --verbose -t cifs -o username=administrator,password=blahblah
> >>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
> >>>>> mount.cifs kernel mount options:
> >>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=******** 
> >>>>>
> >>>>>
> >>>>> df
> >>>>> //192.168.0.11/jobs                                       114464
> >>>>> 105196      9268  92% /mnt/backups/montaz/jobs
> >>>>>
> >>>>> ls /mnt/backups/montaz/jobs/
> >>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
> >>>>> total 0
> >>>>>
> >>>>> the fun thing is that i can cd to a lower level directory, and ls 
> >>>>> works
> >>>>> fine there! only the mount point has the problem
> >>>>>
> >>>>> ls /mnt/backups/montaz/jobs/test
> >>>>> total 44K
> >>>>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
> >>>>> ......
> >>>>>
> >>>>> kernel version 3.2rc7
> >>>>>
> >>>>> this seems to be related to :
> >>>>> https://lkml.org/lkml/2011/8/1/427
> >>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for 
> >>>>> large dirs
> >>>>>
> >>>> Hmmm, maybe. What makes you think that it's related? What sort of
> >>>> server are you seeing this against?
> >>> Windows XP service pack 2 (greek)
> >>
> >>
> >> How many files are in the directory?
> >>
> > 140 folders and 20 files
> >
> Attached is a tcp dump of my session.

I tried reproducing this here, but wasn't able to. Testing against my
xp box worked fine.

Most likely, the FIND_FILE responses are falling afoul of the code in
coalesce_t2 or check2ndT2. Unfortunately that code is pretty
complicated and I'm not certain what the problem actually is...

One thing that's interesting is that the total data being sent in the
request is rather large (16336 bytes). I think that's legit, but maybe
it's exceeding the end of the buffer once we try to coalesce it.

Would it be possible to get the cFYI output from this test?

Is this a regression? Did it work with earlier kernels and only
recently start failing?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-30 13:11           ` Jeff Layton
@ 2011-12-30 18:00             ` Konstantinos Skarlatos
  2011-12-30 21:46               ` Konstantinos Skarlatos
  2011-12-31 11:59               ` Jeff Layton
  0 siblings, 2 replies; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-30 18:00 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

[-- Attachment #1: Type: text/plain, Size: 3320 bytes --]

On 30/12/2011 3:11 μμ, Jeff Layton wrote:
> On Fri, 30 Dec 2011 11:04:59 +0200
> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>
>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>>>> On Thu, 29 Dec 2011 12:30:18 +0200
>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>   wrote:
>>>>
>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>    wrote:
>>>>>>
>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
>>>>>>> but when
>>>>>>> I ls the mount point i get input/output error.
>>>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>>>
>>>>>>> mount --verbose -t cifs -o username=administrator,password=blahblah
>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>>>> mount.cifs kernel mount options:
>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>>>>>>
>>>>>>>
>>>>>>> df
>>>>>>> //192.168.0.11/jobs                                       114464
>>>>>>> 105196      9268  92% /mnt/backups/montaz/jobs
>>>>>>>
>>>>>>> ls /mnt/backups/montaz/jobs/
>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
>>>>>>> total 0
>>>>>>>
>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
>>>>>>> works
>>>>>>> fine there! only the mount point has the problem
>>>>>>>
>>>>>>> ls /mnt/backups/montaz/jobs/test
>>>>>>> total 44K
>>>>>>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
>>>>>>> ......
>>>>>>>
>>>>>>> kernel version 3.2rc7
>>>>>>>
>>>>>>> this seems to be related to :
>>>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
>>>>>>> large dirs
>>>>>>>
>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>>>> server are you seeing this against?
>>>>> Windows XP service pack 2 (greek)
>>>>
>>>> How many files are in the directory?
>>>>
>>> 140 folders and 20 files
>>>
>> Attached is a tcp dump of my session.
> I tried reproducing this here, but wasn't able to. Testing against my
> xp box worked fine.
>
> Most likely, the FIND_FILE responses are falling afoul of the code in
> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
> complicated and I'm not certain what the problem actually is...
>
> One thing that's interesting is that the total data being sent in the
> request is rather large (16336 bytes). I think that's legit, but maybe
> it's exceeding the end of the buffer once we try to coalesce it.
>
> Would it be possible to get the cFYI output from this test?
I did not get a cFYI output from that test, but i redid a 
mount-ls-umount and am attaching the tcpdump
Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the 
contents of /proc/fs/cifs/DebugData form the same test
>
> Is this a regression? Did it work with earlier kernels and only
> recently start failing?
>
I do not know, and i am a bit afraid to downgrade this machine below 3.0 
due to some changes arch linux has introduced recently. I can always set 
up a few virtual machines though, and i can even request permission from 
my company to give you shell access if you like. Which kernel versions 
would you like me to test?

[-- Attachment #2: cifs-traffic2.pcap --]
[-- Type: application/octet-stream, Size: 23203 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-30 18:00             ` Konstantinos Skarlatos
@ 2011-12-30 21:46               ` Konstantinos Skarlatos
  2011-12-31  2:16                 ` Konstantinos Skarlatos
  2011-12-31 11:59               ` Jeff Layton
  1 sibling, 1 reply; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-30 21:46 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

[-- Attachment #1: Type: text/plain, Size: 3977 bytes --]

On 30/12/2011 8:00 μμ, Konstantinos Skarlatos wrote:
> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
>> On Fri, 30 Dec 2011 11:04:59 +0200
>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>
>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>   wrote:
>>>>>
>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>    wrote:
>>>>>>>
>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
>>>>>>>> but when
>>>>>>>> I ls the mount point i get input/output error.
>>>>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>>>>
>>>>>>>> mount --verbose -t cifs -o 
>>>>>>>> username=administrator,password=blahblah
>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>>>>> mount.cifs kernel mount options:
>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=******** 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> df
>>>>>>>> //192.168.0.11/jobs                                       114464
>>>>>>>> 105196      9268  92% /mnt/backups/montaz/jobs
>>>>>>>>
>>>>>>>> ls /mnt/backups/montaz/jobs/
>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output 
>>>>>>>> error
>>>>>>>> total 0
>>>>>>>>
>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
>>>>>>>> works
>>>>>>>> fine there! only the mount point has the problem
>>>>>>>>
>>>>>>>> ls /mnt/backups/montaz/jobs/test
>>>>>>>> total 44K
>>>>>>>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
>>>>>>>> ......
>>>>>>>>
>>>>>>>> kernel version 3.2rc7
>>>>>>>>
>>>>>>>> this seems to be related to :
>>>>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
>>>>>>>> large dirs
>>>>>>>>
>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>>>>> server are you seeing this against?
>>>>>> Windows XP service pack 2 (greek)
>>>>>
>>>>> How many files are in the directory?
>>>>>
>>>> 140 folders and 20 files
>>>>
>>> Attached is a tcp dump of my session.
>> I tried reproducing this here, but wasn't able to. Testing against my
>> xp box worked fine.
>>
>> Most likely, the FIND_FILE responses are falling afoul of the code in
>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
>> complicated and I'm not certain what the problem actually is...
>>
>> One thing that's interesting is that the total data being sent in the
>> request is rather large (16336 bytes). I think that's legit, but maybe
>> it's exceeding the end of the buffer once we try to coalesce it.
>>
>> Would it be possible to get the cFYI output from this test?
> I did not get a cFYI output from that test, but i redid a 
> mount-ls-umount and am attaching the tcpdump
> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and 
> the contents of /proc/fs/cifs/DebugData form the same test
>>
>> Is this a regression? Did it work with earlier kernels and only
>> recently start failing?
>>
> I do not know, and i am a bit afraid to downgrade this machine below 
> 3.0 due to some changes arch linux has introduced recently. I can 
> always set up a few virtual machines though, and i can even request 
> permission from my company to give you shell access if you like. Which 
> kernel versions would you like me to test?
I just tested  3.1.5-1-ARCH on a virtual machine and it works, so it is 
probably a regression... On the same virtual machine 3.2-rc7 produces 
input/output error. The virtual machine is a fresh install of arch linux.
here is the relevant pastebin http://pastebin.com/BwX2DqJC
and attached is the tcpdump
As a am a noob to all this, what should I do next in order to help you? 
maybe compile a 3.1 kernel from official sources to make sure no patches 
from arch linux interfere?


[-- Attachment #2: cifs-traffic.pcap --]
[-- Type: application/octet-stream, Size: 28885 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-30 21:46               ` Konstantinos Skarlatos
@ 2011-12-31  2:16                 ` Konstantinos Skarlatos
  0 siblings, 0 replies; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-31  2:16 UTC (permalink / raw)
  To: Jeff Layton, Shirish Pargaonkar; +Cc: linux-kernel, linux-cifs

On 30/12/2011 11:46 μμ, Konstantinos Skarlatos wrote:
> On 30/12/2011 8:00 μμ, Konstantinos Skarlatos wrote:
>> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
>>> On Fri, 30 Dec 2011 11:04:59 +0200
>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
>>>
>>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
>>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
>>>>>>
>>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
>>>>>>>>> but when
>>>>>>>>> I ls the mount point i get input/output error.
>>>>>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>>>>>
>>>>>>>>> mount --verbose -t cifs -o
>>>>>>>>> username=administrator,password=blahblah
>>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>>>>>> mount.cifs kernel mount options:
>>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> df
>>>>>>>>> //192.168.0.11/jobs 114464
>>>>>>>>> 105196 9268 92% /mnt/backups/montaz/jobs
>>>>>>>>>
>>>>>>>>> ls /mnt/backups/montaz/jobs/
>>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output
>>>>>>>>> error
>>>>>>>>> total 0
>>>>>>>>>
>>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
>>>>>>>>> works
>>>>>>>>> fine there! only the mount point has the problem
>>>>>>>>>
>>>>>>>>> ls /mnt/backups/montaz/jobs/test
>>>>>>>>> total 44K
>>>>>>>>> drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/
>>>>>>>>> ......
>>>>>>>>>
>>>>>>>>> kernel version 3.2rc7
>>>>>>>>>
>>>>>>>>> this seems to be related to :
>>>>>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
>>>>>>>>> large dirs
>>>>>>>>>
>>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>>>>>> server are you seeing this against?
>>>>>>> Windows XP service pack 2 (greek)
>>>>>>
>>>>>> How many files are in the directory?
>>>>>>
>>>>> 140 folders and 20 files
>>>>>
>>>> Attached is a tcp dump of my session.
>>> I tried reproducing this here, but wasn't able to. Testing against my
>>> xp box worked fine.
>>>
>>> Most likely, the FIND_FILE responses are falling afoul of the code in
>>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
>>> complicated and I'm not certain what the problem actually is...
>>>
>>> One thing that's interesting is that the total data being sent in the
>>> request is rather large (16336 bytes). I think that's legit, but maybe
>>> it's exceeding the end of the buffer once we try to coalesce it.
>>>
>>> Would it be possible to get the cFYI output from this test?
>> I did not get a cFYI output from that test, but i redid a
>> mount-ls-umount and am attaching the tcpdump
>> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and
>> the contents of /proc/fs/cifs/DebugData form the same test
>>>
>>> Is this a regression? Did it work with earlier kernels and only
>>> recently start failing?
>>>
>> I do not know, and i am a bit afraid to downgrade this machine below
>> 3.0 due to some changes arch linux has introduced recently. I can
>> always set up a few virtual machines though, and i can even request
>> permission from my company to give you shell access if you like. Which
>> kernel versions would you like me to test?
> I just tested 3.1.5-1-ARCH on a virtual machine and it works, so it is
> probably a regression... On the same virtual machine 3.2-rc7 produces
> input/output error. The virtual machine is a fresh install of arch linux.
> here is the relevant pastebin http://pastebin.com/BwX2DqJC
> and attached is the tcpdump
> As a am a noob to all this, what should I do next in order to help you?
> maybe compile a 3.1 kernel from official sources to make sure no patches
> from arch linux interfere?
>
After some testing, vanilla 3.1.6 does not have this bug and 3.2.0-rc1 
has it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-30 18:00             ` Konstantinos Skarlatos
  2011-12-30 21:46               ` Konstantinos Skarlatos
@ 2011-12-31 11:59               ` Jeff Layton
  2011-12-31 12:49                 ` Konstantinos Skarlatos
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2011-12-31 11:59 UTC (permalink / raw)
  To: Konstantinos Skarlatos; +Cc: linux-kernel, linux-cifs

[-- Attachment #1: Type: text/plain, Size: 5518 bytes --]

On Fri, 30 Dec 2011 20:00:46 +0200
Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:

> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
> > On Fri, 30 Dec 2011 11:04:59 +0200
> > Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
> >
> >> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
> >>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
> >>>> On Thu, 29 Dec 2011 12:30:18 +0200
> >>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>   wrote:
> >>>>
> >>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
> >>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
> >>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>    wrote:
> >>>>>>
> >>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
> >>>>>>> but when
> >>>>>>> I ls the mount point i get input/output error.
> >>>>>>> strace: http://pastebin.com/WXf8M1nu
> >>>>>>>
> >>>>>>> mount --verbose -t cifs -o username=administrator,password=blahblah
> >>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
> >>>>>>> mount.cifs kernel mount options:
> >>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
> >>>>>>>
> >>>>>>>
> >>>>>>> df
> >>>>>>> //192.168.0.11/jobs                                       114464
> >>>>>>> 105196      9268  92% /mnt/backups/montaz/jobs
> >>>>>>>
> >>>>>>> ls /mnt/backups/montaz/jobs/
> >>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
> >>>>>>> total 0
> >>>>>>>
> >>>>>>> the fun thing is that i can cd to a lower level directory, and ls
> >>>>>>> works
> >>>>>>> fine there! only the mount point has the problem
> >>>>>>>
> >>>>>>> ls /mnt/backups/montaz/jobs/test
> >>>>>>> total 44K
> >>>>>>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
> >>>>>>> ......
> >>>>>>>
> >>>>>>> kernel version 3.2rc7
> >>>>>>>
> >>>>>>> this seems to be related to :
> >>>>>>> https://lkml.org/lkml/2011/8/1/427
> >>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
> >>>>>>> large dirs
> >>>>>>>
> >>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
> >>>>>> server are you seeing this against?
> >>>>> Windows XP service pack 2 (greek)
> >>>>
> >>>> How many files are in the directory?
> >>>>
> >>> 140 folders and 20 files
> >>>
> >> Attached is a tcp dump of my session.
> > I tried reproducing this here, but wasn't able to. Testing against my
> > xp box worked fine.
> >
> > Most likely, the FIND_FILE responses are falling afoul of the code in
> > coalesce_t2 or check2ndT2. Unfortunately that code is pretty
> > complicated and I'm not certain what the problem actually is...
> >
> > One thing that's interesting is that the total data being sent in the
> > request is rather large (16336 bytes). I think that's legit, but maybe
> > it's exceeding the end of the buffer once we try to coalesce it.
> >
> > Would it be possible to get the cFYI output from this test?
> I did not get a cFYI output from that test, but i redid a 
> mount-ls-umount and am attaching the tcpdump
> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the 
> contents of /proc/fs/cifs/DebugData form the same test
> >
> > Is this a regression? Did it work with earlier kernels and only
> > recently start failing?
> >
> I do not know, and i am a bit afraid to downgrade this machine below 3.0 
> due to some changes arch linux has introduced recently. I can always set 
> up a few virtual machines though, and i can even request permission from 
> my company to give you shell access if you like. Which kernel versions 
> would you like me to test?


Ok, that tells us a little:

-------------------[snip]---------------------
[96268.787078] fs/cifs/cifssmb.c: In FindFirst for
[96268.787083] fs/cifs/transport.c: For smb_command 50
[96268.787086] fs/cifs/transport.c: Sending smb:  total_len 88

...FIND_FIRST command is sent

[96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
[96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2, check next response
[96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
[96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2, check next response
[96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
[96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2, check next response
[96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
[96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2, check next response

...all four parts of the response are collected here

[96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50 mid=12 state=16

...but the state at this point is MID_RESPONSE_MALFORMED

[96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
[96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
[96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid = 737644) rc = -5

...which makes readdir return -EIO

-------------------[snip]---------------------

Based on that, it looks like something in one of these frames caused
coalesce_t2() to return an error. I don't see the problem right offhand
in the capture, but T2 response handling is pretty complex so it can be
hard to see.

Would it be possible for you to rebuild your kernel (or just cifs.ko)
with this patch? Once you do that, rerun the test with cFYI turned up,
and it should help point out what the problem is.

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

[-- Attachment #2: 0001-cifs-better-instrumentation-for-coalesce_t2.patch --]
[-- Type: text/x-patch, Size: 2938 bytes --]

>From 3ae6fb6404d150c5ac026b7b16cd1c6b17e6f1ad Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Sat, 31 Dec 2011 06:53:26 -0500
Subject: [PATCH] cifs: better instrumentation for coalesce_t2

When coalesce_t2 returns an error, have it throw a cFYI message that
explains the reason.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/connect.c |   29 +++++++++++++++++++++--------
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 8cd4b52..cbf3b78 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -243,16 +243,21 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
 
 	remaining = total_data_size - total_in_buf;
 
-	if (remaining < 0)
+	if (remaining < 0) {
+		cFYI(1, "Server sent too much data. total_data_size=%hu "
+			"total_in_buf=%hu", total_data_size, total_in_buf);
 		return -EPROTO;
+	}
 
-	if (remaining == 0) /* nothing to do, ignore */
+	if (remaining == 0) {
+		/* nothing to do, ignore */
+		cFYI(1, "no more data remains");
 		return 0;
+	}
 
 	total_in_buf2 = get_unaligned_le16(&pSMB2->t2_rsp.DataCount);
-	if (remaining < total_in_buf2) {
+	if (remaining < total_in_buf2)
 		cFYI(1, "transact2 2nd response contains too much data");
-	}
 
 	/* find end of first SMB data area */
 	data_area_of_target = (char *)&pSMBt->hdr.Protocol +
@@ -267,23 +272,29 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
 	/* copy second buffer into end of first buffer */
 	total_in_buf += total_in_buf2;
 	/* is the result too big for the field? */
-	if (total_in_buf > USHRT_MAX)
+	if (total_in_buf > USHRT_MAX) {
+		cFYI(1, "coalesced total_in_buf too large (%u)", total_in_buf);
 		return -EPROTO;
+	}
 	put_unaligned_le16(total_in_buf, &pSMBt->t2_rsp.DataCount);
 
 	/* fix up the BCC */
 	byte_count = get_bcc(pTargetSMB);
 	byte_count += total_in_buf2;
 	/* is the result too big for the field? */
-	if (byte_count > USHRT_MAX)
+	if (byte_count > USHRT_MAX) {
+		cFYI(1, "coalesced BCC too large (%u)", byte_count);
 		return -EPROTO;
+	}
 	put_bcc(byte_count, pTargetSMB);
 
 	byte_count = be32_to_cpu(pTargetSMB->smb_buf_length);
 	byte_count += total_in_buf2;
 	/* don't allow buffer to overflow */
-	if (byte_count > CIFSMaxBufSize)
+	if (byte_count > CIFSMaxBufSize) {
+		cFYI(1, "coalesced BCC exceeds buffer size (%u)", byte_count);
 		return -ENOBUFS;
+	}
 	pTargetSMB->smb_buf_length = cpu_to_be32(byte_count);
 
 	memcpy(data_area_of_target, data_area_of_buf2, total_in_buf2);
@@ -291,8 +302,10 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
 	if (remaining == total_in_buf2) {
 		cFYI(1, "found the last secondary response");
 		return 0; /* we are done */
-	} else /* more responses to go */
+	} else {/* more responses to go */
+		cFYI(1, "waiting for more secondary responses");
 		return 1;
+	}
 }
 
 static void
-- 
1.7.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-31 11:59               ` Jeff Layton
@ 2011-12-31 12:49                 ` Konstantinos Skarlatos
  2011-12-31 13:41                   ` Konstantinos Skarlatos
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-31 12:49 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

On Σάββατο, 31 Δεκέμβριος 2011 1:59:22 μμ, Jeff Layton wrote:
> On Fri, 30 Dec 2011 20:00:46 +0200
> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>
>> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
>>> On Fri, 30 Dec 2011 11:04:59 +0200
>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>   wrote:
>>>
>>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
>>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>    wrote:
>>>>>>
>>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>     wrote:
>>>>>>>>
>>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
>>>>>>>>> but when
>>>>>>>>> I ls the mount point i get input/output error.
>>>>>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>>>>>
>>>>>>>>> mount --verbose -t cifs -o username=administrator,password=blahblah
>>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>>>>>> mount.cifs kernel mount options:
>>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> df
>>>>>>>>> //192.168.0.11/jobs                                       114464
>>>>>>>>> 105196      9268  92% /mnt/backups/montaz/jobs
>>>>>>>>>
>>>>>>>>> ls /mnt/backups/montaz/jobs/
>>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error
>>>>>>>>> total 0
>>>>>>>>>
>>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
>>>>>>>>> works
>>>>>>>>> fine there! only the mount point has the problem
>>>>>>>>>
>>>>>>>>> ls /mnt/backups/montaz/jobs/test
>>>>>>>>> total 44K
>>>>>>>>> drwxr-xr-x 1 root root    0 Apr 30  2010 blah blah/
>>>>>>>>> ......
>>>>>>>>>
>>>>>>>>> kernel version 3.2rc7
>>>>>>>>>
>>>>>>>>> this seems to be related to :
>>>>>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
>>>>>>>>> large dirs
>>>>>>>>>
>>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>>>>>> server are you seeing this against?
>>>>>>> Windows XP service pack 2 (greek)
>>>>>>
>>>>>> How many files are in the directory?
>>>>>>
>>>>> 140 folders and 20 files
>>>>>
>>>> Attached is a tcp dump of my session.
>>> I tried reproducing this here, but wasn't able to. Testing against my
>>> xp box worked fine.
>>>
>>> Most likely, the FIND_FILE responses are falling afoul of the code in
>>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
>>> complicated and I'm not certain what the problem actually is...
>>>
>>> One thing that's interesting is that the total data being sent in the
>>> request is rather large (16336 bytes). I think that's legit, but maybe
>>> it's exceeding the end of the buffer once we try to coalesce it.
>>>
>>> Would it be possible to get the cFYI output from this test?
>> I did not get a cFYI output from that test, but i redid a
>> mount-ls-umount and am attaching the tcpdump
>> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the
>> contents of /proc/fs/cifs/DebugData form the same test
>>>
>>> Is this a regression? Did it work with earlier kernels and only
>>> recently start failing?
>>>
>> I do not know, and i am a bit afraid to downgrade this machine below 3.0
>> due to some changes arch linux has introduced recently. I can always set
>> up a few virtual machines though, and i can even request permission from
>> my company to give you shell access if you like. Which kernel versions
>> would you like me to test?
>
>
> Ok, that tells us a little:
>
> -------------------[snip]---------------------
> [96268.787078] fs/cifs/cifssmb.c: In FindFirst for
> [96268.787083] fs/cifs/transport.c: For smb_command 50
> [96268.787086] fs/cifs/transport.c: Sending smb:  total_len 88
>
> ...FIND_FIRST command is sent
>
> [96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
> [96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2, check next response
> [96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
> [96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2, check next response
> [96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
> [96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2, check next response
> [96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
> [96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2, check next response
>
> ...all four parts of the response are collected here
>
> [96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50 mid=12 state=16
>
> ...but the state at this point is MID_RESPONSE_MALFORMED
>
> [96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
> [96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
> [96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid = 737644) rc = -5
>
> ...which makes readdir return -EIO
>
> -------------------[snip]---------------------
>
> Based on that, it looks like something in one of these frames caused
> coalesce_t2() to return an error. I don't see the problem right offhand
> in the capture, but T2 response handling is pretty complex so it can be
> hard to see.
>
> Would it be possible for you to rebuild your kernel (or just cifs.ko)
> with this patch? Once you do that, rerun the test with cFYI turned up,
> and it should help point out what the problem is.
>
> Thanks,

Ok i am now rebuilding the kernel and will report when i have results.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-31 12:49                 ` Konstantinos Skarlatos
@ 2011-12-31 13:41                   ` Konstantinos Skarlatos
  2012-01-01  1:23                     ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantinos Skarlatos @ 2011-12-31 13:41 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

[-- Attachment #1: Type: text/plain, Size: 5820 bytes --]

On 31/12/2011 2:49 μμ, Konstantinos Skarlatos wrote:
> On Σάββατο, 31 Δεκέμβριος 2011 1:59:22 μμ, Jeff Layton wrote:
>> On Fri, 30 Dec 2011 20:00:46 +0200
>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
>>
>>> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
>>>> On Fri, 30 Dec 2011 11:04:59 +0200
>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
>>>>
>>>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
>>>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>>>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
>>>>>>>
>>>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
>>>>>>>>>> but when
>>>>>>>>>> I ls the mount point i get input/output error.
>>>>>>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>>>>>>
>>>>>>>>>> mount --verbose -t cifs -o
>>>>>>>>>> username=administrator,password=blahblah
>>>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>>>>>>> mount.cifs kernel mount options:
>>>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> df
>>>>>>>>>> //192.168.0.11/jobs 114464
>>>>>>>>>> 105196 9268 92% /mnt/backups/montaz/jobs
>>>>>>>>>>
>>>>>>>>>> ls /mnt/backups/montaz/jobs/
>>>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output
>>>>>>>>>> error
>>>>>>>>>> total 0
>>>>>>>>>>
>>>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
>>>>>>>>>> works
>>>>>>>>>> fine there! only the mount point has the problem
>>>>>>>>>>
>>>>>>>>>> ls /mnt/backups/montaz/jobs/test
>>>>>>>>>> total 44K
>>>>>>>>>> drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/
>>>>>>>>>> ......
>>>>>>>>>>
>>>>>>>>>> kernel version 3.2rc7
>>>>>>>>>>
>>>>>>>>>> this seems to be related to :
>>>>>>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
>>>>>>>>>> large dirs
>>>>>>>>>>
>>>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>>>>>>> server are you seeing this against?
>>>>>>>> Windows XP service pack 2 (greek)
>>>>>>>
>>>>>>> How many files are in the directory?
>>>>>>>
>>>>>> 140 folders and 20 files
>>>>>>
>>>>> Attached is a tcp dump of my session.
>>>> I tried reproducing this here, but wasn't able to. Testing against my
>>>> xp box worked fine.
>>>>
>>>> Most likely, the FIND_FILE responses are falling afoul of the code in
>>>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
>>>> complicated and I'm not certain what the problem actually is...
>>>>
>>>> One thing that's interesting is that the total data being sent in the
>>>> request is rather large (16336 bytes). I think that's legit, but maybe
>>>> it's exceeding the end of the buffer once we try to coalesce it.
>>>>
>>>> Would it be possible to get the cFYI output from this test?
>>> I did not get a cFYI output from that test, but i redid a
>>> mount-ls-umount and am attaching the tcpdump
>>> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the
>>> contents of /proc/fs/cifs/DebugData form the same test
>>>>
>>>> Is this a regression? Did it work with earlier kernels and only
>>>> recently start failing?
>>>>
>>> I do not know, and i am a bit afraid to downgrade this machine below 3.0
>>> due to some changes arch linux has introduced recently. I can always set
>>> up a few virtual machines though, and i can even request permission from
>>> my company to give you shell access if you like. Which kernel versions
>>> would you like me to test?
>>
>>
>> Ok, that tells us a little:
>>
>> -------------------[snip]---------------------
>> [96268.787078] fs/cifs/cifssmb.c: In FindFirst for
>> [96268.787083] fs/cifs/transport.c: For smb_command 50
>> [96268.787086] fs/cifs/transport.c: Sending smb: total_len 88
>>
>> ...FIND_FIRST command is sent
>>
>> [96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
>> [96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2,
>> check next response
>> [96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
>> [96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2,
>> check next response
>> [96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
>> [96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2,
>> check next response
>> [96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
>> [96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2,
>> check next response
>>
>> ...all four parts of the response are collected here
>>
>> [96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50
>> mid=12 state=16
>>
>> ...but the state at this point is MID_RESPONSE_MALFORMED
>>
>> [96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
>> [96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
>> [96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid
>> = 737644) rc = -5
>>
>> ...which makes readdir return -EIO
>>
>> -------------------[snip]---------------------
>>
>> Based on that, it looks like something in one of these frames caused
>> coalesce_t2() to return an error. I don't see the problem right offhand
>> in the capture, but T2 response handling is pretty complex so it can be
>> hard to see.
>>
>> Would it be possible for you to rebuild your kernel (or just cifs.ko)
>> with this patch? Once you do that, rerun the test with cFYI turned up,
>> and it should help point out what the problem is.
>>
>> Thanks,
>
> Ok i am now rebuilding the kernel and will report when i have results.
http://pastebin.com/scgyDjhT and attached tcpdump

[-- Attachment #2: cifs-traffic.pcap --]
[-- Type: application/octet-stream, Size: 22933 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2011-12-31 13:41                   ` Konstantinos Skarlatos
@ 2012-01-01  1:23                     ` Jeff Layton
  2012-01-01 11:44                       ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2012-01-01  1:23 UTC (permalink / raw)
  To: Konstantinos Skarlatos; +Cc: linux-kernel, linux-cifs

On Sat, 31 Dec 2011 15:41:32 +0200
Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:

> On 31/12/2011 2:49 μμ, Konstantinos Skarlatos wrote:
> > On Σάββατο, 31 Δεκέμβριος 2011 1:59:22 μμ, Jeff Layton wrote:
> >> On Fri, 30 Dec 2011 20:00:46 +0200
> >> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> >>
> >>> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
> >>>> On Fri, 30 Dec 2011 11:04:59 +0200
> >>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> >>>>
> >>>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
> >>>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
> >>>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
> >>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
> >>>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
> >>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
> >>>>>>>>>> but when
> >>>>>>>>>> I ls the mount point i get input/output error.
> >>>>>>>>>> strace: http://pastebin.com/WXf8M1nu
> >>>>>>>>>>
> >>>>>>>>>> mount --verbose -t cifs -o
> >>>>>>>>>> username=administrator,password=blahblah
> >>>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
> >>>>>>>>>> mount.cifs kernel mount options:
> >>>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> df
> >>>>>>>>>> //192.168.0.11/jobs 114464
> >>>>>>>>>> 105196 9268 92% /mnt/backups/montaz/jobs
> >>>>>>>>>>
> >>>>>>>>>> ls /mnt/backups/montaz/jobs/
> >>>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output
> >>>>>>>>>> error
> >>>>>>>>>> total 0
> >>>>>>>>>>
> >>>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
> >>>>>>>>>> works
> >>>>>>>>>> fine there! only the mount point has the problem
> >>>>>>>>>>
> >>>>>>>>>> ls /mnt/backups/montaz/jobs/test
> >>>>>>>>>> total 44K
> >>>>>>>>>> drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/
> >>>>>>>>>> ......
> >>>>>>>>>>
> >>>>>>>>>> kernel version 3.2rc7
> >>>>>>>>>>
> >>>>>>>>>> this seems to be related to :
> >>>>>>>>>> https://lkml.org/lkml/2011/8/1/427
> >>>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
> >>>>>>>>>> large dirs
> >>>>>>>>>>
> >>>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
> >>>>>>>>> server are you seeing this against?
> >>>>>>>> Windows XP service pack 2 (greek)
> >>>>>>>
> >>>>>>> How many files are in the directory?
> >>>>>>>
> >>>>>> 140 folders and 20 files
> >>>>>>
> >>>>> Attached is a tcp dump of my session.
> >>>> I tried reproducing this here, but wasn't able to. Testing against my
> >>>> xp box worked fine.
> >>>>
> >>>> Most likely, the FIND_FILE responses are falling afoul of the code in
> >>>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
> >>>> complicated and I'm not certain what the problem actually is...
> >>>>
> >>>> One thing that's interesting is that the total data being sent in the
> >>>> request is rather large (16336 bytes). I think that's legit, but maybe
> >>>> it's exceeding the end of the buffer once we try to coalesce it.
> >>>>
> >>>> Would it be possible to get the cFYI output from this test?
> >>> I did not get a cFYI output from that test, but i redid a
> >>> mount-ls-umount and am attaching the tcpdump
> >>> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the
> >>> contents of /proc/fs/cifs/DebugData form the same test
> >>>>
> >>>> Is this a regression? Did it work with earlier kernels and only
> >>>> recently start failing?
> >>>>
> >>> I do not know, and i am a bit afraid to downgrade this machine below 3.0
> >>> due to some changes arch linux has introduced recently. I can always set
> >>> up a few virtual machines though, and i can even request permission from
> >>> my company to give you shell access if you like. Which kernel versions
> >>> would you like me to test?
> >>
> >>
> >> Ok, that tells us a little:
> >>
> >> -------------------[snip]---------------------
> >> [96268.787078] fs/cifs/cifssmb.c: In FindFirst for
> >> [96268.787083] fs/cifs/transport.c: For smb_command 50
> >> [96268.787086] fs/cifs/transport.c: Sending smb: total_len 88
> >>
> >> ...FIND_FIRST command is sent
> >>
> >> [96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
> >> [96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2,
> >> check next response
> >> [96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
> >> [96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2,
> >> check next response
> >> [96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
> >> [96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2,
> >> check next response
> >> [96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
> >> [96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2,
> >> check next response
> >>
> >> ...all four parts of the response are collected here
> >>
> >> [96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50
> >> mid=12 state=16
> >>
> >> ...but the state at this point is MID_RESPONSE_MALFORMED
> >>
> >> [96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
> >> [96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
> >> [96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid
> >> = 737644) rc = -5
> >>
> >> ...which makes readdir return -EIO
> >>
> >> -------------------[snip]---------------------
> >>
> >> Based on that, it looks like something in one of these frames caused
> >> coalesce_t2() to return an error. I don't see the problem right offhand
> >> in the capture, but T2 response handling is pretty complex so it can be
> >> hard to see.
> >>
> >> Would it be possible for you to rebuild your kernel (or just cifs.ko)
> >> with this patch? Once you do that, rerun the test with cFYI turned up,
> >> and it should help point out what the problem is.
> >>
> >> Thanks,
> >
> > Ok i am now rebuilding the kernel and will report when i have results.
> http://pastebin.com/scgyDjhT and attached tcpdump

Very helpful -- I think I see the bug now. Can you test this patch and
let me know if it fixes the problem? You'll need to back out the debug
patch first, and then apply this one. If it works, I'll plan to clean
this up and send to Steve for inclusion.

Thanks...

----------------------[snip]---------------------

[PATCH] cifs: fix buffer length check and better instrumentation for coalesce_t2

Fix the buffer length check in coalesce_t2. The current one checks to
see that the rfc1002 length is not larger than the buffer that will
hold it. It gets the check wrong though. The buffer is MAX_CIFS_HDR_SIZE
larger than the current check allows for.

Also, to make it easier to debug problems of this sort, when coalesce_t2
returns an error, have it throw a cFYI message that explains the reason.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/connect.c |   29 +++++++++++++++++++++--------
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 8cd4b52..0b033ff 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -243,16 +243,21 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
 
 	remaining = total_data_size - total_in_buf;
 
-	if (remaining < 0)
+	if (remaining < 0) {
+		cFYI(1, "Server sent too much data. total_data_size=%hu "
+			"total_in_buf=%hu", total_data_size, total_in_buf);
 		return -EPROTO;
+	}
 
-	if (remaining == 0) /* nothing to do, ignore */
+	if (remaining == 0) {
+		/* nothing to do, ignore */
+		cFYI(1, "no more data remains");
 		return 0;
+	}
 
 	total_in_buf2 = get_unaligned_le16(&pSMB2->t2_rsp.DataCount);
-	if (remaining < total_in_buf2) {
+	if (remaining < total_in_buf2)
 		cFYI(1, "transact2 2nd response contains too much data");
-	}
 
 	/* find end of first SMB data area */
 	data_area_of_target = (char *)&pSMBt->hdr.Protocol +
@@ -267,23 +272,29 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
 	/* copy second buffer into end of first buffer */
 	total_in_buf += total_in_buf2;
 	/* is the result too big for the field? */
-	if (total_in_buf > USHRT_MAX)
+	if (total_in_buf > USHRT_MAX) {
+		cFYI(1, "coalesced total_in_buf too large (%u)", total_in_buf);
 		return -EPROTO;
+	}
 	put_unaligned_le16(total_in_buf, &pSMBt->t2_rsp.DataCount);
 
 	/* fix up the BCC */
 	byte_count = get_bcc(pTargetSMB);
 	byte_count += total_in_buf2;
 	/* is the result too big for the field? */
-	if (byte_count > USHRT_MAX)
+	if (byte_count > USHRT_MAX) {
+		cFYI(1, "coalesced BCC too large (%u)", byte_count);
 		return -EPROTO;
+	}
 	put_bcc(byte_count, pTargetSMB);
 
 	byte_count = be32_to_cpu(pTargetSMB->smb_buf_length);
 	byte_count += total_in_buf2;
 	/* don't allow buffer to overflow */
-	if (byte_count > CIFSMaxBufSize)
+	if (byte_count > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) {
+		cFYI(1, "coalesced BCC exceeds buffer size (%u)", byte_count);
 		return -ENOBUFS;
+	}
 	pTargetSMB->smb_buf_length = cpu_to_be32(byte_count);
 
 	memcpy(data_area_of_target, data_area_of_buf2, total_in_buf2);
@@ -291,8 +302,10 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
 	if (remaining == total_in_buf2) {
 		cFYI(1, "found the last secondary response");
 		return 0; /* we are done */
-	} else /* more responses to go */
+	} else {/* more responses to go */
+		cFYI(1, "waiting for more secondary responses");
 		return 1;
+	}
 }
 
 static void
-- 
1.7.7.4



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2012-01-01  1:23                     ` Jeff Layton
@ 2012-01-01 11:44                       ` Jeff Layton
  2012-01-01 12:38                         ` Konstantinos Skarlatos
  2012-01-01 13:58                         ` Konstantinos Skarlatos
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff Layton @ 2012-01-01 11:44 UTC (permalink / raw)
  To: Konstantinos Skarlatos; +Cc: linux-kernel, linux-cifs

On Sat, 31 Dec 2011 20:23:31 -0500
Jeff Layton <jlayton@redhat.com> wrote:

> On Sat, 31 Dec 2011 15:41:32 +0200
> Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:
> 
> > On 31/12/2011 2:49 μμ, Konstantinos Skarlatos wrote:
> > > On Σάββατο, 31 Δεκέμβριος 2011 1:59:22 μμ, Jeff Layton wrote:
> > >> On Fri, 30 Dec 2011 20:00:46 +0200
> > >> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> > >>
> > >>> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
> > >>>> On Fri, 30 Dec 2011 11:04:59 +0200
> > >>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> > >>>>
> > >>>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
> > >>>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
> > >>>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
> > >>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
> > >>>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
> > >>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
> > >>>>>>>>>> but when
> > >>>>>>>>>> I ls the mount point i get input/output error.
> > >>>>>>>>>> strace: http://pastebin.com/WXf8M1nu
> > >>>>>>>>>>
> > >>>>>>>>>> mount --verbose -t cifs -o
> > >>>>>>>>>> username=administrator,password=blahblah
> > >>>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
> > >>>>>>>>>> mount.cifs kernel mount options:
> > >>>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> df
> > >>>>>>>>>> //192.168.0.11/jobs 114464
> > >>>>>>>>>> 105196 9268 92% /mnt/backups/montaz/jobs
> > >>>>>>>>>>
> > >>>>>>>>>> ls /mnt/backups/montaz/jobs/
> > >>>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output
> > >>>>>>>>>> error
> > >>>>>>>>>> total 0
> > >>>>>>>>>>
> > >>>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
> > >>>>>>>>>> works
> > >>>>>>>>>> fine there! only the mount point has the problem
> > >>>>>>>>>>
> > >>>>>>>>>> ls /mnt/backups/montaz/jobs/test
> > >>>>>>>>>> total 44K
> > >>>>>>>>>> drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/
> > >>>>>>>>>> ......
> > >>>>>>>>>>
> > >>>>>>>>>> kernel version 3.2rc7
> > >>>>>>>>>>
> > >>>>>>>>>> this seems to be related to :
> > >>>>>>>>>> https://lkml.org/lkml/2011/8/1/427
> > >>>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
> > >>>>>>>>>> large dirs
> > >>>>>>>>>>
> > >>>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
> > >>>>>>>>> server are you seeing this against?
> > >>>>>>>> Windows XP service pack 2 (greek)
> > >>>>>>>
> > >>>>>>> How many files are in the directory?
> > >>>>>>>
> > >>>>>> 140 folders and 20 files
> > >>>>>>
> > >>>>> Attached is a tcp dump of my session.
> > >>>> I tried reproducing this here, but wasn't able to. Testing against my
> > >>>> xp box worked fine.
> > >>>>
> > >>>> Most likely, the FIND_FILE responses are falling afoul of the code in
> > >>>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
> > >>>> complicated and I'm not certain what the problem actually is...
> > >>>>
> > >>>> One thing that's interesting is that the total data being sent in the
> > >>>> request is rather large (16336 bytes). I think that's legit, but maybe
> > >>>> it's exceeding the end of the buffer once we try to coalesce it.
> > >>>>
> > >>>> Would it be possible to get the cFYI output from this test?
> > >>> I did not get a cFYI output from that test, but i redid a
> > >>> mount-ls-umount and am attaching the tcpdump
> > >>> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the
> > >>> contents of /proc/fs/cifs/DebugData form the same test
> > >>>>
> > >>>> Is this a regression? Did it work with earlier kernels and only
> > >>>> recently start failing?
> > >>>>
> > >>> I do not know, and i am a bit afraid to downgrade this machine below 3.0
> > >>> due to some changes arch linux has introduced recently. I can always set
> > >>> up a few virtual machines though, and i can even request permission from
> > >>> my company to give you shell access if you like. Which kernel versions
> > >>> would you like me to test?
> > >>
> > >>
> > >> Ok, that tells us a little:
> > >>
> > >> -------------------[snip]---------------------
> > >> [96268.787078] fs/cifs/cifssmb.c: In FindFirst for
> > >> [96268.787083] fs/cifs/transport.c: For smb_command 50
> > >> [96268.787086] fs/cifs/transport.c: Sending smb: total_len 88
> > >>
> > >> ...FIND_FIRST command is sent
> > >>
> > >> [96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
> > >> [96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2,
> > >> check next response
> > >> [96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
> > >> [96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2,
> > >> check next response
> > >> [96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
> > >> [96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2,
> > >> check next response
> > >> [96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
> > >> [96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2,
> > >> check next response
> > >>
> > >> ...all four parts of the response are collected here
> > >>
> > >> [96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50
> > >> mid=12 state=16
> > >>
> > >> ...but the state at this point is MID_RESPONSE_MALFORMED
> > >>
> > >> [96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
> > >> [96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
> > >> [96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid
> > >> = 737644) rc = -5
> > >>
> > >> ...which makes readdir return -EIO
> > >>
> > >> -------------------[snip]---------------------
> > >>
> > >> Based on that, it looks like something in one of these frames caused
> > >> coalesce_t2() to return an error. I don't see the problem right offhand
> > >> in the capture, but T2 response handling is pretty complex so it can be
> > >> hard to see.
> > >>
> > >> Would it be possible for you to rebuild your kernel (or just cifs.ko)
> > >> with this patch? Once you do that, rerun the test with cFYI turned up,
> > >> and it should help point out what the problem is.
> > >>
> > >> Thanks,
> > >
> > > Ok i am now rebuilding the kernel and will report when i have results.
> > http://pastebin.com/scgyDjhT and attached tcpdump
> 
> Very helpful -- I think I see the bug now. Can you test this patch and
> let me know if it fixes the problem? You'll need to back out the debug
> patch first, and then apply this one. If it works, I'll plan to clean
> this up and send to Steve for inclusion.
> 
> Thanks...
> 

Actually...that patch is off by 4 bytes. Since we're so late in the
cycle too, we should probably keep extraneous changes to a minimum.
I'll probably propose this patch for 3.2 once you've tested it, and
will spin up a patch to clean up coalesce_t2 for 3.3.

If you're already testing the other one, then it's probably good enough
to verify the fix. If you haven't tested it yet though, then you should
probably test this one instead. Either way, let me know soon if you can
so we can get this into 3.2.

---------------------------[snip]----------------------------


[PATCH] cifs: fix bad buffer length check in coalesce_t2

The current check looks to see if the RFC1002 length is larger than
CIFSMaxBufSize and fails if it is. The buffer is actually larger than
that by MAX_CIFS_HDR_SIZE. Fix the check so that it doesn't fail due
to this.

This bug has been around for a long time, but the fact that we used to
cap the clients MaxBufferSize at the same level as the server tended
to paper over it. Commit c974befa changed that however.

Reported-by: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/connect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 8cd4b52..27c4f25 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -282,7 +282,7 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
 	byte_count = be32_to_cpu(pTargetSMB->smb_buf_length);
 	byte_count += total_in_buf2;
 	/* don't allow buffer to overflow */
-	if (byte_count > CIFSMaxBufSize)
+	if (byte_count > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE - 4)
 		return -ENOBUFS;
 	pTargetSMB->smb_buf_length = cpu_to_be32(byte_count);
 
-- 
1.7.7.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2012-01-01 11:44                       ` Jeff Layton
@ 2012-01-01 12:38                         ` Konstantinos Skarlatos
  2012-01-01 13:58                         ` Konstantinos Skarlatos
  1 sibling, 0 replies; 17+ messages in thread
From: Konstantinos Skarlatos @ 2012-01-01 12:38 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

On 1/1/2012 1:44 μμ, Jeff Layton wrote:
> On Sat, 31 Dec 2011 20:23:31 -0500
> Jeff Layton<jlayton@redhat.com>  wrote:
>
>> On Sat, 31 Dec 2011 15:41:32 +0200
>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>
>>> On 31/12/2011 2:49 μμ, Konstantinos Skarlatos wrote:
>>>> On Σάββατο, 31 Δεκέμβριος 2011 1:59:22 μμ, Jeff Layton wrote:
>>>>> On Fri, 30 Dec 2011 20:00:46 +0200
>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>
>>>>>> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
>>>>>>> On Fri, 30 Dec 2011 11:04:59 +0200
>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>>>
>>>>>>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
>>>>>>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>>>>>>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
>>>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>>>>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>>>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
>>>>>>>>>>>>> but when
>>>>>>>>>>>>> I ls the mount point i get input/output error.
>>>>>>>>>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>>>>>>>>>
>>>>>>>>>>>>> mount --verbose -t cifs -o
>>>>>>>>>>>>> username=administrator,password=blahblah
>>>>>>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>>>>>>>>>> mount.cifs kernel mount options:
>>>>>>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> df
>>>>>>>>>>>>> //192.168.0.11/jobs 114464
>>>>>>>>>>>>> 105196 9268 92% /mnt/backups/montaz/jobs
>>>>>>>>>>>>>
>>>>>>>>>>>>> ls /mnt/backups/montaz/jobs/
>>>>>>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output
>>>>>>>>>>>>> error
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
>>>>>>>>>>>>> works
>>>>>>>>>>>>> fine there! only the mount point has the problem
>>>>>>>>>>>>>
>>>>>>>>>>>>> ls /mnt/backups/montaz/jobs/test
>>>>>>>>>>>>> total 44K
>>>>>>>>>>>>> drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/
>>>>>>>>>>>>> ......
>>>>>>>>>>>>>
>>>>>>>>>>>>> kernel version 3.2rc7
>>>>>>>>>>>>>
>>>>>>>>>>>>> this seems to be related to :
>>>>>>>>>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>>>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
>>>>>>>>>>>>> large dirs
>>>>>>>>>>>>>
>>>>>>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>>>>>>>>>> server are you seeing this against?
>>>>>>>>>>> Windows XP service pack 2 (greek)
>>>>>>>>>>
>>>>>>>>>> How many files are in the directory?
>>>>>>>>>>
>>>>>>>>> 140 folders and 20 files
>>>>>>>>>
>>>>>>>> Attached is a tcp dump of my session.
>>>>>>> I tried reproducing this here, but wasn't able to. Testing against my
>>>>>>> xp box worked fine.
>>>>>>>
>>>>>>> Most likely, the FIND_FILE responses are falling afoul of the code in
>>>>>>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
>>>>>>> complicated and I'm not certain what the problem actually is...
>>>>>>>
>>>>>>> One thing that's interesting is that the total data being sent in the
>>>>>>> request is rather large (16336 bytes). I think that's legit, but maybe
>>>>>>> it's exceeding the end of the buffer once we try to coalesce it.
>>>>>>>
>>>>>>> Would it be possible to get the cFYI output from this test?
>>>>>> I did not get a cFYI output from that test, but i redid a
>>>>>> mount-ls-umount and am attaching the tcpdump
>>>>>> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the
>>>>>> contents of /proc/fs/cifs/DebugData form the same test
>>>>>>>
>>>>>>> Is this a regression? Did it work with earlier kernels and only
>>>>>>> recently start failing?
>>>>>>>
>>>>>> I do not know, and i am a bit afraid to downgrade this machine below 3.0
>>>>>> due to some changes arch linux has introduced recently. I can always set
>>>>>> up a few virtual machines though, and i can even request permission from
>>>>>> my company to give you shell access if you like. Which kernel versions
>>>>>> would you like me to test?
>>>>>
>>>>>
>>>>> Ok, that tells us a little:
>>>>>
>>>>> -------------------[snip]---------------------
>>>>> [96268.787078] fs/cifs/cifssmb.c: In FindFirst for
>>>>> [96268.787083] fs/cifs/transport.c: For smb_command 50
>>>>> [96268.787086] fs/cifs/transport.c: Sending smb: total_len 88
>>>>>
>>>>> ...FIND_FIRST command is sent
>>>>>
>>>>> [96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
>>>>> [96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2,
>>>>> check next response
>>>>> [96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
>>>>> [96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2,
>>>>> check next response
>>>>> [96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
>>>>> [96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2,
>>>>> check next response
>>>>> [96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
>>>>> [96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2,
>>>>> check next response
>>>>>
>>>>> ...all four parts of the response are collected here
>>>>>
>>>>> [96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50
>>>>> mid=12 state=16
>>>>>
>>>>> ...but the state at this point is MID_RESPONSE_MALFORMED
>>>>>
>>>>> [96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
>>>>> [96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
>>>>> [96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid
>>>>> = 737644) rc = -5
>>>>>
>>>>> ...which makes readdir return -EIO
>>>>>
>>>>> -------------------[snip]---------------------
>>>>>
>>>>> Based on that, it looks like something in one of these frames caused
>>>>> coalesce_t2() to return an error. I don't see the problem right offhand
>>>>> in the capture, but T2 response handling is pretty complex so it can be
>>>>> hard to see.
>>>>>
>>>>> Would it be possible for you to rebuild your kernel (or just cifs.ko)
>>>>> with this patch? Once you do that, rerun the test with cFYI turned up,
>>>>> and it should help point out what the problem is.
>>>>>
>>>>> Thanks,
>>>>
>>>> Ok i am now rebuilding the kernel and will report when i have results.
>>> http://pastebin.com/scgyDjhT and attached tcpdump
>>
>> Very helpful -- I think I see the bug now. Can you test this patch and
>> let me know if it fixes the problem? You'll need to back out the debug
>> patch first, and then apply this one. If it works, I'll plan to clean
>> this up and send to Steve for inclusion.
>>
>> Thanks...
>>
>
> Actually...that patch is off by 4 bytes. Since we're so late in the
> cycle too, we should probably keep extraneous changes to a minimum.
> I'll probably propose this patch for 3.2 once you've tested it, and
> will spin up a patch to clean up coalesce_t2 for 3.3.
>
> If you're already testing the other one, then it's probably good enough
> to verify the fix. If you haven't tested it yet though, then you should
> probably test this one instead. Either way, let me know soon if you can
> so we can get this into 3.2.
Hello, i just woke up from the new year celebration :) . I think that i 
will be able to test this patch in an hour or so.

Regards
>
> ---------------------------[snip]----------------------------
>
>
> [PATCH] cifs: fix bad buffer length check in coalesce_t2
>
> The current check looks to see if the RFC1002 length is larger than
> CIFSMaxBufSize and fails if it is. The buffer is actually larger than
> that by MAX_CIFS_HDR_SIZE. Fix the check so that it doesn't fail due
> to this.
>
> This bug has been around for a long time, but the fact that we used to
> cap the clients MaxBufferSize at the same level as the server tended
> to paper over it. Commit c974befa changed that however.
>
> Reported-by: Konstantinos Skarlatos<k.skarlatos@gmail.com>
> Signed-off-by: Jeff Layton<jlayton@redhat.com>
> ---
>   fs/cifs/connect.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 8cd4b52..27c4f25 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -282,7 +282,7 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
>   	byte_count = be32_to_cpu(pTargetSMB->smb_buf_length);
>   	byte_count += total_in_buf2;
>   	/* don't allow buffer to overflow */
> -	if (byte_count>  CIFSMaxBufSize)
> +	if (byte_count>  CIFSMaxBufSize + MAX_CIFS_HDR_SIZE - 4)
>   		return -ENOBUFS;
>   	pTargetSMB->smb_buf_length = cpu_to_be32(byte_count);
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs)
  2012-01-01 11:44                       ` Jeff Layton
  2012-01-01 12:38                         ` Konstantinos Skarlatos
@ 2012-01-01 13:58                         ` Konstantinos Skarlatos
  1 sibling, 0 replies; 17+ messages in thread
From: Konstantinos Skarlatos @ 2012-01-01 13:58 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, linux-cifs

On 1/1/2012 1:44 μμ, Jeff Layton wrote:
> On Sat, 31 Dec 2011 20:23:31 -0500
> Jeff Layton<jlayton@redhat.com>  wrote:
>
>> On Sat, 31 Dec 2011 15:41:32 +0200
>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>
>>> On 31/12/2011 2:49 μμ, Konstantinos Skarlatos wrote:
>>>> On Σάββατο, 31 Δεκέμβριος 2011 1:59:22 μμ, Jeff Layton wrote:
>>>>> On Fri, 30 Dec 2011 20:00:46 +0200
>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>
>>>>>> On 30/12/2011 3:11 μμ, Jeff Layton wrote:
>>>>>>> On Fri, 30 Dec 2011 11:04:59 +0200
>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>>>
>>>>>>>> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote:
>>>>>>>>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote:
>>>>>>>>>> On Thu, 29 Dec 2011 12:30:18 +0200
>>>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote:
>>>>>>>>>>>> On Thu, 29 Dec 2011 02:08:57 +0200
>>>>>>>>>>>> Konstantinos Skarlatos<k.skarlatos@gmail.com>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I mount via cifs a windows XP share, df gives me correct sizes,
>>>>>>>>>>>>> but when
>>>>>>>>>>>>> I ls the mount point i get input/output error.
>>>>>>>>>>>>> strace: http://pastebin.com/WXf8M1nu
>>>>>>>>>>>>>
>>>>>>>>>>>>> mount --verbose -t cifs -o
>>>>>>>>>>>>> username=administrator,password=blahblah
>>>>>>>>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs
>>>>>>>>>>>>> mount.cifs kernel mount options:
>>>>>>>>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> df
>>>>>>>>>>>>> //192.168.0.11/jobs 114464
>>>>>>>>>>>>> 105196 9268 92% /mnt/backups/montaz/jobs
>>>>>>>>>>>>>
>>>>>>>>>>>>> ls /mnt/backups/montaz/jobs/
>>>>>>>>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output
>>>>>>>>>>>>> error
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> the fun thing is that i can cd to a lower level directory, and ls
>>>>>>>>>>>>> works
>>>>>>>>>>>>> fine there! only the mount point has the problem
>>>>>>>>>>>>>
>>>>>>>>>>>>> ls /mnt/backups/montaz/jobs/test
>>>>>>>>>>>>> total 44K
>>>>>>>>>>>>> drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/
>>>>>>>>>>>>> ......
>>>>>>>>>>>>>
>>>>>>>>>>>>> kernel version 3.2rc7
>>>>>>>>>>>>>
>>>>>>>>>>>>> this seems to be related to :
>>>>>>>>>>>>> https://lkml.org/lkml/2011/8/1/427
>>>>>>>>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
>>>>>>>>>>>>> large dirs
>>>>>>>>>>>>>
>>>>>>>>>>>> Hmmm, maybe. What makes you think that it's related? What sort of
>>>>>>>>>>>> server are you seeing this against?
>>>>>>>>>>> Windows XP service pack 2 (greek)
>>>>>>>>>>
>>>>>>>>>> How many files are in the directory?
>>>>>>>>>>
>>>>>>>>> 140 folders and 20 files
>>>>>>>>>
>>>>>>>> Attached is a tcp dump of my session.
>>>>>>> I tried reproducing this here, but wasn't able to. Testing against my
>>>>>>> xp box worked fine.
>>>>>>>
>>>>>>> Most likely, the FIND_FILE responses are falling afoul of the code in
>>>>>>> coalesce_t2 or check2ndT2. Unfortunately that code is pretty
>>>>>>> complicated and I'm not certain what the problem actually is...
>>>>>>>
>>>>>>> One thing that's interesting is that the total data being sent in the
>>>>>>> request is rather large (16336 bytes). I think that's legit, but maybe
>>>>>>> it's exceeding the end of the buffer once we try to coalesce it.
>>>>>>>
>>>>>>> Would it be possible to get the cFYI output from this test?
>>>>>> I did not get a cFYI output from that test, but i redid a
>>>>>> mount-ls-umount and am attaching the tcpdump
>>>>>> Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the
>>>>>> contents of /proc/fs/cifs/DebugData form the same test
>>>>>>>
>>>>>>> Is this a regression? Did it work with earlier kernels and only
>>>>>>> recently start failing?
>>>>>>>
>>>>>> I do not know, and i am a bit afraid to downgrade this machine below 3.0
>>>>>> due to some changes arch linux has introduced recently. I can always set
>>>>>> up a few virtual machines though, and i can even request permission from
>>>>>> my company to give you shell access if you like. Which kernel versions
>>>>>> would you like me to test?
>>>>>
>>>>>
>>>>> Ok, that tells us a little:
>>>>>
>>>>> -------------------[snip]---------------------
>>>>> [96268.787078] fs/cifs/cifssmb.c: In FindFirst for
>>>>> [96268.787083] fs/cifs/transport.c: For smb_command 50
>>>>> [96268.787086] fs/cifs/transport.c: Sending smb: total_len 88
>>>>>
>>>>> ...FIND_FIRST command is sent
>>>>>
>>>>> [96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
>>>>> [96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2,
>>>>> check next response
>>>>> [96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
>>>>> [96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2,
>>>>> check next response
>>>>> [96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
>>>>> [96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2,
>>>>> check next response
>>>>> [96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
>>>>> [96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2,
>>>>> check next response
>>>>>
>>>>> ...all four parts of the response are collected here
>>>>>
>>>>> [96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50
>>>>> mid=12 state=16
>>>>>
>>>>> ...but the state at this point is MID_RESPONSE_MALFORMED
>>>>>
>>>>> [96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
>>>>> [96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
>>>>> [96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid
>>>>> = 737644) rc = -5
>>>>>
>>>>> ...which makes readdir return -EIO
>>>>>
>>>>> -------------------[snip]---------------------
>>>>>
>>>>> Based on that, it looks like something in one of these frames caused
>>>>> coalesce_t2() to return an error. I don't see the problem right offhand
>>>>> in the capture, but T2 response handling is pretty complex so it can be
>>>>> hard to see.
>>>>>
>>>>> Would it be possible for you to rebuild your kernel (or just cifs.ko)
>>>>> with this patch? Once you do that, rerun the test with cFYI turned up,
>>>>> and it should help point out what the problem is.
>>>>>
>>>>> Thanks,
>>>>
>>>> Ok i am now rebuilding the kernel and will report when i have results.
>>> http://pastebin.com/scgyDjhT and attached tcpdump
>>
>> Very helpful -- I think I see the bug now. Can you test this patch and
>> let me know if it fixes the problem? You'll need to back out the debug
>> patch first, and then apply this one. If it works, I'll plan to clean
>> this up and send to Steve for inclusion.
>>
>> Thanks...
>>
>
> Actually...that patch is off by 4 bytes. Since we're so late in the
> cycle too, we should probably keep extraneous changes to a minimum.
> I'll probably propose this patch for 3.2 once you've tested it, and
> will spin up a patch to clean up coalesce_t2 for 3.3.
>
> If you're already testing the other one, then it's probably good enough
> to verify the fix. If you haven't tested it yet though, then you should
> probably test this one instead. Either way, let me know soon if you can
> so we can get this into 3.2.

Ok, I tested and it works fine for me. Thanks!

PS: would you like me to send any cFYI or tcpdumps?

Tested-by: Konstantinos Skarlatos<k.skarlatos@gmail.com>

>
> ---------------------------[snip]----------------------------
>
>
> [PATCH] cifs: fix bad buffer length check in coalesce_t2
>
> The current check looks to see if the RFC1002 length is larger than
> CIFSMaxBufSize and fails if it is. The buffer is actually larger than
> that by MAX_CIFS_HDR_SIZE. Fix the check so that it doesn't fail due
> to this.
>
> This bug has been around for a long time, but the fact that we used to
> cap the clients MaxBufferSize at the same level as the server tended
> to paper over it. Commit c974befa changed that however.
>
> Reported-by: Konstantinos Skarlatos<k.skarlatos@gmail.com>
> Signed-off-by: Jeff Layton<jlayton@redhat.com>
> ---
>   fs/cifs/connect.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 8cd4b52..27c4f25 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -282,7 +282,7 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
>   	byte_count = be32_to_cpu(pTargetSMB->smb_buf_length);
>   	byte_count += total_in_buf2;
>   	/* don't allow buffer to overflow */
> -	if (byte_count>  CIFSMaxBufSize)
> +	if (byte_count>  CIFSMaxBufSize + MAX_CIFS_HDR_SIZE - 4)
>   		return -ENOBUFS;
>   	pTargetSMB->smb_buf_length = cpu_to_be32(byte_count);
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-01-01 13:58 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-29  0:08 cifs: ls of mount point gives input/output error (probably related to CIFS: getdents() broken for large dirs) Konstantinos Skarlatos
2011-12-29  2:04 ` Jeff Layton
2011-12-29 10:30   ` Konstantinos Skarlatos
2011-12-29 13:39     ` Jeff Layton
2011-12-29 13:54       ` Konstantinos Skarlatos
2011-12-30  9:04         ` Konstantinos Skarlatos
2011-12-30 13:11           ` Jeff Layton
2011-12-30 18:00             ` Konstantinos Skarlatos
2011-12-30 21:46               ` Konstantinos Skarlatos
2011-12-31  2:16                 ` Konstantinos Skarlatos
2011-12-31 11:59               ` Jeff Layton
2011-12-31 12:49                 ` Konstantinos Skarlatos
2011-12-31 13:41                   ` Konstantinos Skarlatos
2012-01-01  1:23                     ` Jeff Layton
2012-01-01 11:44                       ` Jeff Layton
2012-01-01 12:38                         ` Konstantinos Skarlatos
2012-01-01 13:58                         ` Konstantinos Skarlatos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).