* Expected response from server not supporting v4
@ 2011-08-16  8:01 Shehjar Tikoo
  2011-08-16 12:50 ` Steve Dickson
  0 siblings, 1 reply; 8+ messages in thread
From: Shehjar Tikoo @ 2011-08-16  8:01 UTC (permalink / raw)
  To: linux-nfs

Hi All

The following thread discusses the behaviour when the client does not 
support v4:
http://thread.gmane.org/gmane.linux.nfs/36928/

OTOH, when the server does not support v4, e.g. the Gluster NFS server,
where we support only v3, I believe a v4 client will attempt to connect
directly to port 2049 and receive connection failure errors on TCP. Does
the current NFS client handle the situation where this results in a timeout
for mount? We're hearing a report of a timeout occurring on the RHEL6
client because the server does not have v4 support. Could someone please
shed some light on how this behaviour is handled at present? Thanks

-Shehjar


* Re: Expected response from server not supporting v4
  2011-08-16  8:01 Expected response from server not supporting v4 Shehjar Tikoo
@ 2011-08-16 12:50 ` Steve Dickson
  2011-08-17  6:35   ` Shehjar Tikoo
  0 siblings, 1 reply; 8+ messages in thread
From: Steve Dickson @ 2011-08-16 12:50 UTC (permalink / raw)
  To: Shehjar Tikoo; +Cc: linux-nfs



On 08/16/2011 04:01 AM, Shehjar Tikoo wrote:
> Hi All
> 
> The following thread discusses the behaviour when the client does not support v4:
> http://thread.gmane.org/gmane.linux.nfs/36928/
> 
> OTOH, when the server does not support v4, e.g. the Gluster NFS server, where we support only v3, I believe a v4 client will attempt to connect directly to port 2049 and receive connection failure errors on TCP. Does the current NFS client handle the situation where this results in a timeout for mount? We're hearing a report of a timeout occurring on the RHEL6 client because the server does not have v4 support. Could someone please shed some light on how this behaviour is handled at present? Thanks
Here is the current logic as to what will cause a fall back:

    switch (errno) {
    case EPROTONOSUPPORT:
        /* A clear indication that the server or our
         * client does not support NFS version 4. */
        goto fall_back;
    case ENOENT:
        /* Legacy Linux servers don't export an NFS
         * version 4 pseudoroot. */
        goto fall_back;
    case EPERM:
        /* Linux servers prior to 2.6.25 may return
         * EPERM when NFS version 4 is not supported. */
        goto fall_back;
    default:
        return result;
    }

fall_back:
    return nfs_try_mount_v3v2(mi);

So in the case of the Gluster server, you are dropping into the
default case which is causing the time out.
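
To make the effect of that switch concrete, here is a minimal standalone sketch of the same decision. The helper name v4_error_allows_fallback is hypothetical (it is not an nfs-utils function); only the three errno cases come from the excerpt above. Any other value, such as ECONNREFUSED, ETIMEDOUT or EINVAL, takes the default branch and is handed back to the caller without trying v3/v2.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of nfs-utils: returns true when an errno
 * from a failed v4 mount attempt would take one of the fall_back branches
 * in the excerpt above.
 */
bool v4_error_allows_fallback(int err)
{
    switch (err) {
    case EPROTONOSUPPORT: /* server or client lacks NFSv4 */
    case ENOENT:          /* legacy server without a v4 pseudoroot */
    case EPERM:           /* Linux servers prior to 2.6.25 */
        return true;
    default:              /* e.g. ECONNREFUSED, ETIMEDOUT, EINVAL */
        return false;
    }
}
```

A caller would try v3/v2 only when this returns true; everything else is reported as the result of the v4 attempt.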

In the above patch set, Mi patches the mount code
to fall back on EINVAL, which is the current return value from
the kernel when v4 is not configured. I'm not totally
against doing something like this, but this is very touchy
code, since it could have negative effects on other legacy
servers.

So I'm thinking Mi's kernel patch that causes the kernel
to return EPROTONOSUPPORT, which is the correct return
value, is probably the better way to go...

With that said, to get this type of functionality into
already released distros, maybe the mount patch should be
looked into, since it is much easier to backport and people
are more willing to take nfs-utils updates than kernel updates...

So, if by chance, a well-placed bz is opened against an
already released distro, someone would have to make that
call... ;-)

steved.
  




* Re: Expected response from server not supporting v4
  2011-08-16 12:50 ` Steve Dickson
@ 2011-08-17  6:35   ` Shehjar Tikoo
  2011-08-17 15:27     ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Shehjar Tikoo @ 2011-08-17  6:35 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs

Steve Dickson wrote:
> Here is the current logic as to what will cause a fall back:
> 
>     switch (errno) {
>     case EPROTONOSUPPORT:
>         /* A clear indication that the server or our
>          * client does not support NFS version 4. */
>         goto fall_back;
>     case ENOENT:
>         /* Legacy Linux servers don't export an NFS
>          * version 4 pseudoroot. */
>         goto fall_back;
>     case EPERM:
>         /* Linux servers prior to 2.6.25 may return
>          * EPERM when NFS version 4 is not supported. */
>         goto fall_back;
>     default:
>         return result;
>     }
> 
> fall_back:
>     return nfs_try_mount_v3v2(mi);
> 
> So in the case of the Gluster server, you are dropping into the
> default case which is causing the time out.
> 
> In the above patch set, Mi patches the mount code
> to fall back on EINVAL, which is the current return value from
> the kernel when v4 is not configured. I'm not totally
> against doing something like this, but this is very touchy
> code, since it could have negative effects on other legacy
> servers.
>
> So I'm thinking Mi's kernel patch that causes the kernel
> to return EPROTONOSUPPORT, which is the correct return
> value, is probably the better way to go...
> 

Thanks Steve. My understanding is that Mi's patch is to handle the case 
where the client does not support v4. Do you think the same patch will also 
handle a server that does not support v4 and hence prevents a client from 
connecting to 2049?

Thanks
-Shehjar





* Re: Expected response from server not supporting v4
  2011-08-17  6:35   ` Shehjar Tikoo
@ 2011-08-17 15:27     ` Chuck Lever
       [not found]       ` <4E4E0171.6010104@gluster.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2011-08-17 15:27 UTC (permalink / raw)
  To: Shehjar Tikoo; +Cc: Steve Dickson, linux-nfs


On Aug 17, 2011, at 2:35 AM, Shehjar Tikoo wrote:

> Thanks Steve. My understanding is that Mi's patch is to handle the case where the client does not support v4. Do you think the same patch will also handle a server that does not support v4 and hence prevents a client from connecting to 2049?

It's a best practice for clients to connect to 2049 immediately, rather than querying the server's portmapper, to discover and potentially connect to a server's NFSv4 service.

A full-frame network trace of a mount attempt that times out would tell us if there is something pathological going on.


-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: Expected response from server not supporting v4
       [not found]       ` <4E4E0171.6010104@gluster.com>
@ 2011-08-19 16:09         ` Chuck Lever
  2011-08-23  9:21           ` Shehjar Tikoo
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2011-08-19 16:09 UTC (permalink / raw)
  To: Shehjar Tikoo; +Cc: Steve Dickson, linux-nfs


On Aug 19, 2011, at 2:23 AM, Shehjar Tikoo wrote:

> Thanks Chuck. Here's the Wireshark screenshot of the network trace. As you can see, the SYN from the client (10.1.12.45) to the server machine (192.168.1.117) receives a RST. At the client, it manifests as:
> 
> [root@centos6-1 ~]# mount 192.168.1.117:/posix /mnt
> mount.nfs: Connection timed out
> 
> That's it. The client is Linux centos6-1 2.6.32-71.el6.x86_64.
> 
> Does this point to a bug or is it expected? I was under the impression that version 3 becomes the fallback in case v4 is not available on the server.

I assume this connection attempt comes from the kernel's NFS client.  The RST should cause the mount(2) system call to return immediately with an error code, but it's very likely this edge case was never tested.
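
For comparison, here is a rough user-space sketch of what "return immediately with an error code" looks like for an ordinary TCP connect. This is not the kernel NFS client's code path, and the function name probe_tcp is made up for illustration. With a plain blocking socket, a RST in answer to the SYN is reported by connect(2) as ECONNREFUSED rather than as a timeout.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical user-space probe (not the kernel client's code path):
 * attempt a blocking TCP connect and return 0 on success or the errno
 * value on failure. A RST in answer to the SYN shows up as ECONNREFUSED.
 */
int probe_tcp(const char *ipv4, unsigned short port)
{
    struct sockaddr_in addr;
    int fd, err = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return EINVAL; /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Save errno before close(), which may overwrite it. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;

    close(fd);
    return err;
}
```

Calling probe_tcp("192.168.1.117", 2049) against a host with nothing listening on 2049 would be expected to come back promptly with ECONNREFUSED, which is the behaviour one would hope mount(2) mirrors.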

I suppose you could try a newer kernel (2.6.39 or 3.0) to see if it behaves any better.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: Expected response from server not supporting v4
  2011-08-19 16:09         ` Chuck Lever
@ 2011-08-23  9:21           ` Shehjar Tikoo
  2011-08-26 15:43             ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Shehjar Tikoo @ 2011-08-23  9:21 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Steve Dickson, linux-nfs

Chuck Lever wrote:
> I assume this connection attempt comes from the kernel's NFS client.  The RST should cause the mount(2) system call to return immediately with an error code, but it's very likely this edge case was never tested.

Thanks Chuck.

That's correct. The client is the Linux kernel NFS client and the server is the Gluster NFS server.
> 
> I suppose you could try a newer kernel (2.6.39 or 3.0) to see if it behaves any better.

Same behavior with 2.6.38. Should a bug be filed?

Thanks
-Shehjar

> 



* Re: Expected response from server not supporting v4
  2011-08-23  9:21           ` Shehjar Tikoo
@ 2011-08-26 15:43             ` Chuck Lever
  2011-08-26 20:19               ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2011-08-26 15:43 UTC (permalink / raw)
  To: Shehjar Tikoo; +Cc: Steve Dickson, linux-nfs


On Aug 23, 2011, at 5:21 AM, Shehjar Tikoo wrote:

> Chuck Lever wrote:
>> I assume this connection attempt comes from the kernel's NFS client.  The RST should cause the mount(2) system call to return immediately with an error code, but it's very likely this edge case was never tested.
> 
> Thanks Chuck.
> 
> That's correct. The client is the Linux kernel NFS client and the server is the Gluster NFS server.
>> I suppose you could try a newer kernel (2.6.39 or 3.0) to see if it behaves any better.
> 
> Same behavior with 2.6.38. Should a bug be filed?

I was able to reproduce this with 3.1-rc2.  My expectation was that the SOFTCONN support we added a few releases back would handle this case, but it doesn't.  Looking into it now.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: Expected response from server not supporting v4
  2011-08-26 15:43             ` Chuck Lever
@ 2011-08-26 20:19               ` Chuck Lever
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2011-08-26 20:19 UTC (permalink / raw)
  To: Shehjar Tikoo; +Cc: Steve Dickson, linux-nfs


On Aug 26, 2011, at 11:43 AM, Chuck Lever wrote:

> 
> On Aug 23, 2011, at 5:21 AM, Shehjar Tikoo wrote:
> 
>> Chuck Lever wrote:
>>> I assume this connection attempt comes from the kernel's NFS client.  The RST should cause the mount(2) system call to return immediately with an error code, but it's very likely this edge case was never tested.
>> 
>> Thanks Chuck.
>> 
>> That's correct. The client is the Linux kernel NFS client and the server is the Gluster NFS server.
>>> I suppose you could try a newer kernel (2.6.39 or 3.0) to see if it behaves any better.
>> 
>> Same behavior with 2.6.38. Should a bug be filed?
> 
> I was able to reproduce this with 3.1-rc2.  My expectation was that the SOFTCONN support we added a few releases back would handle this case, but it doesn't.  Looking into it now.

It's the mount command that is retrying.  See nfs(5).  The reason it retries is that it assumes that the NFS server is still down after a server reboot.  You can disable the retry loop with "retry=0".
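
The retry behaviour can be pictured with a small sketch like the one below. It is not the mount.nfs source: the names mount_with_retry, mount_attempt_fn and always_refused are hypothetical, and the doubling backoff is an assumption. The only thing taken from nfs(5) is that "retry=n" is a number of minutes and that zero means giving up after the first failure.

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback: returns 0 on success, an errno value on failure. */
typedef int (*mount_attempt_fn)(void *ctx);

/*
 * Sketch of a foreground retry loop, not the actual mount.nfs code.
 * retry_minutes mirrors the "retry=n" option from nfs(5); zero means
 * give up after the first failed attempt.
 */
int mount_with_retry(mount_attempt_fn attempt, void *ctx,
                     unsigned int retry_minutes)
{
    const time_t deadline = time(NULL) + (time_t)retry_minutes * 60;
    unsigned int backoff = 1; /* seconds; doubling is an assumption */
    int rc;

    for (;;) {
        rc = attempt(ctx);
        if (rc == 0)
            return 0;   /* mounted */
        if (time(NULL) >= deadline)
            return rc;  /* out of time: report the last error */
        sleep(backoff);
        if (backoff < 30)
            backoff *= 2;
    }
}

/* Example attempt that is always refused and counts its invocations. */
int always_refused(void *ctx)
{
    ++*(int *)ctx;
    return ECONNREFUSED;
}
```

With retry_minutes set to 0 the loop makes exactly one attempt and returns its error, which corresponds to the "retry=0" suggestion.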

However, I think this is a server bug.  A RST means an NFS server is not running at all.  That's why our client retries.  The correct server response in this case is an RPC error reply that says the client has asked for an unsupported version, followed by a normal connection shutdown (FIN).  If the server replies that way, the client should understand what to do immediately.
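
As a rough illustration of the reply described above, the sketch below encodes an ONC RPC accepted reply with accept_stat PROG_MISMATCH, framed with the TCP record marker. I am assuming PROG_MISMATCH is the error meant here; the numeric constants follow RFC 5531, while the function name encode_prog_mismatch and the choice of an AUTH_NONE verifier are illustrative only. A v3-only server would presumably report low = high = 3.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants from the ONC RPC message format (RFC 5531). */
enum { RPC_REPLY = 1, RPC_MSG_ACCEPTED = 0, RPC_AUTH_NONE = 0,
       RPC_PROG_MISMATCH = 2 };

/*
 * Hypothetical encoder: builds a PROG_MISMATCH reply, including the
 * 4-byte TCP record marker, for a server that supports only versions
 * low..high of the requested program. Returns the number of bytes
 * written, or 0 if the buffer is too small.
 */
size_t encode_prog_mismatch(uint8_t *buf, size_t buflen, uint32_t xid,
                            uint32_t low, uint32_t high)
{
    const uint32_t body[] = {
        xid,               /* copied from the client's call */
        RPC_REPLY,         /* msg_type */
        RPC_MSG_ACCEPTED,  /* reply_stat */
        RPC_AUTH_NONE,     /* verifier flavor */
        0,                 /* verifier length */
        RPC_PROG_MISMATCH, /* accept_stat */
        low,               /* lowest supported version */
        high,              /* highest supported version */
    };
    const size_t nwords = sizeof(body) / sizeof(body[0]);
    const size_t total = 4 + sizeof(body);
    uint32_t word;
    size_t i;

    if (buflen < total)
        return 0;

    /* Record marker: last-fragment bit plus fragment length. */
    word = htonl(0x80000000u | (uint32_t)sizeof(body));
    memcpy(buf, &word, 4);

    /* Each body word goes out in network byte order (XDR). */
    for (i = 0; i < nwords; i++) {
        word = htonl(body[i]);
        memcpy(buf + 4 + 4 * i, &word, 4);
    }
    return total;
}
```

The server would write these bytes on the accepted connection and then close it normally, so the client sees a FIN instead of a RST.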

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





