* [RFC][PATCH] client cannot get lock after other client got lock occur network partition.
From: Mi Jinlong @ 2009-11-09  9:19 UTC
  To: Trond.Myklebust; +Cc: NFSv3 list, J. Bruce Fields

Hi Trond et al.,

There is a bug, which I hit when testing NFSv3 file locking as follows:

Step1: ClientA and ClientB open the same NFS file;
Step2: ClientA takes a write lock on the file, which succeeds;
Step3: Cut off the network between ClientA and the server;
Step4: ClientB can never acquire the write lock, even when the network
       partition lasts longer than NLM_HOST_EXPIRE.
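
A minimal sketch of how the write lock in Step2 (and the attempt in Step4)
might be taken from user space, assuming a POSIX system where the file lives
on an NFSv3 mount with locking enabled, so that fcntl() locks are forwarded
to the server through NLM. The helper name open_and_write_lock() and any
path passed to it are hypothetical, not part of the original test.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` and try to take a non-blocking write lock on the whole file.
 * Returns the open descriptor (the lock is held until it is closed),
 * or -1 if the file cannot be opened or the lock cannot be acquired.
 */
static int open_and_write_lock(const char *path)
{
	struct flock fl = {
		.l_type   = F_WRLCK,	/* exclusive (write) lock */
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,		/* 0 = through end of file */
	};
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	/* F_SETLK fails with EAGAIN or EACCES if another owner holds a conflicting lock */
	if (fcntl(fd, F_SETLK, &fl) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

ClientA would call this once and keep the descriptor open; ClientB would
call it after the partition and, in the scenario described, keep getting -1.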

As far as I know, with NFSv4, Step4 succeeds after LEASE_TIME.

Is it necessary to fix NFSv3?

The attached patch makes this case work, but I am not sure it is a good fix.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
---
 fs/lockd/host.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index 4600c20..c964327 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -550,8 +550,8 @@ nlm_gc_hosts(void)
 
 	for (chain = nlm_hosts; chain < nlm_hosts + NLM_HOST_NRHASH; ++chain) {
 		hlist_for_each_entry_safe(host, pos, next, chain, h_hash) {
-			if (atomic_read(&host->h_count) || host->h_inuse
-			 || time_before(jiffies, host->h_expires)) {
+			if (time_before(jiffies, host->h_expires)
+			    && (atomic_read(&host->h_count) || host->h_inuse)) {
 				dprintk("nlm_gc_hosts skipping %s (cnt %d use %d exp %ld)\n",
 					host->h_name, atomic_read(&host->h_count),
 					host->h_inuse, host->h_expires);
@@ -560,6 +560,7 @@ nlm_gc_hosts(void)
 			dprintk("lockd: delete host %s\n", host->h_name);
 			hlist_del_init(&host->h_hash);
 
+			nlmsvc_free_host_resources(host);
 			nlm_destroy_host(host);
 			nrhosts--;
 		}
---
thanks,
Mi Jinlong



* Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.
From: Trond Myklebust @ 2009-11-09 13:16 UTC
  To: Mi Jinlong; +Cc: NFSv3 list, J. Bruce Fields

On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
> Hi Trond et al.,
>
> There is a bug, which I hit when testing NFSv3 file locking as follows:
>
> Step1: ClientA and ClientB open the same NFS file;
> Step2: ClientA takes a write lock on the file, which succeeds;
> Step3: Cut off the network between ClientA and the server;
> Step4: ClientB can never acquire the write lock, even when the network
>        partition lasts longer than NLM_HOST_EXPIRE.
>
> As far as I know, with NFSv4, Step4 succeeds after LEASE_TIME.
>
> Is it necessary to fix NFSv3?
>
> The attached patch makes this case work, but I am not sure it is a good fix.

Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
based, so the above scenario is truly an unfixable one.

The problem with applying your patch is, in essence, that we risk
breaking another scenario where a client grabs a lock, and then holds it
for a while.
The reason this breaks is that there is no equivalent in the NLM
protocol of the NFSv4 RENEW operation to tell the server that "This
client is still alive and wants you to keep its state".
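
As a rough illustration of that point, here is a user-space model of the
host garbage-collection test before and after the proposed patch. It is a
sketch only: struct host_model, the function names and the tick values are
hypothetical stand-ins, and a plain `<` replaces the kernel's
time_before(jiffies, ...) comparison, so jiffies wraparound is ignored.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct nlm_host fields the patch tests. */
struct host_model {
	int  count;	/* models atomic_read(&host->h_count) */
	bool inuse;	/* models host->h_inuse: host still holds locks */
	long expires;	/* models host->h_expires, in arbitrary ticks */
};

/* Before the patch: skip (keep) a host that is referenced, in use, or unexpired. */
static bool skip_before_patch(const struct host_model *h, long now)
{
	return h->count || h->inuse || now < h->expires;
}

/* After the patch: keep a host only while it is unexpired AND (referenced or in use). */
static bool skip_after_patch(const struct host_model *h, long now)
{
	return now < h->expires && (h->count || h->inuse);
}

/*
 * What a lease-based protocol adds: a renewal that pushes the expiry
 * forward. NLM has no message an idle lock holder would send to trigger it.
 */
static void renew(struct host_model *h, long now, long lease)
{
	h->expires = now + lease;
}
```

With inuse set and expires = 100, skip_before_patch() at tick 150 keeps the
host, while skip_after_patch() lets it be collected even though the client
may simply be holding its lock without talking to the server. Only
something like renew() would keep the patched test from dropping it.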

Cheers,
  Trond




* Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.
From: Mi Jinlong @ 2009-11-10  9:38 UTC
  To: Trond Myklebust; +Cc: NFSv3 list, J. Bruce Fields

Hi Trond

Trond Myklebust wrote:
> On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
>> Hi Trond et al.,
>>
>> There is a bug, which I hit when testing NFSv3 file locking as follows:
>>
>> Step1: ClientA and ClientB open the same NFS file;
>> Step2: ClientA takes a write lock on the file, which succeeds;
>> Step3: Cut off the network between ClientA and the server;
>> Step4: ClientB can never acquire the write lock, even when the network
>>        partition lasts longer than NLM_HOST_EXPIRE.
>>
>> As far as I know, with NFSv4, Step4 succeeds after LEASE_TIME.
>>
>> Is it necessary to fix NFSv3?
>>
>> The attached patch makes this case work, but I am not sure it is a good fix.
>
> Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
> based, so the above scenario is truly an unfixable one.
>
> The problem with applying your patch is, in essence, that we risk
> breaking another scenario where a client grabs a lock, and then holds it
> for a while.
> The reason this breaks is that there is no equivalent in the NLM
> protocol of the NFSv4 RENEW operation to tell the server that "This
> client is still alive and wants you to keep its state".

Thanks for your answer!

This bug seems serious, shouldn't we fix it?

thanks,
Mi Jinlong



* Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.
From: Trond Myklebust @ 2009-11-10 12:35 UTC
  To: Mi Jinlong; +Cc: NFSv3 list, J. Bruce Fields

On Tue, 2009-11-10 at 17:38 +0800, Mi Jinlong wrote:
> Hi Trond
>
> Trond Myklebust wrote:
> > On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
> >> Hi Trond et al.,
> >>
> >> There is a bug, which I hit when testing NFSv3 file locking as follows:
> >>
> >> Step1: ClientA and ClientB open the same NFS file;
> >> Step2: ClientA takes a write lock on the file, which succeeds;
> >> Step3: Cut off the network between ClientA and the server;
> >> Step4: ClientB can never acquire the write lock, even when the network
> >>        partition lasts longer than NLM_HOST_EXPIRE.
> >>
> >> As far as I know, with NFSv4, Step4 succeeds after LEASE_TIME.
> >>
> >> Is it necessary to fix NFSv3?
> >>
> >> The attached patch makes this case work, but I am not sure it is a good fix.
> >
> > Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
> > based, so the above scenario is truly an unfixable one.
> >
> > The problem with applying your patch is, in essence, that we risk
> > breaking another scenario where a client grabs a lock, and then holds it
> > for a while.
> > The reason this breaks is that there is no equivalent in the NLM
> > protocol of the NFSv4 RENEW operation to tell the server that "This
> > client is still alive and wants you to keep its state".
>
> Thanks for your answer!
>
> This bug seems serious, shouldn't we fix it?

Unless you can think of a fix which works with the current NLM protocol,
I'd suggest simply encouraging people to move to a protocol with lease
based locks: i.e. NFSv4...

Cheers
  Trond



* Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.
From: Mi Jinlong @ 2009-11-11  9:34 UTC
  To: Trond Myklebust; +Cc: NFSv3 list, J. Bruce Fields

Hi Trond

Trond Myklebust wrote:
> On Tue, 2009-11-10 at 17:38 +0800, Mi Jinlong wrote:
>> Hi Trond
>>
>> Trond Myklebust wrote:
>>> On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
>>>> Hi Trond et al.,
>>>>
>>>> There is a bug, which I hit when testing NFSv3 file locking as follows:
>>>>
>>>> Step1: ClientA and ClientB open the same NFS file;
>>>> Step2: ClientA takes a write lock on the file, which succeeds;
>>>> Step3: Cut off the network between ClientA and the server;
>>>> Step4: ClientB can never acquire the write lock, even when the network
>>>>        partition lasts longer than NLM_HOST_EXPIRE.
>>>>
>>>> As far as I know, with NFSv4, Step4 succeeds after LEASE_TIME.
>>>>
>>>> Is it necessary to fix NFSv3?
>>>>
>>>> The attached patch makes this case work, but I am not sure it is a good fix.
>>> Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
>>> based, so the above scenario is truly an unfixable one.
>>>
>>> The problem with applying your patch is, in essence, that we risk
>>> breaking another scenario where a client grabs a lock, and then holds it
>>> for a while.
>>> The reason this breaks is that there is no equivalent in the NLM
>>> protocol of the NFSv4 RENEW operation to tell the server that "This
>>> client is still alive and wants you to keep its state".
>> Thanks for your answer!
>>
>> This bug seems serious, shouldn't we fix it?
>
> Unless you can think of a fix which works with the current NLM protocol,
> I'd suggest simply encouraging people to move to a protocol with lease
> based locks: i.e. NFSv4...

Can we add a process (like NFSv4's nfsd4) that calls nlm_gc_hosts() periodically?
nlm_gc_hosts() would then call rpc_ping() to check whether the network is OK;
if not, the host's resources would be released.

thanks,
Mi Jinlong



* Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.
From: Peter Staubach @ 2009-11-11 14:02 UTC
  To: Mi Jinlong; +Cc: Trond Myklebust, NFSv3 list, J. Bruce Fields

On 11/11/2009 04:34 AM, Mi Jinlong wrote:
> Hi Trond
>
> Trond Myklebust wrote:
>> On Tue, 2009-11-10 at 17:38 +0800, Mi Jinlong wrote:
>>> Hi Trond
>>>
>>> Trond Myklebust wrote:
>>>> On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
>>>>> Hi Trond et al.,
>>>>>
>>>>> There is a bug, which I hit when testing NFSv3 file locking as follows:
>>>>>
>>>>> Step1: ClientA and ClientB open the same NFS file;
>>>>> Step2: ClientA takes a write lock on the file, which succeeds;
>>>>> Step3: Cut off the network between ClientA and the server;
>>>>> Step4: ClientB can never acquire the write lock, even when the network
>>>>>        partition lasts longer than NLM_HOST_EXPIRE.
>>>>>
>>>>> As far as I know, with NFSv4, Step4 succeeds after LEASE_TIME.
>>>>>
>>>>> Is it necessary to fix NFSv3?
>>>>>
>>>>> The attached patch makes this case work, but I am not sure it is a good fix.
>>>> Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
>>>> based, so the above scenario is truly an unfixable one.
>>>>
>>>> The problem with applying your patch is, in essence, that we risk
>>>> breaking another scenario where a client grabs a lock, and then holds it
>>>> for a while.
>>>> The reason this breaks is that there is no equivalent in the NLM
>>>> protocol of the NFSv4 RENEW operation to tell the server that "This
>>>> client is still alive and wants you to keep its state".
>>> Thanks for your answer!
>>>
>>> This bug seems serious, shouldn't we fix it?
>>
>> Unless you can think of a fix which works with the current NLM protocol,
>> I'd suggest simply encouraging people to move to a protocol with lease
>> based locks: i.e. NFSv4...
>
> Can we add a process (like NFSv4's nfsd4) that calls nlm_gc_hosts() periodically?
> nlm_gc_hosts() would then call rpc_ping() to check whether the network is OK;
> if not, the host's resources would be released.
>

This would also violate the semantics that the current NLM has.
If, while holding the lock, the client does not need to contact
the server, it may not even notice the network partition and
will continue to expect that it holds the lock.
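
The hazard can be shown with a toy model of the three views of the lock.
This is a sketch under stated assumptions, not lockd code: struct lock_model
and the function names are hypothetical, and the "ping failure" is reduced
to a single call on the server side.

```c
#include <stdbool.h>

/* Hypothetical model: who believes the exclusive lock is held. */
struct lock_model {
	bool server_records_a;	/* server-side record of ClientA's lock */
	bool a_believes_held;	/* ClientA's own view; unchanged by the partition */
	bool b_granted;		/* ClientB has been granted the lock */
};

/* Server gives up on ClientA after a failed ping; ClientA hears nothing. */
static void server_drop_unreachable_a(struct lock_model *m)
{
	m->server_records_a = false;
}

/* ClientB asks for the lock; granted only if the server has no record of A. */
static bool b_request_lock(struct lock_model *m)
{
	if (m->server_records_a)
		return false;
	m->b_granted = true;
	return true;
}

/* True when two clients both believe they hold the exclusive lock. */
static bool two_writers(const struct lock_model *m)
{
	return m->a_believes_held && m->b_granted;
}
```

Starting from ClientA holding the lock, b_request_lock() is refused; after
server_drop_unreachable_a() it succeeds, and two_writers() becomes true
because nothing ever told ClientA that its lock was gone.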

It might have been interesting to fix this problem about 20
years ago.  However, nowadays, we just live with it.  If it is
a real problem, then using NFSv4 can be a good solution.

		ps

