All of lore.kernel.org
 help / color / mirror / Atom feed
* Migration to self
@ 2016-08-30 11:11 Benjamin Coddington
  2016-08-30 11:13 ` [PATCH] NFS4: Avoid migration loops Benjamin Coddington
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Coddington @ 2016-08-30 11:11 UTC (permalink / raw)
  To: List Linux NFS Mailing

Hi all,

Our QE has found that they can crash a client by having a server migrate 
the
client to itself.  Who would do that in real life?  Is a server allowed 
to
refer to itself?

The client crashes because it starts a migration, sets the current
nfs_client's slot table to drain, probes the new nfs_client and gets 
another
NFS4ERR_MOVED, schedules another migration, and then sleeps on the 
client
waitq waiting for the state manager in error recovery.

Then the state manager starts another migration, sets this new 
nfs_client's slot
table to drain, and the probe_fsinfo() sleeps on the slot table.

Finally someone notices the client is hung, and rpc_killall_tasks kills 
off
everything on the slot table and frees the server, but then the 
probe_fsinfo
that was waiting on the state manager in error recovery wakes up and 
tries
to read bits from that freed nfs_server:

Unable to handle kernel paging request for data at address 
0x656465736b746fa0
Faulting instruction address: 0xd0000000046006a8
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache 
virtio_balloon nfsd auth_rpcgss nf      s_acl lockd grace sunrpc 
ip_tables xfs libcrc32c virtio_net virtio_console virtio_blk virtio_pci 
virtio_ring virtio       dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 12072 Comm: ::1-manager Not tainted 3.10.0-327.el7.ppc64 #1
task: c00000023678b010 ti: c00000002c3b8000 task.ti: c00000002c3b8000
NIP: d0000000046006a8 LR: d0000000046009bc CTR: c00000000048eba0
REGS: c00000002c3bb240 TRAP: 0300   Not tainted  (3.10.0-327.el7.ppc64)
MSR: 8000000100009032 <SF,EE,ME,IR,DR,RI>  CR: 28002024  XER: 20000000
CFAR: c000000000009368 DAR: 656465736b746fa0 DSISR: 40000000 SOFTE: 1
GPR00: d0000000046009bc c00000002c3bb4c0 d00000000468a9f0 
c00000002c3bb530
GPR04: c00000023380e800 c00000002c3bb640 c00000002c3bb660 
c00000002c3bb600
GPR08: c00000002c3bb660 656465736b746f70 c00000002c3bb600 
d00000000467cf10
GPR12: c00000000048eba0 c000000007b83600 c00000000010c220 
c000000233c8dc00
GPR16: c000000234800034 0000000000000010 c000000234a820f0 
0000000000000801
GPR20: c000000234810000 c000000233900000 c00000023380e800 
c00000000130fe00
GPR24: c000000003eb6600 0000000000000010 c000000237e645c0 
c00000002a9aa700
GPR28: c000000235eb0008 c000000233c89c00 c000000235eb0008 
c00000023380e800
NIP [d0000000046006a8] .nfs4_call_sync_sequence+0x48/0xa0 [nfsv4]
LR [d0000000046009bc] ._nfs4_server_capabilities+0x7c/0x2a0 [nfsv4]
Call Trace:
[c00000002c3bb4c0] [c000000000923240] .out_of_line_wait_on_bit+0xd0/0xe0 
(unreliable)
[c00000002c3bb590] [d0000000046009bc] 
._nfs4_server_capabilities+0x7c/0x2a0 [nfsv4]
[c00000002c3bb690] [d0000000046103fc] 
.nfs4_server_capabilities+0x3c/0x70 [nfsv4]
[c00000002c3bb730] [d0000000043e1f54] .nfs_probe_fsinfo+0x74/0x730 [nfs]
[c00000002c3bb830] [d00000000463b814] .nfs4_update_server+0x234/0x330 
[nfsv4]
[c00000002c3bba00] [d000000004638e00] 
.nfs4_replace_transport+0x200/0x370 [nfsv4]
[c00000002c3bbaf0] [d00000000462aab4] .nfs4_try_migration+0x244/0x360 
[nfsv4]
[c00000002c3bbb90] [d00000000462d328] .nfs4_state_manager+0x6b8/0xc00 
[nfsv4]
[c00000002c3bbcb0] [d00000000462d898] .nfs4_run_state_manager+0x28/0x50 
[nfsv4]
[c00000002c3bbd30] [c00000000010c308] .kthread+0xe8/0xf0
[c00000002c3bbe30] [c00000000000a470] .ret_from_kernel_thread+0x58/0x68

I would like to catch these sort of migration loops before we start
following them, and an easy place to do that is to check if the
nfs_server->nfs_client is already set and is the same as the found 
client in
nfs4_set_client().  Other than nfs4_try_migration(), the only other
callpaths for nfs4_set_client() are nfs4_init_server() and
nfs4_create_referral_server(), both of which have newly initialized 
servers.

Alternatively, a fix would need to have rpc_killall_tasks unwind tasks
waiting on the state manager in a series of migrations.

Any thoughts?  I'll follow up with a patch for nfs4_set_client() to 
return
an error rather than set the same client on the server.

Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] NFS4: Avoid migration loops
  2016-08-30 11:11 Migration to self Benjamin Coddington
@ 2016-08-30 11:13 ` Benjamin Coddington
  2016-08-30 12:53   ` Trond Myklebust
  2016-08-30 13:20   ` [PATCH v2] " Benjamin Coddington
  0 siblings, 2 replies; 6+ messages in thread
From: Benjamin Coddington @ 2016-08-30 11:13 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust, Anna Schumaker

If a server returns itself as a location while migrating the client may
end up getting stuck attempting to migrate twice to the same server.  Catch
this by checking if the nfs_client found is the same as the existing
client.  For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL);

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/nfs4client.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 8d7d08d4f95f..ec8afc43d849 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -817,6 +817,12 @@ static int nfs4_set_client(struct nfs_server *server,
 		goto error;
 	}
 
+	/* This client is already set, is there a migration loop? */
+	if (server->nfs_client == clp) {
+		error = -EEXIST;
+		goto error;
+	}
+
 	/*
 	 * Query for the lease time on clientid setup or renewal
 	 *
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] NFS4: Avoid migration loops
  2016-08-30 11:13 ` [PATCH] NFS4: Avoid migration loops Benjamin Coddington
@ 2016-08-30 12:53   ` Trond Myklebust
  2016-08-30 13:06     ` Benjamin Coddington
  2016-08-30 13:20   ` [PATCH v2] " Benjamin Coddington
  1 sibling, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2016-08-30 12:53 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: List Linux NFS Mailing, Schumaker Anna


> On Aug 30, 2016, at 07:13, Benjamin Coddington <bcodding@redhat.com> wrot=
e:
>=20
> If a server returns itself as a location while migrating the client may
> end up getting stuck attempting to migrate twice to the same server.  Cat=
ch
> this by checking if the nfs_client found is the same as the existing
> client.  For the other two callers to nfs4_set_client, the nfs_client wil=
l
> always be ERR_PTR(-EINVAL);
>=20
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
> fs/nfs/nfs4client.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>=20
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 8d7d08d4f95f..ec8afc43d849 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -817,6 +817,12 @@ static int nfs4_set_client(struct nfs_server *server=
,
> =09=09goto error;
> =09}
>=20
> +=09/* This client is already set, is there a migration loop? */
> +=09if (server->nfs_client =3D=3D clp) {
> +=09=09error =3D -EEXIST;
> +=09=09goto error;
> +=09}
> +

Good catch, but why the choice of EEXIST? It sounds as if there is a good a=
rgument for ELOOP, since this situation would literally mirror the self-ref=
erencing symlink case.

Cheers
   Trond

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] NFS4: Avoid migration loops
  2016-08-30 12:53   ` Trond Myklebust
@ 2016-08-30 13:06     ` Benjamin Coddington
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Coddington @ 2016-08-30 13:06 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: List Linux NFS Mailing, Schumaker Anna

On 30 Aug 2016, at 8:53, Trond Myklebust wrote:

>> On Aug 30, 2016, at 07:13, Benjamin Coddington <bcodding@redhat.com> 
>> wrote:
>>
>> If a server returns itself as a location while migrating the client 
>> may
>> end up getting stuck attempting to migrate twice to the same server.  
>> Catch
>> this by checking if the nfs_client found is the same as the existing
>> client.  For the other two callers to nfs4_set_client, the nfs_client 
>> will
>> always be ERR_PTR(-EINVAL);
>>
>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>> ---
>> fs/nfs/nfs4client.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
>> index 8d7d08d4f95f..ec8afc43d849 100644
>> --- a/fs/nfs/nfs4client.c
>> +++ b/fs/nfs/nfs4client.c
>> @@ -817,6 +817,12 @@ static int nfs4_set_client(struct nfs_server 
>> *server,
>> 		goto error;
>> 	}
>>
>> +	/* This client is already set, is there a migration loop? */
>> +	if (server->nfs_client == clp) {
>> +		error = -EEXIST;
>> +		goto error;
>> +	}
>> +
>
> Good catch, but why the choice of EEXIST? It sounds as if there is a 
> good
> argument for ELOOP, since this situation would literally mirror the
> self-referencing symlink case.

I thought EEXIST more indicative of what nfs4_set_client() was noticing 
in
the local context, but I'll send you a v2 with ELOOP.  That will also be
more informative when the state manager logs the migration failure.

Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] NFS4: Avoid migration loops
  2016-08-30 11:13 ` [PATCH] NFS4: Avoid migration loops Benjamin Coddington
  2016-08-30 12:53   ` Trond Myklebust
@ 2016-08-30 13:20   ` Benjamin Coddington
  2016-08-30 16:39     ` Chuck Lever
  1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Coddington @ 2016-08-30 13:20 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust, Anna Schumaker

Change from v1:
	Use ELOOP rather than EEXIST, and drop the now-redundant comment.

8<-------------------------------------------------------------------------

If a server returns itself as a location while migrating, the client may
end up getting stuck attempting to migrate twice to the same server.  Catch
this by checking if the nfs_client found is the same as the existing
client.  For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL).

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/nfs4client.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 8d7d08d4f95f..cd3b7cfdde16 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -817,6 +817,11 @@ static int nfs4_set_client(struct nfs_server *server,
 		goto error;
 	}
 
+	if (server->nfs_client == clp) {
+		error = -ELOOP;
+		goto error;
+	}
+
 	/*
 	 * Query for the lease time on clientid setup or renewal
 	 *
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] NFS4: Avoid migration loops
  2016-08-30 13:20   ` [PATCH v2] " Benjamin Coddington
@ 2016-08-30 16:39     ` Chuck Lever
  0 siblings, 0 replies; 6+ messages in thread
From: Chuck Lever @ 2016-08-30 16:39 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Linux NFS Mailing List, Trond Myklebust, Anna Schumaker


> On Aug 30, 2016, at 9:20 AM, Benjamin Coddington <bcodding@redhat.com> =
wrote:
>=20
> Change from v1:
> 	Use ELOOP rather than EEXIST, and drop the now-redundant =
comment.
>=20
> =
8<------------------------------------------------------------------------=
-
>=20
> If a server returns itself as a location while migrating, the client =
may
> end up getting stuck attempting to migrate twice to the same server.  =
Catch
> this by checking if the nfs_client found is the same as the existing
> client.  For the other two callers to nfs4_set_client, the nfs_client =
will
> always be ERR_PTR(-EINVAL).
>=20
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>

I don't have any objection.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>


> ---
> fs/nfs/nfs4client.c | 5 +++++
> 1 file changed, 5 insertions(+)
>=20
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 8d7d08d4f95f..cd3b7cfdde16 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -817,6 +817,11 @@ static int nfs4_set_client(struct nfs_server =
*server,
> 		goto error;
> 	}
>=20
> +	if (server->nfs_client =3D=3D clp) {
> +		error =3D -ELOOP;
> +		goto error;
> +	}
> +
> 	/*
> 	 * Query for the lease time on clientid setup or renewal
> 	 *
> --=20
> 2.5.5
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-08-30 16:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-30 11:11 Migration to self Benjamin Coddington
2016-08-30 11:13 ` [PATCH] NFS4: Avoid migration loops Benjamin Coddington
2016-08-30 12:53   ` Trond Myklebust
2016-08-30 13:06     ` Benjamin Coddington
2016-08-30 13:20   ` [PATCH v2] " Benjamin Coddington
2016-08-30 16:39     ` Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.