* Random IO errors on nfs clients running linux > 4.20
@ 2020-04-29 17:15 Alberto Gonzalez Iniesta
  2020-04-30  6:08 ` Alberto Gonzalez Iniesta
  2020-04-30 17:32 ` J. Bruce Fields
  0 siblings, 2 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-04-29 17:15 UTC (permalink / raw)
  To: linux-nfs; +Cc: Miguel Rodriguez, Isaac Marco Blancas

Hello NFS maintainers,

I'm sorry for reporting this (a little bit) late, but it took us (Miguel
in Cc:) some time to track this issue to an exact kernel update.

We're running an NFS setup with 200+ clients on Ubuntu 16.04 and 18.04.
The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working for some years
now without issues.

But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04
some of them started experiencing failures while working on NFS mounts.
The failures are arbitrary, and sometimes one takes more than 20 minutes
to show up (which made tracking down which kernel version introduced
this a pain). We are almost sure that some directories are more prone
to this than others (maybe related to path length/characters?).

The error is also not very "verbose", from an strace:

execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
[lots of uninteresting output]
openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, TCGETS, 0x7ffd8b725c80)        = -1 ENOTTY (Inappropriate ioctl for device)
getdents(3, /* 35 entries */, 32768)    = 1936
[lots of lstats]
lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
getdents(3, 0x561af78de890, 32768)      = -1 EIO (Input/output error)

(I can send you the full output if you need it)

We can run the previous "ls -lR" 20 times and get no error, or get
"ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
("ls: reading directory 'Becas y ayudas/': Input/output error") every
now and then.

The error happens (obviously?) with ls, rsync and the users' GUI tools.
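
For what it's worth, the brute-force loop we mean by "running it 20
times" can be sketched like this (the function name and the retry budget
are purely illustrative, not from any tool):

```shell
# Illustrative reproducer sketch: run "ls -lR" on the suspect directory
# until it fails or a retry budget is exhausted.
try_repro() {
    dir="$1"
    max="${2:-20}"
    errlog=$(mktemp)
    i=0
    while [ "$i" -lt "$max" ]; do
        if ! ls -lR "$dir" >/dev/null 2>"$errlog"; then
            echo "failure on run $i:"
            cat "$errlog"
            rm -f "$errlog"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$errlog"
    echo "no error in $max runs of ls -lR on $dir"
}

# e.g.: try_repro "Becas y ayudas/" 20
```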

There's nothing in dmesg (or anywhere else).
These are the kernels we tried:
4.18.0-25   -> Can't reproduce
4.19.0      -> Can't reproduce
4.20.17     -> Happening (hard to reproduce)
5.0.0-15    -> Happening (hard to reproduce)
5.3.0-45    -> Happening (more frequently)
5.6.0-rc7   -> Reproduced a couple of times after boot, then nothing

We did long (as in day-long) testing trying to reproduce this with all
those kernel versions, so we are pretty sure 4.18 and 4.19 don't
experience this, and that our Ubuntu 16.04 clients don't have any issues.

I know we aren't providing much info, but we are really looking forward
to doing all the testing required (we have already spent a lot of time on it).

Thanks for your work.

Regards,

Alberto

-- 
Alberto González Iniesta             | Universidad a Distancia
alberto.gonzalez@udima.es            | de Madrid

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Random IO errors on nfs clients running linux > 4.20
  2020-04-29 17:15 Random IO errors on nfs clients running linux > 4.20 Alberto Gonzalez Iniesta
@ 2020-04-30  6:08 ` Alberto Gonzalez Iniesta
  2020-04-30 17:32 ` J. Bruce Fields
  1 sibling, 0 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-04-30  6:08 UTC (permalink / raw)
  To: linux-nfs

On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> Hello NFS maintainers,
>
> I know we aren't providing much info but we are really looking forward
> to doing all the testing required (we already spent lots of time in it).

Hi,

Sorry, I was providing way too little info...
We're using NFSv4 with kerberos, mounts are done with:
mount -t nfs4 -o sec=krb5p,exec,noauto pluto.XXXX:/publico /media/pluto

Regards,

Alberto

-- 
Alberto Gonzalez Iniesta    | Training, consulting and technical support
mailto/sip: agi@inittab.org | in GNU/Linux and free software
Encrypted mail preferred    | http://inittab.com

Key fingerprint = 5347 CBD8 3E30 A9EB 4D7D  4BF2 009B 3375 6B9A AA55


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-04-29 17:15 Random IO errors on nfs clients running linux > 4.20 Alberto Gonzalez Iniesta
  2020-04-30  6:08 ` Alberto Gonzalez Iniesta
@ 2020-04-30 17:32 ` J. Bruce Fields
  2020-09-09  9:29   ` Alberto Gonzalez Iniesta
  1 sibling, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2020-04-30 17:32 UTC (permalink / raw)
  To: Alberto Gonzalez Iniesta; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> I'm sorry for reporting this (a little bit) late, but it took us (Miguel
> in Cc:) some time to track this issue to an exact kernel update.
> 
> We're running an NFS setup with 200+ clients on Ubuntu 16.04 and 18.04.
> The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
> nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working for some years
> now without issues.
> 
> But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04
> some of them started experiencing failures while working on NFS mounts.
> The failures are arbitrary, and sometimes one takes more than 20 minutes
> to show up (which made tracking down which kernel version introduced
> this a pain). We are almost sure that some directories are more prone
> to this than others (maybe related to path length/characters?).
> 
> The error is also not very "verbose", from an strace:
> 
> execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
> [lots of uninteresting output]
> openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
> fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> ioctl(1, TCGETS, 0x7ffd8b725c80)        = -1 ENOTTY (Inappropriate ioctl for device)
> getdents(3, /* 35 entries */, 32768)    = 1936
> [lots of lstats]
> lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> getdents(3, 0x561af78de890, 32768)      = -1 EIO (Input/output error)

Ideas off the top of my head....

It'd be really useful to get a network trace--something like tcpdump -s0
-w tmp.pcap -i <interface>, then reproduce the problem, then look through
it to see if you can find the READDIR or STAT or whatever it is that
results in the unexpected EIO.  But if it takes a while to reproduce,
that may be difficult.
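
Concretely, the capture could look something like the sketch below
(interface and file names are placeholders; the tshark filter assumes
NFSv4, where OP_READDIR is opcode 26):

```shell
# Capture sketch: -s0 captures full packets, "port 2049" limits the
# trace to NFS traffic.  Stop the capture once the EIO reproduces.
capture_nfs() {
    iface="${1:-eth0}"
    tcpdump -s0 -w /tmp/nfs-eio.pcap -i "$iface" port 2049
}

# Afterwards, with wireshark/tshark installed, list the READDIR
# operations.  Note that with sec=krb5p the RPC payload is encrypted,
# so per-op decoding may not work without the Kerberos keys.
show_readdirs() {
    tshark -r "${1:-/tmp/nfs-eio.pcap}" -Y 'nfs.opcode == 26'
}
```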

Is there anything in the logs?

It might be worth turning on some more debug logging--see the "rpcdebug"
command.
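
For reference, a sketch of the rpcdebug invocations (flag names per
rpcdebug(8); needs root on the client, and the output lands in the
kernel log / syslog):

```shell
# Enable all NFS-client and SUNRPC debug flags on the client.
enable_nfs_debug() {
    rpcdebug -m nfs -s all
    rpcdebug -m rpc -s all
}

# Clear them again once the problem has been captured.
disable_nfs_debug() {
    rpcdebug -m nfs -c all
    rpcdebug -m rpc -c all
}
```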

--b.

> 
> (I can send you the full output if you need it)
> 
> We can run the previous "ls -lR" 20 times and get no error, or get
> "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> ("ls: reading directory 'Becas y ayudas/': Input/output error") every
> now and then.
> 
> The error happens (obviously?) with ls, rsync and the users' GUI tools.
> 
> There's nothing in dmesg (or anywhere else).
> These are the kernels we tried:
> 4.18.0-25   -> Can't reproduce
> 4.19.0      -> Can't reproduce
> 4.20.17     -> Happening (hard to reproduce)
> 5.0.0-15    -> Happening (hard to reproduce)
> 5.3.0-45    -> Happening (more frequently)
> 5.6.0-rc7   -> Reproduced a couple of times after boot, then nothing
> 
> We did long (as in day-long) testing trying to reproduce this with all
> those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> experience this, and that our Ubuntu 16.04 clients don't have any issues.
> 
> I know we aren't providing much info, but we are really looking forward
> to doing all the testing required (we have already spent a lot of time on it).
> 
> Thanks for your work.
> 
> Regards,
> 
> Alberto
> 
> -- 
> Alberto González Iniesta             | Universidad a Distancia
> alberto.gonzalez@udima.es            | de Madrid


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-04-30 17:32 ` J. Bruce Fields
@ 2020-09-09  9:29   ` Alberto Gonzalez Iniesta
  2020-09-09 13:47     ` J. Bruce Fields
  0 siblings, 1 reply; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-09-09  9:29 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

[-- Attachment #1: Type: text/plain, Size: 4154 bytes --]

On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > I'm sorry for reporting this (a little bit) late, but it took us (Miguel
> > in Cc:) some time to track this issue to an exact kernel update.
> > 
> > We're running an NFS setup with 200+ clients on Ubuntu 16.04 and 18.04.
> > The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
> > nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working for some years
> > now without issues.
> > 
> > But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04
> > some of them started experiencing failures while working on NFS mounts.
> > The failures are arbitrary, and sometimes one takes more than 20 minutes
> > to show up (which made tracking down which kernel version introduced
> > this a pain). We are almost sure that some directories are more prone
> > to this than others (maybe related to path length/characters?).
> > 
> > The error is also not very "verbose", from an strace:
> > 
> > execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
> > [lots of uninteresting output]
> > openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
> > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> > ioctl(1, TCGETS, 0x7ffd8b725c80)        = -1 ENOTTY (Inappropriate ioctl for device)
> > getdents(3, /* 35 entries */, 32768)    = 1936
> > [lots of lstats]
> > lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > getdents(3, 0x561af78de890, 32768)      = -1 EIO (Input/output error)
> 
> Ideas off the top of my head....
> 
> It'd be really useful to get a network trace--something like tcpdump -s0
> -w tmp.pcap -i <interface>, then reproduce the problem, then look through
> it to see if you can find the READDIR or STAT or whatever it is that
> results in the unexpected EIO.  But if it takes a while to reproduce,
> that may be difficult.
> 
> Is there anything in the logs?
> 
> It might be worth turning on some more debug logging--see the "rpcdebug"
> command.

Hi, Bruce et al.

I'm sorry this reply took so long, but with debugging enabled the error
was harder to reproduce.

I'm attaching 3 log files (two with just "nfs" debugging and one with
both the "nfs" and "rpc" debug modules enabled).

I'm also attaching a pcap file; I don't know whether it will be useful,
since we run with "sec=krb5p".

Let me know if there's anything else I can test/provide.
These tests were done with Ubuntu's 5.3.0-53-generic.
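
In case it helps navigating the attachments: the failing readdir shows
up in the debug output as a -5 (-EIO) return, so something like the
sketch below pulls out the relevant lines (the patterns come from the
attached logs; the function name is just for illustration):

```shell
# Pull the readdir/EIO lines out of a client debug log (-5 is -EIO).
# The patterns match the nfs/rpc debug messages in the attached logs.
find_readdir_eio() {
    grep -E '_nfs4_proc_readdir: returns -5|readdir\(.*\) returns -5|call_decode result -5' "$1"
}

# e.g.: find_readdir_eio /var/log/syslog
```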

Thanks,

Alberto

> > (I can send you the full output if you need it)
> > 
> > We can run the previous "ls -lR" 20 times and get no error, or get
> > "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > ("ls: reading directory 'Becas y ayudas/': Input/output error") every
> > now and then.
> > 
> > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> > 
> > There's nothing in dmesg (or anywhere else).
> > These are the kernels we tried:
> > 4.18.0-25   -> Can't reproduce
> > 4.19.0      -> Can't reproduce
> > 4.20.17     -> Happening (hard to reproduce)
> > 5.0.0-15    -> Happening (hard to reproduce)
> > 5.3.0-45    -> Happening (more frequently)
> > 5.6.0-rc7   -> Reproduced a couple of times after boot, then nothing
> > 
> > We did long (as in day-long) testing trying to reproduce this with all
> > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > experience this, and that our Ubuntu 16.04 clients don't have any issues.
> > 
> > I know we aren't providing much info, but we are really looking forward
> > to doing all the testing required (we have already spent a lot of time on it).
> > 
> > Thanks for your work.
> > 
> > Regards,
> > 
> > Alberto
> > 
> > -- 
> > Alberto González Iniesta             | Universidad a Distancia
> > alberto.gonzalez@udima.es            | de Madrid

-- 
Alberto González Iniesta             | Universidad a Distancia
alberto.gonzalez@udima.es            | de Madrid

[-- Attachment #2: err3.txt --]
[-- Type: text/plain, Size: 10137 bytes --]

Sep  8 16:03:23 portatil264 kernel: [15032.854134] RPC:  3283 call_decode result 0
Sep  8 16:03:23 portatil264 kernel: [15032.854140] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 16:03:23 portatil264 kernel: [15032.854142] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 16:03:23 portatil264 kernel: [15032.854145] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
Sep  8 16:03:23 portatil264 kernel: [15032.854147] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 16:03:23 portatil264 kernel: [15032.854148] nfs41_sequence_process: Error 0 free the slot 
Sep  8 16:03:23 portatil264 kernel: [15032.854150] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
Sep  8 16:03:23 portatil264 kernel: [15032.854152] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 16:03:23 portatil264 kernel: [15032.854154] RPC:  3283 return 0, status 0
Sep  8 16:03:23 portatil264 kernel: [15032.854155] RPC:  3283 release task
Sep  8 16:03:23 portatil264 kernel: [15032.854159] RPC:       freeing buffer of size 4584 at 00000000a3649daf
Sep  8 16:03:23 portatil264 kernel: [15032.854162] RPC:  3283 release request 0000000079df89b2
Sep  8 16:03:23 portatil264 kernel: [15032.854164] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
Sep  8 16:03:23 portatil264 kernel: [15032.854166] RPC:       rpc_release_client(00000000b930c343)
Sep  8 16:03:23 portatil264 kernel: [15032.854169] RPC:  3283 freeing task
Sep  8 16:03:23 portatil264 kernel: [15032.854179] NFS: nfs_update_inode(0:53/31982477 fh_crc=0x47b6a915 ct=1 info=0x427e7f)
Sep  8 16:03:23 portatil264 kernel: [15032.854183] NFS: (0:53/31982477) revalidation complete
Sep  8 16:03:23 portatil264 kernel: [15032.854186] NFS: dentry_delete(innovacion/Proyectos de investigación, 28080c)
Sep  8 16:03:23 portatil264 kernel: [15032.854240] NFS: readdir(departamentos/innovacion) starting at cookie 19
Sep  8 16:03:23 portatil264 kernel: [15032.854259] _nfs4_proc_readdir: dentry = departamentos/innovacion, cookie = 0
Sep  8 16:03:23 portatil264 kernel: [15032.854262] RPC:       new task initialized, procpid 26029
Sep  8 16:03:23 portatil264 kernel: [15032.854263] RPC:       allocated task 00000000493c7c0e
Sep  8 16:03:23 portatil264 kernel: [15032.854267] RPC:  3284 __rpc_execute flags=0x4080
Sep  8 16:03:23 portatil264 kernel: [15032.854269] --> nfs41_call_sync_prepare data->seq_server 00000000eb43854e
Sep  8 16:03:23 portatil264 kernel: [15032.854271] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 16:03:23 portatil264 kernel: [15032.854273] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 16:03:23 portatil264 kernel: [15032.854276] RPC:  3284 call_start nfs4 proc READDIR (sync)
Sep  8 16:03:23 portatil264 kernel: [15032.854278] RPC:  3284 call_reserve (status 0)
Sep  8 16:03:23 portatil264 kernel: [15032.854281] RPC:  3284 reserved req 0000000079df89b2 xid 5dfa3876
Sep  8 16:03:23 portatil264 kernel: [15032.854282] RPC:  3284 call_reserveresult (status 0)
Sep  8 16:03:23 portatil264 kernel: [15032.854284] RPC:  3284 call_refresh (status 0)
Sep  8 16:03:23 portatil264 kernel: [15032.854287] RPC:  3284 call_refreshresult (status 0)
Sep  8 16:03:23 portatil264 kernel: [15032.854289] RPC:  3284 call_allocate (status 0)
Sep  8 16:03:23 portatil264 kernel: [15032.854292] RPC:  3284 allocated buffer of size 4144 at 00000000a3649daf
Sep  8 16:03:23 portatil264 kernel: [15032.854297] RPC:  3284 call_encode (status 0)
Sep  8 16:03:23 portatil264 kernel: [15032.854301] RPC:       gss_get_mic_v2
Sep  8 16:03:23 portatil264 kernel: [15032.854319] encode_sequence: sessionid=1582481124:15749:9224:0 seqid=195 slotid=0 max_slotid=0 cache_this=0
Sep  8 16:03:23 portatil264 kernel: [15032.854323] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b0a23a:00000000
Sep  8 16:03:23 portatil264 kernel: [15032.854325] RPC:       gss_wrap_kerberos_v2
Sep  8 16:03:23 portatil264 kernel: [15032.854350] RPC:  3284 call_transmit (status 0)
Sep  8 16:03:23 portatil264 kernel: [15032.854352] RPC:  3284 xprt_prepare_transmit
Sep  8 16:03:23 portatil264 kernel: [15032.854389] RPC:       xs_tcp_send_request(272) = 0
Sep  8 16:03:23 portatil264 kernel: [15032.854392] RPC:       wake_up_first(000000001bb91292 "xprt_sending")
Sep  8 16:03:23 portatil264 kernel: [15032.854395] RPC:  3284 sleep_on(queue "xprt_pending" time 4298650492)
Sep  8 16:03:23 portatil264 kernel: [15032.854397] RPC:  3284 added to queue 00000000a9a9ecc7 "xprt_pending"
Sep  8 16:03:23 portatil264 kernel: [15032.854398] RPC:  3284 setting alarm for 60000 ms
Sep  8 16:03:23 portatil264 kernel: [15032.854401] RPC:  3284 sync task going to sleep
Sep  8 16:03:23 portatil264 kernel: [15032.935579] RPC:       xs_data_ready...
Sep  8 16:03:23 portatil264 kernel: [15032.939135] RPC:       xs_data_ready...
Sep  8 16:03:23 portatil264 kernel: [15032.941013] decode_attr_owner: uid=138184
Sep  8 16:03:23 portatil264 kernel: [15033.016025] RPC:       xs_data_ready...
Sep  8 16:03:23 portatil264 kernel: [15033.016117] RPC:       xs_data_ready...
Sep  8 16:03:23 portatil264 kernel: [15033.016124] RPC:  3284 xid 5dfa3876 complete (4064 bytes received)
Sep  8 16:03:23 portatil264 kernel: [15033.016129] RPC:  3284 __rpc_wake_up_task (now 4298650533)
Sep  8 16:03:23 portatil264 kernel: [15033.016131] RPC:  3284 disabling timer
Sep  8 16:03:23 portatil264 kernel: [15033.016136] RPC:  3284 removed from queue 00000000a9a9ecc7 "xprt_pending"
Sep  8 16:03:23 portatil264 kernel: [15033.016149] RPC:       __rpc_wake_up_task done
Sep  8 16:03:23 portatil264 kernel: [15033.016231] RPC:  3284 sync task resuming
Sep  8 16:03:23 portatil264 kernel: [15033.016234] RPC:  3284 call_status (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016235] RPC:  3284 call_decode (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016240] RPC:       gss_verify_mic_v2
Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
Sep  8 16:03:23 portatil264 kernel: [15033.016788] NFS: permission(0:53/28049412), mask=0x81, res=0
Sep  8 16:03:23 portatil264 kernel: [15033.016793] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep  8 16:03:23 portatil264 kernel: [15033.016797] NFS: permission(0:53/28051004), mask=0x81, res=0
Sep  8 16:03:23 portatil264 kernel: [15033.016801] NFS: revalidating (0:53/30801922)
Sep  8 16:03:23 portatil264 kernel: [15033.016808] RPC:       new task initialized, procpid 26029
Sep  8 16:03:23 portatil264 kernel: [15033.016810] RPC:       allocated task 000000005afe768d
Sep  8 16:03:23 portatil264 kernel: [15033.016814] RPC:  3285 __rpc_execute flags=0x4080
Sep  8 16:03:23 portatil264 kernel: [15033.016817] --> nfs41_call_sync_prepare data->seq_server 00000000eb43854e
Sep  8 16:03:23 portatil264 kernel: [15033.016820] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 16:03:23 portatil264 kernel: [15033.016823] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 16:03:23 portatil264 kernel: [15033.016826] RPC:  3285 call_start nfs4 proc GETATTR (sync)
Sep  8 16:03:23 portatil264 kernel: [15033.016829] RPC:  3285 call_reserve (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016832] RPC:  3285 reserved req 0000000079df89b2 xid 5efa3876
Sep  8 16:03:23 portatil264 kernel: [15033.016834] RPC:  3285 call_reserveresult (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016836] RPC:  3285 call_refresh (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016840] RPC:  3285 call_refreshresult (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016844] RPC:  3285 call_allocate (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016848] RPC:  3285 allocated buffer of size 4584 at 000000007624e941
Sep  8 16:03:23 portatil264 kernel: [15033.016850] RPC:  3285 call_encode (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016857] RPC:       gss_get_mic_v2
Sep  8 16:03:23 portatil264 kernel: [15033.016876] encode_sequence: sessionid=1582481124:15749:9224:0 seqid=196 slotid=0 max_slotid=0 cache_this=0
Sep  8 16:03:23 portatil264 kernel: [15033.016879] RPC:       gss_wrap_kerberos_v2
Sep  8 16:03:23 portatil264 kernel: [15033.016907] RPC:  3285 call_transmit (status 0)
Sep  8 16:03:23 portatil264 kernel: [15033.016908] RPC:  3285 xprt_prepare_transmit
Sep  8 16:03:23 portatil264 kernel: [15033.016941] RPC:       xs_tcp_send_request(248) = 0
Sep  8 16:03:23 portatil264 kernel: [15033.016944] RPC:       wake_up_first(000000001bb91292 "xprt_sending")
Sep  8 16:03:23 portatil264 kernel: [15033.016947] RPC:  3285 sleep_on(queue "xprt_pending" time 4298650533)
Sep  8 16:03:23 portatil264 kernel: [15033.016948] RPC:  3285 added to queue 00000000a9a9ecc7 "xprt_pending"
Sep  8 16:03:23 portatil264 kernel: [15033.016949] RPC:  3285 setting alarm for 60000 ms
Sep  8 16:03:23 portatil264 kernel: [15033.016952] RPC:  3285 sync task going to sleep

[-- Attachment #3: err2.txt --]
[-- Type: text/plain, Size: 8642 bytes --]

Sep  8 11:03:08 portatil264 kernel: [  294.429356] NFS: permission(0:53/28049412), mask=0x81, res=0
Sep  8 11:03:08 portatil264 kernel: [  294.429365] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep  8 11:03:08 portatil264 kernel: [  294.429369] NFS: permission(0:53/28051004), mask=0x81, res=0
Sep  8 11:03:08 portatil264 kernel: [  294.429372] NFS: nfs_lookup_revalidate_done(departamentos/innovacion) is valid
Sep  8 11:03:08 portatil264 kernel: [  294.429377] NFS: revalidating (0:53/30801924)
Sep  8 11:03:08 portatil264 kernel: [  294.429392] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:03:08 portatil264 kernel: [  294.429396] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:03:08 portatil264 kernel: [  294.429399] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:03:08 portatil264 kernel: [  294.429456] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=50 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:03:09 portatil264 kernel: [  294.480184] decode_attr_type: type=040000
Sep  8 11:03:09 portatil264 kernel: [  294.480187] decode_attr_change: change attribute=6810276040818143508
Sep  8 11:03:09 portatil264 kernel: [  294.480189] decode_attr_size: file size=4096
Sep  8 11:03:09 portatil264 kernel: [  294.480190] decode_attr_fsid: fsid=(0x0/0x0)
Sep  8 11:03:09 portatil264 kernel: [  294.480192] decode_attr_fileid: fileid=30801924
Sep  8 11:03:09 portatil264 kernel: [  294.480193] decode_attr_fs_locations: fs_locations done, error = 0
Sep  8 11:03:09 portatil264 kernel: [  294.480194] decode_attr_mode: file mode=02770
Sep  8 11:03:09 portatil264 kernel: [  294.480196] decode_attr_nlink: nlink=19
Sep  8 11:03:09 portatil264 kernel: [  294.480208] decode_attr_owner: uid=0
Sep  8 11:03:09 portatil264 kernel: [  294.480212] decode_attr_group: gid=709
Sep  8 11:03:09 portatil264 kernel: [  294.480213] decode_attr_rdev: rdev=(0x0:0x0)
Sep  8 11:03:09 portatil264 kernel: [  294.480215] decode_attr_space_used: space used=4096
Sep  8 11:03:09 portatil264 kernel: [  294.480216] decode_attr_time_access: atime=1599472206
Sep  8 11:03:09 portatil264 kernel: [  294.480217] decode_attr_time_metadata: ctime=1585640954
Sep  8 11:03:09 portatil264 kernel: [  294.480218] decode_attr_time_modify: mtime=1585640954
Sep  8 11:03:09 portatil264 kernel: [  294.480220] decode_attr_mounted_on_fileid: fileid=30801924
Sep  8 11:03:09 portatil264 kernel: [  294.480221] decode_getfattr_attrs: xdr returned 0
Sep  8 11:03:09 portatil264 kernel: [  294.480223] decode_getfattr_generic: xdr returned 0
Sep  8 11:03:09 portatil264 kernel: [  294.480228] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 11:03:09 portatil264 kernel: [  294.480230] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 11:03:09 portatil264 kernel: [  294.480232] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 11:03:09 portatil264 kernel: [  294.480233] nfs41_sequence_process: Error 0 free the slot 
Sep  8 11:03:09 portatil264 kernel: [  294.480234] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:03:09 portatil264 kernel: [  294.480246] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x427e7f)
Sep  8 11:03:09 portatil264 kernel: [  294.480249] NFS: (0:53/30801924) revalidation complete
Sep  8 11:03:09 portatil264 kernel: [  294.480252] NFS: dentry_delete(departamentos/innovacion, 28084c)
Sep  8 11:03:09 portatil264 kernel: [  294.480271] NFS: permission(0:53/28049412), mask=0x81, res=0
Sep  8 11:03:09 portatil264 kernel: [  294.480273] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep  8 11:03:09 portatil264 kernel: [  294.480275] NFS: permission(0:53/28051004), mask=0x81, res=0
Sep  8 11:03:09 portatil264 kernel: [  294.480277] NFS: revalidating (0:53/30801924)
Sep  8 11:03:09 portatil264 kernel: [  294.480282] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:03:09 portatil264 kernel: [  294.480284] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:03:09 portatil264 kernel: [  294.480285] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:03:09 portatil264 kernel: [  294.480306] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=51 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:03:09 portatil264 kernel: [  294.562935] decode_attr_type: type=040000
Sep  8 11:03:09 portatil264 kernel: [  294.562937] decode_attr_change: change attribute=6810276040818143508
Sep  8 11:03:09 portatil264 kernel: [  294.562939] decode_attr_size: file size=4096
Sep  8 11:03:09 portatil264 kernel: [  294.562940] decode_attr_fsid: fsid=(0x0/0x0)
Sep  8 11:03:09 portatil264 kernel: [  294.562941] decode_attr_fileid: fileid=30801924
Sep  8 11:03:09 portatil264 kernel: [  294.562942] decode_attr_fs_locations: fs_locations done, error = 0
Sep  8 11:03:09 portatil264 kernel: [  294.562943] decode_attr_mode: file mode=02770
Sep  8 11:03:09 portatil264 kernel: [  294.562944] decode_attr_nlink: nlink=19
Sep  8 11:03:09 portatil264 kernel: [  294.562950] decode_attr_owner: uid=0
Sep  8 11:03:09 portatil264 kernel: [  294.562954] decode_attr_group: gid=709
Sep  8 11:03:09 portatil264 kernel: [  294.562955] decode_attr_rdev: rdev=(0x0:0x0)
Sep  8 11:03:09 portatil264 kernel: [  294.562956] decode_attr_space_used: space used=4096
Sep  8 11:03:09 portatil264 kernel: [  294.562957] decode_attr_time_access: atime=1599472206
Sep  8 11:03:09 portatil264 kernel: [  294.562958] decode_attr_time_metadata: ctime=1585640954
Sep  8 11:03:09 portatil264 kernel: [  294.562961] decode_attr_time_modify: mtime=1585640954
Sep  8 11:03:09 portatil264 kernel: [  294.562962] decode_attr_mounted_on_fileid: fileid=30801924
Sep  8 11:03:09 portatil264 kernel: [  294.562963] decode_getfattr_attrs: xdr returned 0
Sep  8 11:03:09 portatil264 kernel: [  294.562964] decode_getfattr_generic: xdr returned 0
Sep  8 11:03:09 portatil264 kernel: [  294.562968] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 11:03:09 portatil264 kernel: [  294.562970] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 11:03:09 portatil264 kernel: [  294.562971] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 11:03:09 portatil264 kernel: [  294.562973] nfs41_sequence_process: Error 0 free the slot 
Sep  8 11:03:09 portatil264 kernel: [  294.562974] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:03:09 portatil264 kernel: [  294.562982] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x427e7f)
Sep  8 11:03:09 portatil264 kernel: [  294.562984] NFS: (0:53/30801924) revalidation complete
Sep  8 11:03:09 portatil264 kernel: [  294.562987] NFS: nfs_lookup_revalidate_done(departamentos/innovacion) is valid
Sep  8 11:03:09 portatil264 kernel: [  294.562991] NFS: permission(0:53/30801924), mask=0x24, res=0
Sep  8 11:03:09 portatil264 kernel: [  294.562995] NFS: open dir(departamentos/innovacion)
Sep  8 11:03:09 portatil264 kernel: [  294.563017] NFS: readdir(departamentos/innovacion) starting at cookie 0
Sep  8 11:03:09 portatil264 kernel: [  294.563029] _nfs4_proc_readdir: dentry = departamentos/innovacion, cookie = 0
Sep  8 11:03:09 portatil264 kernel: [  294.563034] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:03:09 portatil264 kernel: [  294.563036] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:03:09 portatil264 kernel: [  294.563037] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:03:09 portatil264 kernel: [  294.563057] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=52 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:03:09 portatil264 kernel: [  294.563060] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b0a23a:00000000
Sep  8 11:03:09 portatil264 kernel: [  294.643573] nfs41_sequence_process: Error 1 free the slot 
Sep  8 11:03:09 portatil264 kernel: [  294.643576] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:03:09 portatil264 kernel: [  294.643583] _nfs4_proc_readdir: returns -5
Sep  8 11:03:09 portatil264 kernel: [  294.643589] NFS: readdir(departamentos/innovacion) returns -5
Sep  8 11:03:09 portatil264 kernel: [  294.644109] NFS: dentry_delete(departamentos/innovacion, 28084c)
Sep  8 11:03:10 portatil264 kernel: [  296.012504] nfs4_renew_state: start
Sep  8 11:03:10 portatil264 kernel: [  296.012510] nfs4_schedule_state_renewal: requeueing work. Lease period = 59
Sep  8 11:03:10 portatil264 kernel: [  296.012513] nfs4_renew_state: done

[-- Attachment #4: err1.txt --]
[-- Type: text/plain, Size: 16867 bytes --]

Sep  8 11:02:03 portatil264 kernel: [  229.334583] FS-Cache: Loaded
Sep  8 11:02:03 portatil264 kernel: [  229.374147] FS-Cache: Netfs 'nfs' registered for caching
Sep  8 11:02:04 portatil264 kernel: [  229.528716] NFS: Registering the id_resolver key type
Sep  8 11:02:04 portatil264 kernel: [  229.528731] Key type id_resolver registered
Sep  8 11:02:04 portatil264 kernel: [  229.528732] Key type id_legacy registered
Sep  8 11:02:52 portatil264 kernel: [  278.316243] NFS: permission(0:53/28049412), mask=0x81, res=-10
Sep  8 11:02:52 portatil264 kernel: [  278.316247] NFS: revalidating (0:53/28049412)
Sep  8 11:02:52 portatil264 kernel: [  278.316260] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:02:52 portatil264 kernel: [  278.316263] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:02:52 portatil264 kernel: [  278.316265] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:02:52 portatil264 kernel: [  278.316309] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=44 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:02:52 portatil264 kernel: [  278.373736] decode_attr_type: type=040000
Sep  8 11:02:52 portatil264 kernel: [  278.373738] decode_attr_change: change attribute=6769953312283491654
Sep  8 11:02:52 portatil264 kernel: [  278.373739] decode_attr_size: file size=4096
Sep  8 11:02:52 portatil264 kernel: [  278.373741] decode_attr_fsid: fsid=(0x0/0x0)
Sep  8 11:02:52 portatil264 kernel: [  278.373742] decode_attr_fileid: fileid=28049412
Sep  8 11:02:52 portatil264 kernel: [  278.373743] decode_attr_fs_locations: fs_locations done, error = 0
Sep  8 11:02:52 portatil264 kernel: [  278.373745] decode_attr_mode: file mode=0755
Sep  8 11:02:52 portatil264 kernel: [  278.373746] decode_attr_nlink: nlink=6
Sep  8 11:02:52 portatil264 kernel: [  278.373757] decode_attr_owner: uid=0
Sep  8 11:02:52 portatil264 kernel: [  278.373761] decode_attr_group: gid=0
Sep  8 11:02:52 portatil264 kernel: [  278.373763] decode_attr_rdev: rdev=(0x0:0x0)
Sep  8 11:02:52 portatil264 kernel: [  278.373764] decode_attr_space_used: space used=4096
Sep  8 11:02:52 portatil264 kernel: [  278.373765] decode_attr_time_access: atime=1599538321
Sep  8 11:02:52 portatil264 kernel: [  278.373766] decode_attr_time_metadata: ctime=1576252587
Sep  8 11:02:52 portatil264 kernel: [  278.373767] decode_attr_time_modify: mtime=1576252587
Sep  8 11:02:52 portatil264 kernel: [  278.373768] decode_attr_mounted_on_fileid: fileid=28049412
Sep  8 11:02:52 portatil264 kernel: [  278.373770] decode_getfattr_attrs: xdr returned 0
Sep  8 11:02:52 portatil264 kernel: [  278.373771] decode_getfattr_generic: xdr returned 0
Sep  8 11:02:52 portatil264 kernel: [  278.373776] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 11:02:52 portatil264 kernel: [  278.373778] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 11:02:52 portatil264 kernel: [  278.373780] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 11:02:52 portatil264 kernel: [  278.373781] nfs41_sequence_process: Error 0 free the slot 
Sep  8 11:02:52 portatil264 kernel: [  278.373782] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:02:52 portatil264 kernel: [  278.373794] NFS: nfs_update_inode(0:53/28049412 fh_crc=0xc875a563 ct=2 info=0x427e7f)
Sep  8 11:02:52 portatil264 kernel: [  278.373797] NFS: (0:53/28049412) revalidation complete
Sep  8 11:02:52 portatil264 kernel: [  278.373800] NFS: permission(0:53/28049412), mask=0x1, res=0
Sep  8 11:02:52 portatil264 kernel: [  278.373806] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep  8 11:02:52 portatil264 kernel: [  278.373812] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:02:52 portatil264 kernel: [  278.373814] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:02:52 portatil264 kernel: [  278.373815] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:02:52 portatil264 kernel: [  278.373834] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=45 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:02:52 portatil264 kernel: [  278.430991] decode_attr_type: type=00
Sep  8 11:02:52 portatil264 kernel: [  278.430994] decode_attr_change: change attribute=6792489597889629835
Sep  8 11:02:52 portatil264 kernel: [  278.430995] decode_attr_size: file size=4096
Sep  8 11:02:52 portatil264 kernel: [  278.430996] decode_attr_fsid: fsid=(0x0/0x0)
Sep  8 11:02:52 portatil264 kernel: [  278.430997] decode_attr_fileid: fileid=0
Sep  8 11:02:52 portatil264 kernel: [  278.430998] decode_attr_fs_locations: fs_locations done, error = 0
Sep  8 11:02:52 portatil264 kernel: [  278.430999] decode_attr_mode: file mode=00
Sep  8 11:02:52 portatil264 kernel: [  278.431000] decode_attr_nlink: nlink=1
Sep  8 11:02:52 portatil264 kernel: [  278.431001] decode_attr_rdev: rdev=(0x0:0x0)
Sep  8 11:02:52 portatil264 kernel: [  278.431002] decode_attr_space_used: space used=0
Sep  8 11:02:52 portatil264 kernel: [  278.431003] decode_attr_time_access: atime=0
Sep  8 11:02:52 portatil264 kernel: [  278.431004] decode_attr_time_metadata: ctime=1581499725
Sep  8 11:02:52 portatil264 kernel: [  278.431005] decode_attr_time_modify: mtime=1581499725
Sep  8 11:02:52 portatil264 kernel: [  278.431006] decode_attr_mounted_on_fileid: fileid=0
Sep  8 11:02:52 portatil264 kernel: [  278.431007] decode_getfattr_attrs: xdr returned 0
Sep  8 11:02:52 portatil264 kernel: [  278.431008] decode_getfattr_generic: xdr returned 0
Sep  8 11:02:52 portatil264 kernel: [  278.431012] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 11:02:52 portatil264 kernel: [  278.431014] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 11:02:52 portatil264 kernel: [  278.431016] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 11:02:52 portatil264 kernel: [  278.431017] nfs41_sequence_process: Error 0 free the slot 
Sep  8 11:02:52 portatil264 kernel: [  278.431018] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:02:52 portatil264 kernel: [  278.431026] NFS: nfs_update_inode(0:53/28051004 fh_crc=0x3843eff0 ct=1 info=0x26040)
Sep  8 11:02:52 portatil264 kernel: [  278.431033] NFS: permission(0:53/28051004), mask=0x1, res=0
Sep  8 11:02:52 portatil264 kernel: [  278.431037] NFS: lookup(departamentos/innovacion)
Sep  8 11:02:52 portatil264 kernel: [  278.431039] NFS call  lookup innovacion
Sep  8 11:02:52 portatil264 kernel: [  278.431043] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:02:52 portatil264 kernel: [  278.431045] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:02:52 portatil264 kernel: [  278.431047] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:02:52 portatil264 kernel: [  278.431066] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=46 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:02:53 portatil264 kernel: [  278.585742] decode_attr_type: type=040000
Sep  8 11:02:53 portatil264 kernel: [  278.585745] decode_attr_change: change attribute=6810276040818143508
Sep  8 11:02:53 portatil264 kernel: [  278.585746] decode_attr_size: file size=4096
Sep  8 11:02:53 portatil264 kernel: [  278.585747] decode_attr_fsid: fsid=(0x0/0x0)
Sep  8 11:02:53 portatil264 kernel: [  278.585748] decode_attr_fileid: fileid=30801924
Sep  8 11:02:53 portatil264 kernel: [  278.585749] decode_attr_fs_locations: fs_locations done, error = 0
Sep  8 11:02:53 portatil264 kernel: [  278.585750] decode_attr_mode: file mode=02770
Sep  8 11:02:53 portatil264 kernel: [  278.585751] decode_attr_nlink: nlink=19
Sep  8 11:02:53 portatil264 kernel: [  278.585758] decode_attr_owner: uid=0
Sep  8 11:02:53 portatil264 kernel: [  278.898245] decode_attr_group: gid=709
Sep  8 11:02:53 portatil264 kernel: [  278.898249] decode_attr_rdev: rdev=(0x0:0x0)
Sep  8 11:02:53 portatil264 kernel: [  278.898250] decode_attr_space_used: space used=4096
Sep  8 11:02:53 portatil264 kernel: [  278.898251] decode_attr_time_access: atime=1599472206
Sep  8 11:02:53 portatil264 kernel: [  278.898252] decode_attr_time_metadata: ctime=1585640954
Sep  8 11:02:53 portatil264 kernel: [  278.898253] decode_attr_time_modify: mtime=1585640954
Sep  8 11:02:53 portatil264 kernel: [  278.898255] decode_attr_mounted_on_fileid: fileid=30801924
Sep  8 11:02:53 portatil264 kernel: [  278.898256] decode_getfattr_attrs: xdr returned 0
Sep  8 11:02:53 portatil264 kernel: [  278.898258] decode_getfattr_generic: xdr returned 0
Sep  8 11:02:53 portatil264 kernel: [  278.898266] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 11:02:53 portatil264 kernel: [  278.898268] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 11:02:53 portatil264 kernel: [  278.898270] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 11:02:53 portatil264 kernel: [  278.898271] nfs41_sequence_process: Error 0 free the slot 
Sep  8 11:02:53 portatil264 kernel: [  278.898273] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:02:53 portatil264 kernel: [  278.898281] NFS reply lookup: 0
Sep  8 11:02:53 portatil264 kernel: [  278.898292] NFS: nfs_fhget(0:53/30801924 fh_crc=0x21ccf1a0 ct=1)
Sep  8 11:02:53 portatil264 kernel: [  278.898302] NFS: dentry_delete(departamentos/innovacion, 20080c)
Sep  8 11:02:53 portatil264 kernel: [  278.898330] NFS: permission(0:53/28049412), mask=0x81, res=0
Sep  8 11:02:53 portatil264 kernel: [  278.898334] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep  8 11:02:53 portatil264 kernel: [  278.898336] NFS: permission(0:53/28051004), mask=0x81, res=0
Sep  8 11:02:53 portatil264 kernel: [  278.898338] NFS: revalidating (0:53/30801924)
Sep  8 11:02:53 portatil264 kernel: [  278.898349] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:02:53 portatil264 kernel: [  278.898351] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:02:53 portatil264 kernel: [  278.898352] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:02:53 portatil264 kernel: [  278.898393] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=47 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:02:53 portatil264 kernel: [  278.984901] decode_attr_type: type=040000
Sep  8 11:02:53 portatil264 kernel: [  278.984904] decode_attr_change: change attribute=6810276040818143508
Sep  8 11:02:53 portatil264 kernel: [  278.984905] decode_attr_size: file size=4096
Sep  8 11:02:53 portatil264 kernel: [  278.984906] decode_attr_fsid: fsid=(0x0/0x0)
Sep  8 11:02:53 portatil264 kernel: [  278.984907] decode_attr_fileid: fileid=30801924
Sep  8 11:02:53 portatil264 kernel: [  278.984909] decode_attr_fs_locations: fs_locations done, error = 0
Sep  8 11:02:53 portatil264 kernel: [  278.984910] decode_attr_mode: file mode=02770
Sep  8 11:02:53 portatil264 kernel: [  278.984911] decode_attr_nlink: nlink=19
Sep  8 11:02:53 portatil264 kernel: [  278.984919] decode_attr_owner: uid=0
Sep  8 11:02:53 portatil264 kernel: [  278.984922] decode_attr_group: gid=709
Sep  8 11:02:53 portatil264 kernel: [  278.984923] decode_attr_rdev: rdev=(0x0:0x0)
Sep  8 11:02:53 portatil264 kernel: [  278.984924] decode_attr_space_used: space used=4096
Sep  8 11:02:53 portatil264 kernel: [  278.984926] decode_attr_time_access: atime=1599472206
Sep  8 11:02:53 portatil264 kernel: [  278.984927] decode_attr_time_metadata: ctime=1585640954
Sep  8 11:02:53 portatil264 kernel: [  278.984928] decode_attr_time_modify: mtime=1585640954
Sep  8 11:02:53 portatil264 kernel: [  278.984929] decode_attr_mounted_on_fileid: fileid=30801924
Sep  8 11:02:53 portatil264 kernel: [  278.984930] decode_getfattr_attrs: xdr returned 0
Sep  8 11:02:53 portatil264 kernel: [  278.984931] decode_getfattr_generic: xdr returned 0
Sep  8 11:02:53 portatil264 kernel: [  278.984935] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 11:02:53 portatil264 kernel: [  278.984937] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 11:02:53 portatil264 kernel: [  278.984939] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 11:02:53 portatil264 kernel: [  278.984940] nfs41_sequence_process: Error 0 free the slot 
Sep  8 11:02:53 portatil264 kernel: [  278.984941] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:02:53 portatil264 kernel: [  278.984951] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x427e7f)
Sep  8 11:02:53 portatil264 kernel: [  278.984954] NFS: (0:53/30801924) revalidation complete
Sep  8 11:02:53 portatil264 kernel: [  278.984957] NFS: nfs_lookup_revalidate_done(departamentos/innovacion) is valid
Sep  8 11:02:53 portatil264 kernel: [  278.984964] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:02:53 portatil264 kernel: [  278.984966] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:02:53 portatil264 kernel: [  278.984968] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:02:53 portatil264 kernel: [  278.984989] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=48 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:02:53 portatil264 kernel: [  279.083208] decode_attr_type: type=00
Sep  8 11:02:53 portatil264 kernel: [  279.083210] decode_attr_change: change attribute=6810276040818143508
Sep  8 11:02:53 portatil264 kernel: [  279.083211] decode_attr_size: file size=4096
Sep  8 11:02:53 portatil264 kernel: [  279.083212] decode_attr_fsid: fsid=(0x0/0x0)
Sep  8 11:02:53 portatil264 kernel: [  279.083213] decode_attr_fileid: fileid=0
Sep  8 11:02:53 portatil264 kernel: [  279.083215] decode_attr_fs_locations: fs_locations done, error = 0
Sep  8 11:02:53 portatil264 kernel: [  279.083216] decode_attr_mode: file mode=00
Sep  8 11:02:53 portatil264 kernel: [  279.083217] decode_attr_nlink: nlink=1
Sep  8 11:02:53 portatil264 kernel: [  279.083218] decode_attr_rdev: rdev=(0x0:0x0)
Sep  8 11:02:53 portatil264 kernel: [  279.083219] decode_attr_space_used: space used=0
Sep  8 11:02:53 portatil264 kernel: [  279.083220] decode_attr_time_access: atime=0
Sep  8 11:02:53 portatil264 kernel: [  279.083221] decode_attr_time_metadata: ctime=1585640954
Sep  8 11:02:53 portatil264 kernel: [  279.083222] decode_attr_time_modify: mtime=1585640954
Sep  8 11:02:53 portatil264 kernel: [  279.083223] decode_attr_mounted_on_fileid: fileid=0
Sep  8 11:02:53 portatil264 kernel: [  279.083224] decode_getfattr_attrs: xdr returned 0
Sep  8 11:02:53 portatil264 kernel: [  279.083225] decode_getfattr_generic: xdr returned 0
Sep  8 11:02:53 portatil264 kernel: [  279.083230] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep  8 11:02:53 portatil264 kernel: [  279.083232] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep  8 11:02:53 portatil264 kernel: [  279.083233] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep  8 11:02:53 portatil264 kernel: [  279.083234] nfs41_sequence_process: Error 0 free the slot 
Sep  8 11:02:53 portatil264 kernel: [  279.083236] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:02:53 portatil264 kernel: [  279.083244] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x26040)
Sep  8 11:02:53 portatil264 kernel: [  279.083249] NFS: permission(0:53/30801924), mask=0x24, res=0
Sep  8 11:02:53 portatil264 kernel: [  279.083253] NFS: open dir(departamentos/innovacion)
Sep  8 11:02:53 portatil264 kernel: [  279.083278] NFS: readdir(departamentos/innovacion) starting at cookie 0
Sep  8 11:02:53 portatil264 kernel: [  279.083293] _nfs4_proc_readdir: dentry = departamentos/innovacion, cookie = 0
Sep  8 11:02:53 portatil264 kernel: [  279.083298] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep  8 11:02:53 portatil264 kernel: [  279.083300] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep  8 11:02:53 portatil264 kernel: [  279.083302] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep  8 11:02:53 portatil264 kernel: [  279.083326] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=49 slotid=0 max_slotid=0 cache_this=0
Sep  8 11:02:53 portatil264 kernel: [  279.083329] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b0a23a:00000000
Sep  8 11:02:53 portatil264 kernel: [  279.188491] nfs41_sequence_process: Error 1 free the slot 
Sep  8 11:02:53 portatil264 kernel: [  279.188494] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep  8 11:02:53 portatil264 kernel: [  279.188501] _nfs4_proc_readdir: returns -5
Sep  8 11:02:53 portatil264 kernel: [  279.188507] NFS: readdir(departamentos/innovacion) returns -5
Sep  8 11:02:53 portatil264 kernel: [  279.189204] NFS: dentry_delete(departamentos/innovacion, 28080c)

[-- Attachment #5: nfs_error.pcap --]
[-- Type: application/vnd.tcpdump.pcap, Size: 6496 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Random IO errors on nfs clients running linux > 4.20
  2020-09-09  9:29   ` Alberto Gonzalez Iniesta
@ 2020-09-09 13:47     ` J. Bruce Fields
  2020-10-26 13:42       ` Alberto Gonzalez Iniesta
  0 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2020-09-09 13:47 UTC (permalink / raw)
  To: Alberto Gonzalez Iniesta; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
> On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> > On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > > I'm sorry for reporting this (a little bit) late, but it took us (Miguel
> > > in Cc:) some time to track this issue to an exact kernel update.
> > > 
> > > We're running a +200 clients NFS server with Ubuntu 16.04 and 18.04
> > > clients. The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
> > > nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working some years now
> > > without issues.
> > > 
> > > But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04
> > > some of them started experiencing failures while working on NFS mounts.
> > > The failures are arbitrary and sometimes it may take more than 20 minutes
> > > to come out (which made finding out which kernel version introduced
> > > this a pain). We are almost sure that some directories are more prone to
> > > suffer from this than others (maybe related to path length/chars?).
> > > 
> > > The error is also not very "verbose", from an strace:
> > > 
> > > execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
> > > [lots of uninteresting output]
> > > openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
> > > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > > fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> > > ioctl(1, TCGETS, 0x7ffd8b725c80)        = -1 ENOTTY (Inappropriate ioctl for device)
> > > getdents(3, /* 35 entries */, 32768)    = 1936
> > > [lots of lstats)
> > > lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > > getdents(3, 0x561af78de890, 32768)      = -1 EIO (Input/output error)
> > 
> > Ideas off the top of my head....
> > 
> > It'd be really useful to get a network trace--something like tcpdump -s0
> > -wtmp.pcap -i<interface>, then reproduce the problem, then look through
> > it to see if you can find the READDIR or STAT or whatever that results
> > in the unexpected EIO.  But if takes a while to reproduce, that may be
> > difficult.
> > 
> > Is there anything in the logs?
> > 
> > It might be worth turning on some more debug logging--see the "rpcdebug"
> > command.
> 
> Hi, Bruce et al.
> 
> I'm sorry this reply took so long, but with debugging enabled the error
> was harder to reproduce.
> 
> I'm attaching 3 log files (2 with just "nfs" debugging and one with
> "nfs" and "rpc" debugging modules enabled).
> 
> I'm also attaching a pcap file; I don't know whether it will be useful since
> we run "sec=krb5p".

Yeah, not too useful.

> Let me know if there's anything else I can test/provide.
> These tests were done with Ubuntu's 5.3.0-53-generic.
> 
> Thanks,
> 
> Alberto
> 
> > > (I can send you the full output if you need it)
> > > 
> > > We can run the previous "ls -lR" 20 times and get no error, or get
> > > this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > > (ls: reading directory 'Becas y ayudas/': Input/output error) every
> > > now and then.
> > > 
> > > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> > > 
> > > There's nothing in dmesg (or elsewhere).
> > > These are the kernels we tried:
> > > 4.18.0-25   -> Can't reproduce
> > > 4.19.0      -> Can't reproduce
> > > 4.20.17     -> Happening (hard to reproduce)
> > > 5.0.0-15    -> Happening (hard to reproduce)
> > > 5.3.0-45    -> Happening (more frequently)
> > > 5.6.0-rc7   -> Reproduced a couple of times after boot, then nothing
> > > 
> > > We did long (as in daylong) testing trying to reproduce this with all
> > > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > > experience this and our Ubuntu 16.04 clients don't have any issue.
> > > 
> > > I know we aren't providing much info but we are really looking forward
> > > to doing all the testing required (we already spent lots of time on it).
> > > 
> > > Thanks for your work.

So all I notice from this one is the readdir EIO came from call_decode.
I suspect that means it failed in the xdr decoding.  Looks like xdr
decoding of the actual directory data (which is the complicated part) is
done later, so this means it failed decoding the header or verifier,
which is a little odd:

> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Random IO errors on nfs clients running linux > 4.20
  2020-09-09 13:47     ` J. Bruce Fields
@ 2020-10-26 13:42       ` Alberto Gonzalez Iniesta
  2020-10-26 13:58         ` Chuck Lever
  2020-10-26 14:14         ` J. Bruce Fields
  0 siblings, 2 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 13:42 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Wed, Sep 09, 2020 at 09:47:27AM -0400, J. Bruce Fields wrote:
> On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
> > On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> > > On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > > > We can run the previous "ls -lR" 20 times and get no error, or get
> > > > this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > > > (ls: reading directory 'Becas y ayudas/': Input/output error) every
> > > > now and then.
> > > > 
> > > > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> > > > 
> > > > There's nothing in dmesg (or elsewhere).
> > > > These are the kernels we tried:
> > > > 4.18.0-25   -> Can't reproduce
> > > > 4.19.0      -> Can't reproduce
> > > > 4.20.17     -> Happening (hard to reproduce)
> > > > 5.0.0-15    -> Happening (hard to reproduce)
> > > > 5.3.0-45    -> Happening (more frequently)
> > > > 5.6.0-rc7   -> Reproduced a couple of times after boot, then nothing
> > > > 
> > > > We did long (as in daylong) testing trying to reproduce this with all
> > > > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > > > experience this and our Ubuntu 16.04 clients don't have any issue.
> > > > 
> > > > I know we aren't providing much info but we are really looking forward
> > > > to doing all the testing required (we already spent lots of time on it).
> > > > 
> > > > Thanks for your work.
> 
> So all I notice from this one is the readdir EIO came from call_decode.
> I suspect that means it failed in the xdr decoding.  Looks like xdr
> decoding of the actual directory data (which is the complicated part) is
> done later, so this means it failed decoding the header or verifier,
> which is a little odd:
> 
> > Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
> > Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
> > Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> > Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> > Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
> > Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
> > Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
> > Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
> > Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
> > Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
> > Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
> > Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> > Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5

Hi, Bruce et al.

Is there anything we can do to help debug/fix this? It's still
biting our users on kernels newer than 4.20.

Thanks,

Alberto

-- 
Alberto González Iniesta             | Universidad a Distancia
alberto.gonzalez@udima.es            | de Madrid

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 13:42       ` Alberto Gonzalez Iniesta
@ 2020-10-26 13:58         ` Chuck Lever
  2020-10-26 14:43           ` Alberto Gonzalez Iniesta
  2020-10-26 14:14         ` J. Bruce Fields
  1 sibling, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 13:58 UTC (permalink / raw)
  To: Alberto Gonzalez Iniesta
  Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez,
	Isaac Marco Blancas



> On Oct 26, 2020, at 9:42 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> 
> On Wed, Sep 09, 2020 at 09:47:27AM -0400, J. Bruce Fields wrote:
>> On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
>>> On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
>>>> On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
>>>>> We can run the previous "ls -lR" 20 times and get no error, or get
>>>>> this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
>>>>> (ls: reading directory 'Becas y ayudas/': Input/output error) every
>>>>> now and then.
>>>>> 
>>>>> The error happens (obviously?) with ls, rsync and the users' GUI tools.
>>>>> 
>>>>> There's nothing in dmesg (or elsewhere).
>>>>> These are the kernels we tried:
>>>>> 4.18.0-25   -> Can't reproduce
>>>>> 4.19.0      -> Can't reproduce
>>>>> 4.20.17     -> Happening (hard to reproduce)
>>>>> 5.0.0-15    -> Happening (hard to reproduce)
>>>>> 5.3.0-45    -> Happening (more frequently)
>>>>> 5.6.0-rc7   -> Reproduced a couple of times after boot, then nothing
>>>>> 
>>>>> We did long (as in daylong) testing trying to reproduce this with all
>>>>> those kernel versions, so we are pretty sure 4.18 and 4.19 don't
>>>>> experience this and our Ubuntu 16.04 clients don't have any issue.
>>>>> 
>>>>> I know we aren't providing much info but we are really looking forward
>>>>> to doing all the testing required (we already spent lots of time on it).
>>>>> 
>>>>> Thanks for your work.
>> 
>> So all I notice from this one is the readdir EIO came from call_decode.
>> I suspect that means it failed in the xdr decoding.  Looks like xdr
>> decoding of the actual directory data (which is the complicated part) is
>> done later, so this means it failed decoding the header or verifier,
>> which is a little odd:
>> 
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
>>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
> 
> Hi, Bruce et al.
> 
> Is there anything we can do to help debug/fix this? It's still
> biting our users on kernels newer than 4.20.

Alberto, can you share a snippet of a raw network capture that shows
the READDIR Reply that fails to decode?
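
For reference, one way to produce such a snippet is sketched below.  This
is a hedged example, not a prescription: the interface name is an
assumption, and with "sec=krb5p" the NFS operation payload is encrypted,
so the READDIR display filter can only match on a sys/krb5/krb5i mount
(or with the GSS keys loaded into Wireshark).

```shell
# Capture full frames on the client while reproducing the error
tcpdump -s 0 -i eth0 -w nfs_readdir.pcap port 2049

# Show only NFSv4 READDIR calls/replies (OP_READDIR is opcode 26)
tshark -r nfs_readdir.pcap -Y 'nfs.opcode == 26'

# Write just those frames to a small pcap suitable for posting to the list
tshark -r nfs_readdir.pcap -Y 'nfs.opcode == 26' -w readdir_only.pcap
```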


--
Chuck Lever




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 13:42       ` Alberto Gonzalez Iniesta
  2020-10-26 13:58         ` Chuck Lever
@ 2020-10-26 14:14         ` J. Bruce Fields
  2020-10-26 14:54           ` Alberto Gonzalez Iniesta
  1 sibling, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2020-10-26 14:14 UTC (permalink / raw)
  To: Alberto Gonzalez Iniesta; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 02:42:16PM +0100, Alberto Gonzalez Iniesta wrote:
> On Wed, Sep 09, 2020 at 09:47:27AM -0400, J. Bruce Fields wrote:
> > On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
> > > On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> > > > On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > > > > We can run the previous "ls -lR" 20 times and get no error, or get
> > > > > this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > > > > (ls: reading directory 'Becas y ayudas/': Input/output error) every
> > > > > now and then.
> > > > > 
> > > > > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> > > > > 
> > > > > There's nothing in dmesg (or elsewhere).
> > > > > These are the kernels we tried:
> > > > > 4.18.0-25   -> Can't reproduce
> > > > > 4.19.0      -> Can't reproduce
> > > > > 4.20.17     -> Happening (hard to reproduce)
> > > > > 5.0.0-15    -> Happening (hard to reproduce)
> > > > > 5.3.0-45    -> Happening (more frequently)
> > > > > 5.6.0-rc7   -> Reproduced a couple of times after boot, then nothing
> > > > > 
> > > > > We did long (as in daylong) testing trying to reproduce this with all
> > > > > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > > > > experience this and our Ubuntu 16.04 clients don't have any issue.
> > > > > 
> > > > > I know we aren't providing much info, but we are ready to do whatever
> > > > > testing is required (we have already spent a lot of time on this).
> > > > > 
> > > > > Thanks for your work.
> > 
> > So all I notice from this one is the readdir EIO came from call_decode.
> > I suspect that means it failed in the xdr decoding.  Looks like xdr
> > decoding of the actual directory data (which is the complicated part) is
> > done later, so this means it failed decoding the header or verifier,
> > which is a little odd:
> > 
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> > > Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
> 
> Hi, Bruce et al.
> 
> Is there anything we can do to help debugging/fixing this? It's still
> biting our users with a +4.20.x kernel.

Sorry, I just don't know what this is off the top of my head.  If I had
the time, stuff I might try:

	- look at the wire traffic with wireshark: try to figure out
	  which operation this is happening on, and if there's anything
	  unusual about the reply.  May be difficult if a lot of traffic
	  is required to reproduce.
	- if you're using krb5, try without it, just to see if that makes
	  a difference.
	- look at history: gitk v4.19..v4.20.17 fs/nfs net/sunrpc, see
	  if anything looks relevant; or even just try a git bisect
	  (difficult again given the intermittent failure).
	- trace through the code to work out where call_decode might be
	  returning -EIO, try to narrow it down with tracepoints or even
	  just debugging printk's.
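To make the call_decode point above concrete: -EIO (-5) at that stage means the RPC reply header or verifier failed XDR decoding, before any READDIR entries are parsed. A toy model in plain Python (not the kernel code; layout simplified from RFC 5531, verifier body contents ignored):

```python
import struct

EIO = -5

def decode_reply_header(buf: bytes) -> int:
    """Toy XDR decode of an ONC RPC reply header (RFC 5531, simplified).

    Returns 0 on success, -5 (EIO) when the header or verifier is
    malformed -- roughly where call_decode gives up in the trace above.
    """
    try:
        # xid, msg_type (1 = REPLY), reply_stat (0 = MSG_ACCEPTED)
        xid, msg_type, reply_stat = struct.unpack_from(">III", buf, 0)
        if msg_type != 1 or reply_stat != 0:
            return EIO
        # verifier: flavor + opaque length, then the opaque body
        flavor, vlen = struct.unpack_from(">II", buf, 12)
        if vlen > 400:          # RFC 5531 caps auth bodies at 400 bytes
            return EIO
        # accept_stat follows the 4-byte-padded verifier body; 0 = SUCCESS
        off = 20 + ((vlen + 3) & ~3)
        (accept_stat,) = struct.unpack_from(">I", buf, off)
        return 0 if accept_stat == 0 else EIO
    except struct.error:        # short / truncated reply
        return EIO

# a well-formed AUTH_NONE reply header vs. a truncated one
good = struct.pack(">IIIIII", 0x1234, 1, 0, 0, 0, 0)
bad = good[:10]
```

With krb5p the verifier and body additionally pass through GSS unwrap before this kind of decode, which widens the set of places a -5 can come from.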

--b.


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 13:58         ` Chuck Lever
@ 2020-10-26 14:43           ` Alberto Gonzalez Iniesta
  2020-10-26 14:46             ` Chuck Lever
  0 siblings, 1 reply; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 14:43 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez,
	Isaac Marco Blancas

On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
> >> So all I notice from this one is the readdir EIO came from call_decode.
> >> I suspect that means it failed in the xdr decoding.  Looks like xdr
> >> decoding of the actual directory data (which is the complicated part) is
> >> done later, so this means it failed decoding the header or verifier,
> >> which is a little odd:
> >> 
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> >>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
> > 
> > Hi, Bruce et al.
> > 
> > Is there anything we can do to help debugging/fixing this? It's still
> > biting our users with a +4.20.x kernel.
> 
> Alberto, can you share a snippet of a raw network capture that shows
> the READDIR Reply that fails to decode?

Hi, Chuck.

Thanks for your reply. We're using "sec=krb5p", which makes the network
capture useless :-(
I'm afraid it's quite difficult to disable that, since we have more than
200 NFS clients in production.

-- 
Alberto González Iniesta             | Universidad a Distancia
alberto.gonzalez@udima.es            | de Madrid


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:43           ` Alberto Gonzalez Iniesta
@ 2020-10-26 14:46             ` Chuck Lever
  2020-10-26 14:54               ` Alberto Gonzalez Iniesta
  2020-10-26 15:02               ` Bruce Fields
  0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 14:46 UTC (permalink / raw)
  To: Alberto Gonzalez Iniesta
  Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez,
	Isaac Marco Blancas



> On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> 
> On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
>>>> So all I notice from this one is the readdir EIO came from call_decode.
>>>> I suspect that means it failed in the xdr decoding.  Looks like xdr
>>>> decoding of the actual directory data (which is the complicated part) is
>>>> done later, so this means it failed decoding the header or verifier,
>>>> which is a little odd:
>>>> 
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
>>> 
>>> Hi, Bruce et al.
>>> 
>>> Is there anything we can do to help debugging/fixing this? It's still
>>> biting our users with a +4.20.x kernel.
>> 
>> Alberto, can you share a snippet of a raw network capture that shows
>> the READDIR Reply that fails to decode?
> 
> Hi, Chuck.
> 
> Thanks for your reply. We're using "sec=krb5p", which makes the network
> capture useless :-(

You can plug keytabs into Wireshark to enable it to decrypt the traffic.
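For example, with the service keytab available, Wireshark's Kerberos dissector can decrypt the GSS-wrapped payloads. A sketch (preference names per recent Wireshark versions; the keytab path is a placeholder and must hold the server's nfs/ key):

```
# ~/.config/wireshark/preferences
# (or Edit > Preferences > Protocols > KRB5 in the GUI)
kerberos.decrypt: TRUE
kerberos.file: /path/to/nfs.keytab

# one-shot equivalent with tshark on a saved capture:
#   tshark -r capture.pcap -Y nfs \
#          -o kerberos.decrypt:TRUE -o kerberos.file:/path/to/nfs.keytab
```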


--
Chuck Lever





* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:14         ` J. Bruce Fields
@ 2020-10-26 14:54           ` Alberto Gonzalez Iniesta
  0 siblings, 0 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 14:54 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 10:14:52AM -0400, J. Bruce Fields wrote:
> Sorry, I just don't know what this is off the top of my head.  If I had
> the time, stuff I might try:
> 
> 	- look at the wire traffic with wireshark: try to figure out
> 	  which operation this is happening on, and if there's anything
> 	  unusual about the reply.  May be difficult if a lot of traffic
> 	  is required to reproduce.
> 	- if you're using krb5, try without just to see if that makes a
> 	  difference.
	
Using "sec=krb5p" here... And with lots of active clients. :-/

Oh! Could I add another entry in /etc/exports with krb5 without affecting
current krb5p clients?
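For what it's worth, exports(5) accepts a colon-separated list of flavors in a single sec= option, so one entry can offer both: existing krb5p clients keep matching the first flavor while a test client mounts with sec=krb5. A hypothetical sketch (path and host are placeholders):

```
# /etc/exports -- before:
#   /srv/nfs  *.example.com(rw,sync,sec=krb5p)
# after (run exportfs -ra to apply):
/srv/nfs  *.example.com(rw,sync,sec=krb5p:krb5)
```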

> 	- look at history: gitk v4.19..v4.20.17 fs/nfs net/sunrpc, see
> 	  if anything looks relevant; or even just try a git bisect
> 	  (difficult again given the intermittent failure).
> 	- trace through the code to work out where call_decode might be
> 	  returning -EIO, try to narrow it down with tracepoints or even
> 	  just debugging printk's.

I'm afraid I don't have the skills to carry out those tasks: I haven't
coded in C for decades, never mind kernel/NFS hacking.
I guess we'll have to wait until someone else hits this too, or until we
update the server and it (hopefully/miraculously) fixes itself :-)

Thanks a lot,

Alberto

-- 
Alberto González Iniesta             | Universidad a Distancia
alberto.gonzalez@udima.es            | de Madrid


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:46             ` Chuck Lever
@ 2020-10-26 14:54               ` Alberto Gonzalez Iniesta
  2020-10-26 15:02               ` Bruce Fields
  1 sibling, 0 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 14:54 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez,
	Isaac Marco Blancas

On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
> 
> 
> > On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> > 
> > On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
> >>>> So all I notice from this one is the readdir EIO came from call_decode.
> >>>> I suspect that means it failed in the xdr decoding.  Looks like xdr
> >>>> decoding of the actual directory data (which is the complicated part) is
> >>>> done later, so this means it failed decoding the header or verifier,
> >>>> which is a little odd:
> >>>> 
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
> >>> 
> >>> Hi, Bruce et al.
> >>> 
> >>> Is there anything we can do to help debugging/fixing this? It's still
> >>> biting our users with a +4.20.x kernel.
> >> 
> >> Alberto, can you share a snippet of a raw network capture that shows
> >> the READDIR Reply that fails to decode?
> > 
> > Hi, Chuck.
> > 
> > Thanks for your reply. We're using "sec=krb5p", which makes the network
> > capture useless :-(
> 
> You can plug keytabs into Wireshark to enable it to decrypt the traffic.
> 

Cool, I'll look into that then.

Thanks!

-- 
Alberto González Iniesta             | Universidad a Distancia
alberto.gonzalez@udima.es            | de Madrid


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:46             ` Chuck Lever
  2020-10-26 14:54               ` Alberto Gonzalez Iniesta
@ 2020-10-26 15:02               ` Bruce Fields
  2020-10-26 15:06                 ` Chuck Lever
  1 sibling, 1 reply; 16+ messages in thread
From: Bruce Fields @ 2020-10-26 15:02 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Alberto Gonzalez Iniesta, Linux NFS Mailing List,
	Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
> 
> 
> > On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> > 
> > On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
> >>>> So all I notice from this one is the readdir EIO came from call_decode.
> >>>> I suspect that means it failed in the xdr decoding.  Looks like xdr
> >>>> decoding of the actual directory data (which is the complicated part) is
> >>>> done later, so this means it failed decoding the header or verifier,
> >>>> which is a little odd:
> >>>> 
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> >>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
> >>> 
> >>> Hi, Bruce et al.
> >>> 
> >>> Is there anything we can do to help debugging/fixing this? It's still
> >>> biting our users with a +4.20.x kernel.
> >> 
> >> Alberto, can you share a snippet of a raw network capture that shows
> >> the READDIR Reply that fails to decode?
> > 
> > Hi, Chuck.
> > 
> > Thanks for your reply. We're using "sec=krb5p", which makes the network
> > capture useless :-(
> 
> You can plug keytabs into Wireshark to enable it to decrypt the traffic.

Just skimming that range of history, there are some changes to the
handling of GSS sequence numbers; I wonder if there's a chance he could
be hitting that?  You had a workload that would lead to calls dropping
out of the sequence number window, didn't you, Chuck?  Is there a quick
way to check whether that's happening?

--b.


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 15:02               ` Bruce Fields
@ 2020-10-26 15:06                 ` Chuck Lever
  2020-10-26 15:26                   ` Alberto Gonzalez Iniesta
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 15:06 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Alberto Gonzalez Iniesta, Linux NFS Mailing List,
	Miguel Rodriguez, Isaac Marco Blancas



> On Oct 26, 2020, at 11:02 AM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
>> 
>> 
>>> On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
>>> 
>>> On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
>>>>>> So all I notice from this one is the readdir EIO came from call_decode.
>>>>>> I suspect that means it failed in the xdr decoding.  Looks like xdr
>>>>>> decoding of the actual directory data (which is the complicated part) is
>>>>>> done later, so this means it failed decoding the header or verifier,
>>>>>> which is a little odd:
>>>>>> 
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
>>>>> 
>>>>> Hi, Bruce et al.
>>>>> 
>>>>> Is there anything we can do to help debugging/fixing this? It's still
>>>>> biting our users with a +4.20.x kernel.
>>>> 
>>>> Alberto, can you share a snippet of a raw network capture that shows
>>>> the READDIR Reply that fails to decode?
>>> 
>>> Hi, Chuck.
>>> 
>>> Thanks for your reply. We're using "sec=krb5p", which makes the network
>>> capture useless :-(
>> 
>> You can plug keytabs into Wireshark to enable it to decrypt the traffic.
> 
> Just skimming that range of history, there's some changes to the
> handling of gss sequence numbers, I wonder if there's a chance he could
> be hitting that?  You had a workload that would lead to calls dropping
> out of the sequence number window, didn't you, Chuck?  Is there a quick
> way to check whether that's happening?

The server is supposed to drop the connection when that happens, though
I'm not sure 4.20's NFSD was perfect in that regard. Connection loss in
itself wouldn't result in EIO.


--
Chuck Lever





* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 15:06                 ` Chuck Lever
@ 2020-10-26 15:26                   ` Alberto Gonzalez Iniesta
  2020-10-26 17:23                     ` Chuck Lever
  0 siblings, 1 reply; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 15:26 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez,
	Isaac Marco Blancas

On Mon, Oct 26, 2020 at 11:06:24AM -0400, Chuck Lever wrote:
> 
> 
> > On Oct 26, 2020, at 11:02 AM, Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
> >> 
> >> 
> >>> On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> >>> 
> >>> On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
> >>>>>> So all I notice from this one is the readdir EIO came from call_decode.
> >>>>>> I suspect that means it failed in the xdr decoding.  Looks like xdr
> >>>>>> decoding of the actual directory data (which is the complicated part) is
> >>>>>> done later, so this means it failed decoding the header or verifier,
> >>>>>> which is a little odd:
> >>>>>> 
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> >>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
> >>>>> 
> >>>>> Hi, Bruce et al.
> >>>>> 
> >>>>> Is there anything we can do to help debugging/fixing this? It's still
> >>>>> biting our users with a +4.20.x kernel.
> >>>> 
> >>>> Alberto, can you share a snippet of a raw network capture that shows
> >>>> the READDIR Reply that fails to decode?
> >>> 
> >>> Hi, Chuck.
> >>> 
> >>> Thanks for your reply. We're using "sec=krb5p", which makes the network
> >>> capture useless :-(
> >> 
> >> You can plug keytabs into Wireshark to enable it to decrypt the traffic.
> > 
> > Just skimming that range of history, there's some changes to the
> > handling of gss sequence numbers, I wonder if there's a chance he could
> > be hitting that?  You had a workload that would lead to calls dropping
> > out of the sequence number window, didn't you, Chuck?  Is there a quick
> > way to check whether that's happening?
> 
> The server is supposed to drop the connection when that happens, though
> I'm not sure 4.20's NFSD was perfect in that regard. Connection loss in
> itself wouldn't result in EIO.

In case this is relevant: the server is running 3.16.0, and the clients
(with issues) run 4.20+.

-- 
Alberto González Iniesta             | Universidad a Distancia
alberto.gonzalez@udima.es            | de Madrid


* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 15:26                   ` Alberto Gonzalez Iniesta
@ 2020-10-26 17:23                     ` Chuck Lever
  0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 17:23 UTC (permalink / raw)
  To: Alberto Gonzalez Iniesta
  Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez,
	Isaac Marco Blancas



> On Oct 26, 2020, at 11:26 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> 
> On Mon, Oct 26, 2020 at 11:06:24AM -0400, Chuck Lever wrote:
>> 
>> 
>>> On Oct 26, 2020, at 11:02 AM, Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
>>>> 
>>>> 
>>>>> On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
>>>>> 
>>>>> On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
>>>>>>>> So all I notice from this one is the readdir EIO came from call_decode.
>>>>>>>> I suspect that means it failed in the xdr decoding.  Looks like xdr
>>>>>>>> decoding of the actual directory data (which is the complicated part) is
>>>>>>>> done later, so this means it failed decoding the header or verifier,
>>>>>>>> which is a little odd:
>>>>>>>> 
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016276] RPC:  3284 call_decode result -5
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot 
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016286] RPC:       wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016290] RPC:  3284 return 0, status -5
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016291] RPC:  3284 release task
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016295] RPC:       freeing buffer of size 4144 at 00000000a3649daf
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016298] RPC:  3284 release request 0000000079df89b2
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016300] RPC:       wake_up_first(00000000c5ee49ee "xprt_backlog")
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016302] RPC:       rpc_release_client(00000000b930c343)
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016304] RPC:  3284 freeing task
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
>>>>>>>>> Sep  8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
>>>>>>> 
>>>>>>> Hi, Bruce et al.
>>>>>>> 
>>>>>>> Is there anything we can do to help debugging/fixing this? It's still
>>>>>>> biting our users with a +4.20.x kernel.
>>>>>> 
>>>>>> Alberto, can you share a snippet of a raw network capture that shows
>>>>>> the READDIR Reply that fails to decode?
>>>>> 
>>>>> Hi, Chuck.
>>>>> 
>>>>> Thanks for your reply. We're using "sec=krb5p", which makes the network
>>>>> capture useless :-(
>>>> 
>>>> You can plug keytabs into Wireshark to enable it to decrypt the traffic.
>>> 
>>> Just skimming that range of history, there's some changes to the
>>> handling of gss sequence numbers, I wonder if there's a chance he could
>>> be hitting that?  You had a workload that would lead to calls dropping
>>> out of the sequence number window, didn't you, Chuck?  Is there a quick
>>> way to check whether that's happening?
>> 
>> The server is supposed to drop the connection when that happens, though
>> I'm not sure 4.20's NFSD was perfect in that regard. Connection loss in
>> itself wouldn't result in EIO.
> 
> In case this is relevant, server is running 3.16.0. Clients (with
> issues) +4.20.

Ah, I see. Well that's an old kernel. Have you engaged your distributor?
They might be able to provide builds with debugging instrumentation, for
example, if we can give them some instructions or a patch.

My experience tells me that this is probably an issue with either the
server's GSS wrap function, or the client's GSS unwrap function, if
you don't ever see this failure without krb5p in the picture.

If you don't see this problem on older clients, then I would start
looking at the client's GSS unwrap function, fwiw.


--
Chuck Lever





end of thread

Thread overview: 16+ messages
2020-04-29 17:15 Random IO errors on nfs clients running linux > 4.20 Alberto Gonzalez Iniesta
2020-04-30  6:08 ` Alberto Gonzalez Iniesta
2020-04-30 17:32 ` J. Bruce Fields
2020-09-09  9:29   ` Alberto Gonzalez Iniesta
2020-09-09 13:47     ` J. Bruce Fields
2020-10-26 13:42       ` Alberto Gonzalez Iniesta
2020-10-26 13:58         ` Chuck Lever
2020-10-26 14:43           ` Alberto Gonzalez Iniesta
2020-10-26 14:46             ` Chuck Lever
2020-10-26 14:54               ` Alberto Gonzalez Iniesta
2020-10-26 15:02               ` Bruce Fields
2020-10-26 15:06                 ` Chuck Lever
2020-10-26 15:26                   ` Alberto Gonzalez Iniesta
2020-10-26 17:23                     ` Chuck Lever
2020-10-26 14:14         ` J. Bruce Fields
2020-10-26 14:54           ` Alberto Gonzalez Iniesta
