From: Oleg Drokin <green@linuxhacker.ru>
To: linux-nfs@vger.kernel.org
Cc: Jeff Layton <jlayton@poochiereds.net>
Subject: 4.6, 4.7 slow nfs export with more than one client.
Date: Mon, 5 Sep 2016 00:55:25 -0400
Message-ID: <6C329B27-111A-4B16-84F4-7357940EBC01@linuxhacker.ru>

Hello!

   I have a somewhat mysterious problem with my nfs test rig that I suspect is something
   stupid I am missing, but I cannot figure it out and would appreciate any help.

   NFS server is Fedora23 with 4.6.7-200.fc23.x86_64 as the kernel.
   Clients are a bunch of 4.8-rc5 nodes, nfsroot.
   If I start only one of them, all is fine; if I start all 9 or 10, then suddenly all
   operations grind to a halt (nfs-wise). On the NFS server side there's very little load.

   I hit this (or something similar) back in June, when testing 4.6-rcs (and the server
   was running 4.4.something I believe), and back then after some mucking around
   I set:
net.core.rmem_default=268435456
net.core.wmem_default=268435456
net.core.rmem_max=268435456
net.core.wmem_max=268435456

   and while I have no idea why, that helped, so I stopped looking into it completely.
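
   For scale, 268435456 bytes is 2^28, i.e. 256 MiB per setting. A minimal Python
   sketch that parses the four lines above and prints each value in MiB; the
   helper name `parse_sysctl` is hypothetical and only for illustration:

```python
# The four settings quoted in the message, in sysctl "key=value" form.
SETTINGS = """\
net.core.rmem_default=268435456
net.core.wmem_default=268435456
net.core.rmem_max=268435456
net.core.wmem_max=268435456
"""


def parse_sysctl(text: str) -> dict[str, int]:
    """Parse 'key=value' lines into a dict of integer values."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        result[key.strip()] = int(value.strip())
    return result


if __name__ == "__main__":
    for key, size in parse_sysctl(SETTINGS).items():
        # 1 MiB = 2**20 bytes
        print(f"{key}: {size // 2**20} MiB")
```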

   Fast forward to now: I am back at the same problem, and the workaround above
   does not help anymore.

   I also have a bunch of "NFSD: client 192.168.10.191 testing state ID with incorrect client ID"
   messages in my logs (I also had them in June; I tried disabling nfs 4.2 and 4.1 and
   that did not help).

   So anyway, I discovered nfsdcltrack and such, and I noticed that whenever
   the kernel calls it, it's always with the same hexid of
   4c696e7578204e465376342e32206c6f63616c686f7374
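
   That hexid is just hex-encoded text. A minimal Python sketch that decodes it,
   assuming the bytes are plain ASCII (the function name `decode_client_id` is
   hypothetical):

```python
# Hex string copied verbatim from the message above.
HEX_ID = "4c696e7578204e465376342e32206c6f63616c686f7374"


def decode_client_id(hex_id: str) -> str:
    """Decode a hex-encoded client ID into text, assuming it is ASCII."""
    return bytes.fromhex(hex_id).decode("ascii")


if __name__ == "__main__":
    print(decode_client_id(HEX_ID))  # prints: Linux NFSv4.2 localhost
```

   The decoded string matches the ID shown in the sqlite output that follows.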

   Naturally, if I try to list the content of the sqlite file, I get:
sqlite> select * from clients;
Linux NFSv4.2 localhost|1473049735|1
sqlite> select * from clients;
Linux NFSv4.2 localhost|1473049736|1
sqlite> select * from clients;
Linux NFSv4.2 localhost|1473049737|1
sqlite> select * from clients;
Linux NFSv4.2 localhost|1473049751|1
sqlite> select * from clients;
Linux NFSv4.2 localhost|1473049752|1
sqlite> 

   (the number keeps changing), so it looks like client id detection broke somehow?
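
   The changing second column appears to be a Unix timestamp. A minimal Python
   sketch that runs the same select and renders that column as UTC time. The rows
   are read positionally; the column names in the demo table and the in-memory
   stand-in database are assumptions, not taken from the actual nfsdcltrack file:

```python
import sqlite3
from datetime import datetime, timezone


def list_clients(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Return (client_id, utc_time, flag) for every row of the clients table."""
    rows = []
    for raw_id, stamp, flag in conn.execute("SELECT * FROM clients"):
        # The ID may come back as bytes (BLOB) or str, depending on how it was stored.
        if isinstance(raw_id, bytes):
            raw_id = raw_id.decode("ascii", errors="replace")
        when = datetime.fromtimestamp(stamp, tz=timezone.utc)
        rows.append((raw_id, when.strftime("%Y-%m-%d %H:%M:%S"), flag))
    return rows


if __name__ == "__main__":
    # In-memory stand-in for the real database; column names are assumed.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients (id BLOB PRIMARY KEY, time INTEGER, has_session INTEGER)"
    )
    conn.execute(
        "INSERT INTO clients VALUES (?, ?, ?)",
        (b"Linux NFSv4.2 localhost", 1473049735, 1),
    )
    for row in list_clients(conn):
        print(row)
```

   To try it against a real file, pass a connection opened on that file instead
   of the in-memory one.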

   These same clients (and a bunch more) also mount another nfs server (for crashdump
   purposes) that is centos7-based; there everything is detected correctly
   and performance is ok. The select shows:
sqlite> select * from clients;
Linux NFSv4.0 192.168.10.219/192.168.10.1 tcp|1472868376|0
Linux NFSv4.0 192.168.10.218/192.168.10.1 tcp|1472868376|0
Linux NFSv4.0 192.168.10.210/192.168.10.1 tcp|1472868384|0
Linux NFSv4.0 192.168.10.221/192.168.10.1 tcp|1472868387|0
Linux NFSv4.0 192.168.10.220/192.168.10.1 tcp|1472868388|0
Linux NFSv4.0 192.168.10.211/192.168.10.1 tcp|1472868389|0
Linux NFSv4.0 192.168.10.222/192.168.10.1 tcp|1473035496|0
Linux NFSv4.0 192.168.10.217/192.168.10.1 tcp|1473035500|0
Linux NFSv4.0 192.168.10.216/192.168.10.1 tcp|1473035501|0
Linux NFSv4.0 192.168.10.224/192.168.10.1 tcp|1473035520|0
Linux NFSv4.0 192.168.10.226/192.168.10.1 tcp|1473045789|0
Linux NFSv4.0 192.168.10.227/192.168.10.1 tcp|1473045789|0
Linux NFSv4.1 fedora1.localnet|1473046045|1
Linux NFSv4.1 fedora-1-3.localnet|1473046139|1
Linux NFSv4.1 fedora-2-4.localnet|1473046229|1
Linux NFSv4.1 fedora-1-1.localnet|1473046244|1
Linux NFSv4.1 fedora-1-4.localnet|1473046251|1
Linux NFSv4.1 fedora-2-1.localnet|1473046342|1
Linux NFSv4.1 fedora-1-2.localnet|1473046498|1
Linux NFSv4.1 fedora-2-3.localnet|1473046524|1
Linux NFSv4.1 fedora-2-2.localnet|1473046689|1
sqlite> 

  (the first nameless bunch is centos7 nfsroot clients, fedora* bunch are
  the ones on 4.8-rc5).
  If I try to mount the Fedora23 server from one of the centos7 clients, the record
  does not appear in the output either.

   Now, while a theory that "aha, it's nfs 4.2 that is broken with Fedora23"
   might look possible, I have another Fedora23 server that is mounted by
   yet another (single) client and there things seem to be fine:
sqlite> select * from clients;
Linux NFSv4.2 xbmc.localnet|1471825025|1


   So with all of that in the picture, I wonder what it is I am doing wrong just on
   this server?

   Thanks.

Bye,
    Oleg
