From: "J. Bruce Fields" <bfields@fieldses.org>
To: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Cc: neilb@suse.de, linux-nfs@vger.kernel.org
Subject: Re: sunrpc/cache.c: races while updating cache entries
Date: Fri, 5 Apr 2013 17:08:30 -0400	[thread overview]
Message-ID: <20130405210830.GA7079@fieldses.org> (raw)
In-Reply-To: <d6437a$47jkcm@dgate10u.abg.fsc.net>

On Fri, Apr 05, 2013 at 05:33:49PM +0200, Bodo Stroesser wrote:
> On 05 Apr 2013 14:40:00 +0100 J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Thu, Apr 04, 2013 at 07:59:35PM +0200, Bodo Stroesser wrote:
> > > There is no reason for apologies. The thread meanwhile seems to be a bit
> > > confusing :-)
> > > 
> > > Current state is:
> > > 
> > > - Neil Brown has created two series of patches. One for SLES11-SP1 and a
> > >   second one for -SP2
> > > 
> > > - AFAICS, the series for -SP2 will match with mainline also.
> > > 
> > > - Today I found and fixed the (hopefully) last problem in the -SP1 series.
> > >   My test using this patchset will run until Monday.
> > > 
> > > - Provided the test on SP1 succeeds, probably on Tuesday I'll start to test
> > >   the patches for SP2 (and mainline). If it runs fine, we'll have a tested
> > >   patchset not later than Mon 15th.
> > 
> > OK, great, as long as it hasn't just been forgotten!
> > 
> > I'd also be curious to understand why we aren't getting a lot of
> > complaints about this from elsewhere....  Is there something unique
> > about your setup?  Do the bugs that remain upstream take a long time to
> > reproduce?
> > 
> > --b.
> > 
> 
> It's no secret what we are doing. So let me try to explain:

Thanks for the detailed explanation!  I'll look forward to the patches.

--b.

> 
> We build appliances for storage purposes. Each appliance mainly consists of
> a cluster of servers and a bunch of FibreChannel RAID systems. The servers
> of the appliance run SLES11.
> 
> One or more of the servers in the cluster can act as an NFS server.
> 
> Each NFS server is connected to the RAID systems and has two 10 GBit/s Ethernet
> controllers for the link to the clients.
> 
> The appliance not only offers NFS access for clients, but also has some other
> types of interfaces to be used by the clients.
> 
> For QA of the appliances we use a special test system that runs the entire
> appliance with all its interfaces under heavy load.
> 
> For the test of the NFS interfaces of the appliance, we connect the Ethernet
> links one by one to 10 GBit/s Ethernet controllers on a linux machine of the
> test system.
> 
> The SW on the test system for each Ethernet link uses 32 TCP connections to the
> NFS server in parallel. 
> 
> So between the NFS server of the appliance and the linux machine of the test
> system we have two 10 GBit/s links with 32 TCP/RPC/NFS_V3 connections each.
> Each link runs at up to 1 GByte/s throughput (per link, a total of 32k
> NFS3_READ or NFS3_WRITE RPCs per second, carrying 32k of data each).
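(As a back-of-the-envelope check of the figures quoted above, assuming "32k" means 32 × 1024, the RPC rate and payload size multiply out to exactly 1 GiByte/s per link:)

```python
# Illustrative arithmetic only; "32k" is assumed to mean 32 * 1024.
rpcs_per_sec = 32 * 1024      # NFS3_READ/NFS3_WRITE RPCs per second per link
bytes_per_rpc = 32 * 1024     # 32k of data carried by each RPC
throughput = rpcs_per_sec * bytes_per_rpc
print(throughput == 1024**3)  # prints: True -- exactly 1 GiByte/s per link
```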
> 
> Normal Linux NFS clients open only a single connection to a specific NFS
> server, even if there are multiple mounts. We do not use the Linux built-in
> client, but create an RPC client via clnttcp_create() and do the NFS handling
> directly. Thus we can have multiple connections, and we immediately see if
> something goes wrong (e.g. if an RPC request is dropped), whereas the
> built-in Linux client would probably do a silent retry. (But one could
> probably see single connections hang sporadically for a few minutes. Someone
> hit by this might complain about the network ...)
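(For illustration only, not the actual test software described above: the key idea is that each test connection is an independent TCP transport carrying its own ONC RPC stream, unlike the single shared transport of the in-kernel client. A minimal sketch of the wire framing, following the RFC 5531 record-marking rules and using the argument-free NULL procedure; all names below are hypothetical:)

```python
import struct

NFS_PROGRAM = 100003  # ONC RPC program number registered for NFS
NFS_V3 = 3

def rpc_null_call(xid):
    """Frame an ONC RPC NULL call for NFSv3 (RFC 5531), AUTH_NONE,
    with the 4-byte record mark used on TCP transports."""
    body = struct.pack(">6I",
                       xid,          # transaction id, unique per request
                       0,            # msg_type = CALL
                       2,            # RPC protocol version
                       NFS_PROGRAM,
                       NFS_V3,
                       0)            # procedure 0 = NULL ("ping")
    # Credentials and verifier: AUTH_NONE flavor, zero-length body each.
    body += struct.pack(">4I", 0, 0, 0, 0)
    # The high bit of the record mark flags the last (here: only) fragment.
    return struct.pack(">I", 0x80000000 | len(body)) + body

# Each of the 32 parallel connections would open its own socket and send
# such calls (sock.sendall(rpc_null_call(xid))); sunrpc then sees many
# distinct transports arriving from one client IP.
msg = rpc_null_call(1)
print(len(msg))  # prints: 44 -- 4-byte record mark + 40-byte call body
```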
> 
> As a side effect of this test setup, all 64 connections to the NFS server
> use the same uid/gid, and all 32 connections on one link come from the same
> IP address. This - as we know now - maximizes the stress on a single entry
> of the caches.
> 
> With our test setup, at the beginning we had more than two dropped RPC
> requests per hour per NFS server. (Of course, this rate varied widely.) With
> each single change in cache.c the rate went down. The latest drop, caused by
> a missing detail in the latest patchset for -SP1, occurred after more than
> 2 days of testing!
> 
> Thus, to verify the patches I schedule a test for at least 4 days.
> 
> HTH
> Bodo
