From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ashish Samant
Subject: Re: fuse scalability part 1
Date: Fri, 25 Sep 2015 10:53:56 -0700
Message-ID: <56058A34.8050900@oracle.com>
References: <20150518151336.GA9960@tucsk> <56044C66.1090207@oracle.com>
Reply-To: ashish.samant@oracle.com
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Linux-Fsdevel, Kernel Mailing List, fuse-devel, Srinivas Eeda
To: Miklos Szeredi
Return-path:
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On 09/25/2015 05:11 AM, Miklos Szeredi wrote:
> On Thu, Sep 24, 2015 at 9:17 PM, Ashish Samant wrote:
>
>> We did some performance testing without these patches and with these
>> patches (with the -o clone_fd option specified). We did 2 types of
>> tests:
>>
>> 1. Throughput test: We ran parallel dd tests to read/write to a
>> FUSE-based database fs on a system with 8 NUMA nodes and 288 CPUs.
>> The performance here is almost equal to that of the per-NUMA patches
>> we submitted a while back. Please find the results attached.
>
> Interesting. This means that serving the request on a different NUMA
> node than the one where the request originated doesn't appear to make
> the performance much worse.
>
> Thanks,
> Miklos

Yes. The main performance gain comes from reduced contention on a single
spinlock (fc->lock), especially with a large number of requests.
Splitting fc->fiq per cloned device should improve performance further,
and we can then experiment with per-NUMA / per-CPU cloned devices.

Thanks,
Ashish