From: Josef Bacik <jbacik@fb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: <linux-btrfs@vger.kernel.org>, <kernel-team@fb.com>,
	<jack@suse.cz>, <linux-fsdevel@vger.kernel.org>,
	<viro@zeniv.linux.org.uk>, <hch@infradead.org>, <jweiner@fb.com>
Subject: Re: [PATCH 5/5] fs: don't set *REFERENCED unless we are on the lru list
Date: Wed, 26 Oct 2016 16:03:54 -0400
Message-ID: <20683c6a-5419-3a66-8574-fcf19f2347f9@fb.com>
In-Reply-To: <20161025233604.GF14023@dastard>

On 10/25/2016 07:36 PM, Dave Chinner wrote:
> On Wed, Oct 26, 2016 at 09:01:13AM +1100, Dave Chinner wrote:
>> On Tue, Oct 25, 2016 at 02:41:44PM -0400, Josef Bacik wrote:
>>> With anything that populates the inode/dentry cache with a lot of
>>> one-time-use inodes, we can put a lot of pressure on the system for things
>>> we don't need to keep in cache.  It takes two passes through the LRU to
>>> evict these one-use entries, and if you have a lot of memory you can end up
>>> with tens of millions of entries in the dcache or icache that have never
>>> been touched since they were first instantiated, and it takes a lot of CPU
>>> and a lot of memory pressure to evict all of them.
>>>
>>> So instead, do what we do with the pagecache: only set the *REFERENCED
>>> flags if the entry is used again after it has been put onto the LRU.  This
>>> makes a significant difference in the system's ability to evict these
>>> useless cache entries.  With an fs_mark workload that creates 40 million
>>> files, we get the following results (all in files/sec):
>>
>> What's the workload, storage, etc?
>>
>>> Btrfs                Patched    Unpatched
>>> Average Files/sec:   72209.3    63254.2
>>> p50 Files/sec:       70850      57560
>>> p90 Files/sec:       68757      53085
>>> p99 Files/sec:       68757      53085
>>
>> So how much of this is from changing the dentry referenced
>> behaviour, and how much from the inode? Can you separate out the two
>> changes so we know which one is actually affecting reclaim
>> performance?

(OK, figured out the problem: apparently Outlook thinks you are spam and has
been filtering all your emails into its junk folder without showing them to
me.  I've fixed that, so now we should be able to get somewhere.)
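
To recap what the patch actually changes, since it's gotten lost in the
noise: today the *REFERENCED flags are set on every use, so even an entry
that is never touched again survives a full pass through the LRU before it
can be evicted.  The patch gates the flags on LRU membership.  A minimal
sketch of the idea (illustrative helper names, not the literal diff, with
d_lock/i_lock elided):

	#include <linux/dcache.h>
	#include <linux/fs.h>
	#include <linux/list.h>

	/* Mark a dentry referenced only if it already sits on an LRU list. */
	static void sketch_dentry_used(struct dentry *dentry)
	{
		if (dentry->d_flags & DCACHE_LRU_LIST)
			dentry->d_flags |= DCACHE_REFERENCED;
	}

	/* Likewise for inodes, keyed off membership on the superblock LRU. */
	static void sketch_inode_used(struct inode *inode)
	{
		if (!list_empty(&inode->i_lru))
			inode->i_state |= I_REFERENCED;
	}

With that, a create-once-never-reuse entry is reclaimable on the first LRU
pass instead of the second, which is where the eviction win comes from.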

>
> FWIW, I just went to run my usual zero-length file creation fsmark
> test (16-way create, large sparse FS image on SSDs). XFS (with debug
> enabled) takes 4m10s to run at an average of ~230k files/s, with a
> std deviation of +/-1.7k files/s.
>
> Unfortunately, btrfs turns that into more heat than it ever has done
> before. It's only managing 35k files/s and the profile looks like
> this:
>
>   58.79%  [kernel]  [k] __pv_queued_spin_lock_slowpath
>    5.61%  [kernel]  [k] queued_write_lock_slowpath
>    1.65%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
>    1.38%  [kernel]  [k] reschedule_interrupt
>    1.08%  [kernel]  [k] _raw_spin_lock
>    0.92%  [kernel]  [k] __radix_tree_lookup
>    0.86%  [kernel]  [k] _raw_spin_lock_irqsave
>    0.83%  [kernel]  [k] btrfs_set_lock_blocking_rw
>
> I killed it because this would take too long to run.
>
> I reduced the concurrency to 4-way, and spinlock contention went down
> to about 40% of the CPU time.  At 2-way concurrency, about 16% of CPU
> time was spent in lock contention.
>
> Throughput results:
>             btrfs throughput (files/s)
>             2-way                 4-way
> unpatched   46938.1 +/- 2.8e+03   40273.4 +/- 3.1e+03
> patched     45697.2 +/- 2.4e+03   49287.1 +/- 3e+03
>
>
> So, 2-way has not improved.  If changing referenced behaviour were an
> obvious win for btrfs, we'd expect to see it here as well.  However,
> because 4-way improved by 20%, I think all we're seeing is a slight
> change in lock contention levels inside btrfs.  Indeed, looking at the
> profiles I see that lock contention time was reduced to around 32% of
> the total CPU used (down by about 20%):
>
>   20.79%  [kernel]  [k] __pv_queued_spin_lock_slowpath
>    3.85%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
>    3.68%  [kernel]  [k] _raw_spin_lock
>    3.40%  [kernel]  [k] queued_write_lock_slowpath
>    .....
>
> IOWs, the performance increase comes from the fact that changing the
> inode cache eviction pattern causes slightly less lock contention in
> btrfs inode reclaim.  The problem that needs fixing is the btrfs lock
> contention, not the VFS cache LRU algorithms.
>
> Root cause analysis needs to be done properly before behavioural
> changes like this are proposed, people!
>

We are doing different things.  Yes, when you create everything in the same
subvolume the lock contention is obviously much worse and becomes the
bottleneck, but that's not what I'm doing: I'm creating a subvolume for each
thread to reduce the lock contention on the root nodes.  If you retest that
way you will see the differences I was seeing.  Are you suggesting that I
just made these numbers up?  Instead of assuming I'm an idiot who doesn't
know how to root-cause issues, please ask what is different between your run
and mine.  Thanks,

Josef
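
P.S.  To be concrete about what's different between our runs, my setup is
roughly the following (the device name, paths, and per-worker file count are
illustrative, not my exact script).  fs_mark starts one worker per -d
argument, so giving each worker its own subvolume keeps them from all
hammering the same root node:

	mkfs.btrfs -f /dev/nvme0n1
	mount /dev/nvme0n1 /mnt/scratch
	for i in $(seq 0 15); do
		btrfs subvolume create /mnt/scratch/subvol$i
	done
	# zero-length files, no syncing; 16 workers x 2.5M files = 40M total
	fs_mark -S0 -s 0 -n 2500000 \
		-d /mnt/scratch/subvol0 -d /mnt/scratch/subvol1 \
		[and so on, one -d per subvolume through subvol15]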

Thread overview: 26+ messages
2016-10-25 18:41 [PATCH 0/5][RESEND] Support for metadata specific accounting Josef Bacik
2016-10-25 18:41 ` [PATCH 1/5] remove mapping from balance_dirty_pages*() Josef Bacik
2016-10-25 18:47   ` Tejun Heo
2016-10-25 18:41 ` [PATCH 2/5] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes Josef Bacik
2016-10-25 19:03   ` Tejun Heo
2016-10-25 19:09     ` Josef Bacik
2016-10-30 15:13   ` Jan Kara
2016-10-25 18:41 ` [PATCH 3/5] writeback: add counters for metadata usage Josef Bacik
2016-10-25 19:50   ` Tejun Heo
2016-10-26 15:20     ` Josef Bacik
2016-10-26 15:49       ` Tejun Heo
2016-10-30 15:36   ` Jan Kara
2016-10-25 18:41 ` [PATCH 4/5] writeback: introduce super_operations->write_metadata Josef Bacik
2016-10-25 20:00   ` Tejun Heo
2016-10-25 18:41 ` [PATCH 5/5] fs: don't set *REFERENCED unless we are on the lru list Josef Bacik
2016-10-25 22:01   ` Dave Chinner
2016-10-25 23:36     ` Dave Chinner
2016-10-26 20:03       ` Josef Bacik [this message]
2016-10-26 22:20         ` Dave Chinner
2016-10-26 15:11     ` Josef Bacik
2016-10-27  0:30       ` Dave Chinner
2016-10-27 13:13         ` Josef Bacik
2016-10-28  3:48           ` Dave Chinner
2016-10-25 22:44   ` Omar Sandoval
2016-10-26  4:17     ` [PATCH 5/5] " Andreas Dilger
2016-10-26  5:24       ` Omar Sandoval
