From: Jan Kara
Subject: Re: [PATCH 6/8] bdi: add a new writeback list for sync
Date: Mon, 15 Jun 2015 16:12:21 +0200
Message-ID: <20150615141221.GH4368@quack.suse.cz>
References: <1434051673-13838-1-git-send-email-jbacik@fb.com>
 <1434051673-13838-7-git-send-email-jbacik@fb.com>
In-Reply-To: <1434051673-13838-7-git-send-email-jbacik@fb.com>
To: Josef Bacik
Cc: linux-fsdevel@vger.kernel.org, kernel-team@fb.com, viro@ZenIV.linux.org.uk,
 hch@infradead.org, jack@suse.cz, david@fromorbit.com, Dave Chinner

On Thu 11-06-15 15:41:11, Josef Bacik wrote:
> From: Dave Chinner
> 
> wait_sb_inodes() currently does a walk of all inodes in the filesystem
> to find dirty ones to wait on during sync. This is highly
> inefficient and wastes a lot of CPU when there are lots of clean
> cached inodes that we don't need to wait on.
> 
> To avoid this "all inode" walk, we need to track inodes that are
> currently under writeback that we need to wait for. We do this by
> adding inodes to a writeback list on the bdi when the mapping is
> first tagged as having pages under writeback. wait_sb_inodes() can
> then walk this list of "inodes under IO" and wait specifically just
> for the inodes that the current sync(2) needs to wait for.
> 
> To avoid needing all the related locking to be safe against
> interrupts, Jan Kara suggested that we be lazy about removal from
> the writeback list. That is, we don't remove inodes from the
> writeback list on IO completion, but do it directly during a
> wait_sb_inodes() walk.
> 
> This means that a rare sync(2) call will have some work to do
> skipping clean inodes. However, in the current problem case of
> concurrent sync workloads, concurrent wait_sb_inodes() calls only
> walk the very recently dispatched inodes and hence should have very
> little work to do.
> 
> This also means that we have to remove the inodes from the writeback
> list during eviction. Do this in inode_wait_for_writeback() once
> all writeback on the inode is complete.
> 
> Signed-off-by: Dave Chinner
> Signed-off-by: Josef Bacik

I don't think you noticed the comments I made when you last posted this
series:

>  /*
> - * Wait for writeback on an inode to complete. Caller must have inode pinned.
> + * Wait for writeback on an inode to complete during eviction. Caller must have
> + * inode pinned.
>   */
>  void inode_wait_for_writeback(struct inode *inode)
>  {
> +	BUG_ON(!(inode->i_state & I_FREEING));
> +
>  	spin_lock(&inode->i_lock);
>  	__inode_wait_for_writeback(inode);
>  	spin_unlock(&inode->i_lock);
> +
> +	/*
> +	 * bd_inode's will have already had the bdev free'd so don't bother
> +	 * doing the bdi_clear_inode_writeback.
> +	 */
> +	if (!sb_is_blkdev_sb(inode->i_sb))
> +		bdi_clear_inode_writeback(inode_to_bdi(inode), inode);
>  }

Why do we bother with not putting the bdev inode back?

...

> 		spin_lock(&inode->i_lock);
> -		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
> -		    (mapping->nrpages == 0)) {
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
> +			list_move(&inode->i_wb_list, &bdi->wb.b_writeback);

Indenting is broken here...

> 			spin_unlock(&inode->i_lock);
> +			cond_resched_lock(&bdi->wb.list_lock);
>  			continue;
>  		}
>  		__iget(inode);
>  		spin_unlock(&inode->i_lock);
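As an aside for anyone following the thread without the patch in front of
them, the lazy scheme described in the changelog can be modelled in a few
lines of stand-alone C. This is only an illustrative sketch, not the patch:
there is no locking, and the toy_* structures and helpers below are made-up
stand-ins for the real bdi/inode code (the kernel names they loosely mimic
are bdi->wb.b_writeback, inode->i_wb_list, bdi_clear_inode_writeback() and
wait_sb_inodes()).

/*
 * Toy user-space model of the lazy removal described in the changelog:
 * inodes are put on a per-bdi "under writeback" list when writeback starts,
 * and are only taken off by the sync-side walk (or at eviction), never from
 * the IO completion path.  No locking, so this is not kernel code.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
	int ino;
	bool under_writeback;			/* models the WRITEBACK tag */
	struct toy_inode *wb_prev, *wb_next;	/* models inode->i_wb_list */
};

struct toy_bdi {
	struct toy_inode wb_list;		/* models bdi->wb.b_writeback */
};

static void toy_bdi_init(struct toy_bdi *bdi)
{
	bdi->wb_list.wb_prev = bdi->wb_list.wb_next = &bdi->wb_list;
}

/* Called when the first page of an inode goes under writeback. */
static void toy_mark_inode_writeback(struct toy_bdi *bdi, struct toy_inode *inode)
{
	if (inode->wb_next)			/* already queued */
		return;
	inode->wb_next = bdi->wb_list.wb_next;
	inode->wb_prev = &bdi->wb_list;
	bdi->wb_list.wb_next->wb_prev = inode;
	bdi->wb_list.wb_next = inode;
}

/* Lazy removal: only the sync walk (or eviction) unlinks an inode. */
static void toy_clear_inode_writeback(struct toy_inode *inode)
{
	if (!inode->wb_next)
		return;
	inode->wb_prev->wb_next = inode->wb_next;
	inode->wb_next->wb_prev = inode->wb_prev;
	inode->wb_prev = inode->wb_next = NULL;
}

/* Models wait_sb_inodes(): visit only inodes that saw recent writeback. */
static void toy_wait_sb_inodes(struct toy_bdi *bdi)
{
	struct toy_inode *inode = bdi->wb_list.wb_next;

	while (inode != &bdi->wb_list) {
		struct toy_inode *next = inode->wb_next;

		if (!inode->under_writeback) {
			/* IO finished long ago; drop it now, lazily. */
			toy_clear_inode_writeback(inode);
		} else {
			printf("waiting for writeback on inode %d\n", inode->ino);
			inode->under_writeback = false;	/* pretend the wait ended */
		}
		inode = next;
	}
}

int main(void)
{
	struct toy_bdi bdi;
	struct toy_inode a = { .ino = 1 }, b = { .ino = 2 };

	toy_bdi_init(&bdi);

	toy_mark_inode_writeback(&bdi, &a);
	a.under_writeback = true;		/* a still has pages under IO */
	toy_mark_inode_writeback(&bdi, &b);	/* b's IO completed before sync */

	toy_wait_sb_inodes(&bdi);		/* waits on a, lazily drops b */
	return 0;
}

Running it waits only on inode 1; inode 2 stays on the list until this walk
drops it, which is the "rare sync(2) call will have some work to do skipping
clean inodes" case from the changelog.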
								Honza
-- 
Jan Kara
SUSE Labs, CR