From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756582Ab3JNN4u (ORCPT ); Mon, 14 Oct 2013 09:56:50 -0400 Received: from mga14.intel.com ([143.182.124.37]:65230 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755633Ab3JNN4s (ORCPT ); Mon, 14 Oct 2013 09:56:48 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.93,492,1378882800"; d="scan'208";a="410634012" From: "Kirill A. Shutemov" To: Dave Chinner Cc: "Kirill A. Shutemov" , Andrew Morton , Andrea Arcangeli , Al Viro , Hugh Dickins , Wu Fengguang , Jan Kara , Mel Gorman , linux-mm@kvack.org, Andi Kleen , Matthew Wilcox , "Kirill A. Shutemov" , Hillf Danton , Dave Hansen , Ning Qu , Alexander Shishkin , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org In-Reply-To: <20130925232915.GK26872@dastard> References: <1379937950-8411-1-git-send-email-kirill.shutemov@linux.intel.com> <20130924163740.4bc7db61e3e520798220dc4c@linux-foundation.org> <20130925095105.06464E0090@blue.fi.intel.com> <20130925232915.GK26872@dastard> Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap() Content-Transfer-Encoding: 7bit Message-Id: <20131014135642.05DFEE0090@blue.fi.intel.com> Date: Mon, 14 Oct 2013 16:56:41 +0300 (EEST) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Dave Chinner wrote: > On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote: > > Andrew Morton wrote: > > > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" wrote: > > > > > > > It brings thp support for ramfs, but without mmap() -- it will be posted > > > > separately. > > > > > > We were never going to do this :( > > > > > > Has anyone reviewed these patches much yet? > > > > Dave did very good review. Few other people looked to separate patches. > > See Reviewed-by/Acked-by tags in patches. > > > > It looks like most mm experts are busy with numa balancing nowadays, so > > it's hard to get more review. > > Nobody has reviewed it from the filesystem side, though. > > The changes that require special code paths for huge pages in the > write_begin/write_end paths are nasty. You're adding conditional > code that depends on the page size and then having to add checks to > ensure that large page operations don't step over small page > boundaries and other such corner cases. It's an extremely fragile > design, IMO. > > In general, I don't like all the if (thp) {} else {}; code that this > series introduces - they are code paths that simply won't get tested > with any sort of regularity and make the code more complex for those > that aren't using THP to understand and debug... Okay, I'll try to get rid of special cases where it's possible. > Then there is a new per-inode lock that is used in > generic_perform_write() which is held across page faults and calls > to filesystem block mapping callbacks. This inserts into the middle > of an existing locking chain that needs to be strictly ordered, and > as such will lead to the same type of lock inversion problems that > the mmap_sem had. We do not want to introduce a new lock that has > this same problem just as we are getting rid of that long standing > nastiness from the page fault path... I don't see how we can protect against splitting with existing locks, but I'll try find a way. > I also note that you didn't convert invalidate_inode_pages2_range() > to support huge pages which is needed by real filesystems that > support direct IO. There are other truncate/invalidate interfaces > that you didn't convert, either, and some of them will present you > with interesting locking challenges as a result of adding that new > lock... Thanks. I'll take a look on these code paths. > > The patchset was mostly ignored for few rounds and Dave suggested to split > > to have less scary patch number. > > It's still being ignored by filesystem people because you haven't > actually tried to implement support into a real filesystem..... If it will support a real filesystem, wouldn't it be ignored due patch count? ;) > > > > Please review and consider applying. > > > > > > It appears rather too immature at this stage. > > > > More review is always welcome and I'm committed to address issues. > > IMO, supporting a real block based filesystem like ext4 or XFS and > demonstrating that everything works is necessary before we go any > further... Will see what numbers I can bring in next iterations. Thanks for your feedback. And sorry for late answer. -- Kirill A. Shutemov