From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757101Ab2JWNVd (ORCPT ); Tue, 23 Oct 2012 09:21:33 -0400 Received: from mail-oa0-f46.google.com ([209.85.219.46]:45174 "EHLO mail-oa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755147Ab2JWNVb (ORCPT ); Tue, 23 Oct 2012 09:21:31 -0400 Message-ID: <508699D3.9040509@gmail.com> Date: Tue, 23 Oct 2012 21:21:23 +0800 From: Ni zhan Chen User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: Ying Zhu CC: akpm@linux-foundation.org, fengguang.wu@intel.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state References: <1350996411-5425-1-git-send-email-casualfisher@gmail.com> In-Reply-To: <1350996411-5425-1-git-send-email-casualfisher@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/23/2012 08:46 PM, Ying Zhu wrote: > Hi, > Recently we ran into the bug that an opened file's ra_pages does not > synchronize with it's backing device's when the latter is changed > with blockdev --setra, the application needs to reopen the file > to know the change, which is inappropriate under our circumstances. Could you tell me in which function do this synchronize stuff? > This bug is also mentioned in scst (generic SCSI target subsystem for Linux)'s > README file. > This patch tries to unify the ra_pages in struct file_ra_state > and struct backing_dev_info. Basically current readahead algorithm > will ramp file_ra_state.ra_pages up to bdi.ra_pages once it detects the You mean ondemand readahead algorithm will do this? I don't think so. file_ra_state_init only called in btrfs path, correct? > read mode is sequential. Then all files sharing the same backing device > have the same max value bdi.ra_pages set in file_ra_state. why remove file_ra_state? If one file is read sequential and another file is read ramdom, how can use the global bdi.ra_pages to indicate the max readahead window of each file? > Applying this means the flags POSIX_FADV_NORMAL and POSIX_FADV_SEQUENTIAL > in fadivse will only set file reading mode without signifying the > max readahead size of the file. The current apporach adds no additional > overhead in read IO path, IMHO is the simplest solution. > Any comments are welcome, thanks in advance. Could you show me how you test this patch? > > Thanks, > Ying Zhu > > Signed-off-by: Ying Zhu > --- > include/linux/fs.h | 1 - > mm/fadvise.c | 2 -- > mm/filemap.c | 17 +++++++++++------ > mm/readahead.c | 8 ++++---- > 4 files changed, 15 insertions(+), 13 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 17fd887..36303a5 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -991,7 +991,6 @@ struct file_ra_state { > unsigned int async_size; /* do asynchronous readahead when > there are only # of pages ahead */ > > - unsigned int ra_pages; /* Maximum readahead window */ > unsigned int mmap_miss; /* Cache miss stat for mmap accesses */ > loff_t prev_pos; /* Cache last read() position */ > }; > diff --git a/mm/fadvise.c b/mm/fadvise.c > index 469491e..75e2378 100644 > --- a/mm/fadvise.c > +++ b/mm/fadvise.c > @@ -76,7 +76,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) > > switch (advice) { > case POSIX_FADV_NORMAL: > - file->f_ra.ra_pages = bdi->ra_pages; > spin_lock(&file->f_lock); > file->f_mode &= ~FMODE_RANDOM; > spin_unlock(&file->f_lock); > @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) > spin_unlock(&file->f_lock); > break; > case POSIX_FADV_SEQUENTIAL: > - file->f_ra.ra_pages = bdi->ra_pages * 2; > spin_lock(&file->f_lock); > file->f_mode &= ~FMODE_RANDOM; > spin_unlock(&file->f_lock); > diff --git a/mm/filemap.c b/mm/filemap.c > index a4a5260..e7e4409 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1058,11 +1058,15 @@ EXPORT_SYMBOL(grab_cache_page_nowait); > * readahead(R+4...B+3) => bang => read(R+4) => read(R+5) => ...... > * > * It is going insane. Fix it by quickly scaling down the readahead size. > + * It's hard to estimate how the bad sectors lay out, so to be conservative, > + * set the read mode in random. > */ > static void shrink_readahead_size_eio(struct file *filp, > struct file_ra_state *ra) > { > - ra->ra_pages /= 4; > + spin_lock(&filp->f_lock); > + filp->f_mode |= FMODE_RANDOM; > + spin_unlock(&filp->f_lock); > } > > /** > @@ -1527,12 +1531,12 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma, > /* If we don't want any read-ahead, don't bother */ > if (VM_RandomReadHint(vma)) > return; > - if (!ra->ra_pages) > + if (!mapping->backing_dev_info->ra_pages) > return; > > if (VM_SequentialReadHint(vma)) { > - page_cache_sync_readahead(mapping, ra, file, offset, > - ra->ra_pages); > + page_cache_sync_readahead(mapping, ra, file, offset, > + mapping->backing_dev_info->ra_pages); > return; > } > > @@ -1550,7 +1554,7 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma, > /* > * mmap read-around > */ > - ra_pages = max_sane_readahead(ra->ra_pages); > + ra_pages = max_sane_readahead(mapping->backing_dev_info->ra_pages); > ra->start = max_t(long, 0, offset - ra_pages / 2); > ra->size = ra_pages; > ra->async_size = ra_pages / 4; > @@ -1576,7 +1580,8 @@ static void do_async_mmap_readahead(struct vm_area_struct *vma, > ra->mmap_miss--; > if (PageReadahead(page)) > page_cache_async_readahead(mapping, ra, file, > - page, offset, ra->ra_pages); > + page, offset, > + mapping->backing_dev_info->ra_pages); > } > > /** > diff --git a/mm/readahead.c b/mm/readahead.c > index ea8f8fa..6ea5999 100644 > --- a/mm/readahead.c > +++ b/mm/readahead.c > @@ -27,7 +27,6 @@ > void > file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping) > { > - ra->ra_pages = mapping->backing_dev_info->ra_pages; > ra->prev_pos = -1; > } > EXPORT_SYMBOL_GPL(file_ra_state_init); > @@ -400,7 +399,8 @@ ondemand_readahead(struct address_space *mapping, > bool hit_readahead_marker, pgoff_t offset, > unsigned long req_size) > { > - unsigned long max = max_sane_readahead(ra->ra_pages); > + struct backing_dev_info *bdi = mapping->backing_dev_info; > + unsigned long max = max_sane_readahead(bdi->ra_pages); > > /* > * start of file > @@ -507,7 +507,7 @@ void page_cache_sync_readahead(struct address_space *mapping, > pgoff_t offset, unsigned long req_size) > { > /* no read-ahead */ > - if (!ra->ra_pages) > + if (!mapping->backing_dev_info->ra_pages) > return; > > /* be dumb */ > @@ -543,7 +543,7 @@ page_cache_async_readahead(struct address_space *mapping, > unsigned long req_size) > { > /* no read-ahead */ > - if (!ra->ra_pages) > + if (!mapping->backing_dev_info->ra_pages) > return; > > /*