All of lore.kernel.org
 help / color / mirror / Atom feed
* Please Help: 2.6.37 a_ops->readpages is never called, 2.6.36 was fine
@ 2011-02-02 16:48 Boaz Harrosh
  2011-02-02 18:46 ` Problem found, but why??? Boaz Harrosh
  2011-02-03 11:22 ` [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi Boaz Harrosh
  0 siblings, 2 replies; 5+ messages in thread
From: Boaz Harrosh @ 2011-02-02 16:48 UTC (permalink / raw)
  To: linux-fsdevel, Al Viro, Andrew Morton

I was so busy with doing "write" development that I totally missed it!

Somewhere between 2.6.36 and 2.6.37 my exofs file system stopped to receive
any a_ops->readpages() (with "s") calls it only receives the single page read,
a_ops->readpage(). Needless to say that things get very slow.

In 2.6.36 I can clearly see both members getting called the plural one from
read-ahead code.

I've reverted the few simple patches that went into 2.6.37 for exofs, but
clearly it's not it, because the vector is just not called.

I'm now bisecting it, but in an hope that some kind soul can put me out of
my misery.

So the question is:
 Is there some new nob that needs to be set so read-head starts working again?

Thanks in Advance
Boaz

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem found, but why???
  2011-02-02 16:48 Please Help: 2.6.37 a_ops->readpages is never called, 2.6.36 was fine Boaz Harrosh
@ 2011-02-02 18:46 ` Boaz Harrosh
  2011-02-02 21:51   ` Chris Mason
  2011-02-03 11:22 ` [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi Boaz Harrosh
  1 sibling, 1 reply; 5+ messages in thread
From: Boaz Harrosh @ 2011-02-02 18:46 UTC (permalink / raw)
  To: linux-fsdevel, Al Viro, Andrew Morton, Nick Piggin

115e19c53501edc11f730191f7f047736815ae3d is the first bad commit
commit 115e19c53501edc11f730191f7f047736815ae3d
Author: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>

    exofs: Set i_mapping->backing_dev_info anyway

OK It's stupid ME. But why please someone explain this issue?
For now I'll send a revert, but I would like this explained,
perhaps a more solid fix is needed.

Please look at this patch:

diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 24ab327..0ba9886 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -1030,6 +1030,7 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino)
 		memcpy(oi->i_data, fcb.i_data, sizeof(fcb.i_data));
 	}
 
+	inode->i_mapping->backing_dev_info = sb->s_bdi;
 	if (S_ISREG(inode->i_mode)) {
 		inode->i_op = &exofs_file_inode_operations;
 		inode->i_fop = &exofs_file_operations;
@@ -1129,6 +1130,7 @@ struct inode *exofs_new_inode(struct inode *dir, int mode)
 
 	sbi = sb->s_fs_info;
 
+	inode->i_mapping->backing_dev_info = sb->s_bdi;
 	sb->s_dirt = 1;
 	inode_init_owner(inode, dir, mode);
 	inode->i_ino = sbi->s_nextid++;


All it does is set the inode's bdi to point to the super-block bdi. In stead
of this default "null" bdi from VFS. When we do so the read-ahead stops
coming, completely. Is there something that needs to be implemented farther
for a super-block's bdi to support read-ahead? Please help?

Thanks
Boaz

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Problem found, but why???
  2011-02-02 18:46 ` Problem found, but why??? Boaz Harrosh
@ 2011-02-02 21:51   ` Chris Mason
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Mason @ 2011-02-02 21:51 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: linux-fsdevel, Al Viro, Andrew Morton, Nick Piggin

Excerpts from Boaz Harrosh's message of 2011-02-02 13:46:52 -0500:
> 115e19c53501edc11f730191f7f047736815ae3d is the first bad commit
> commit 115e19c53501edc11f730191f7f047736815ae3d
> Author: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>
> 
>     exofs: Set i_mapping->backing_dev_info anyway
> 
> OK It's stupid ME. But why please someone explain this issue?
> For now I'll send a revert, but I would like this explained,
> perhaps a more solid fix is needed.

The bdi_init code doesn't seem to set the ra_pages to a default number.
You got there via kzalloc, so the ra_pages field in your bdi is probably
zero.

This should stop all readahead.

-chris

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi
  2011-02-02 16:48 Please Help: 2.6.37 a_ops->readpages is never called, 2.6.36 was fine Boaz Harrosh
  2011-02-02 18:46 ` Problem found, but why??? Boaz Harrosh
@ 2011-02-03 11:22 ` Boaz Harrosh
  2011-02-03 12:12   ` Boaz Harrosh
  1 sibling, 1 reply; 5+ messages in thread
From: Boaz Harrosh @ 2011-02-03 11:22 UTC (permalink / raw)
  To: linux-fsdevel, Jens Axboe, open-osd; +Cc: Chris Mason, Marc Dionne, Stable Tree


commit: [115e19c] exofs: Set i_mapping->backing_dev_info anyway

Has caused a regression in read-ahead because bdi_init() does
not set bdi->ra_pages to anything and it was left as zero.
So when moving all inodes to point to the super-block's
bdi, they all had ra_pages disabled.

Should bdi_init() be patched to some sane default value?

I fix that by calculating a read_ahead that is:
- preferable 2 stripes long (I'll add a mount option to next Kernel)
- Minimum 128K aligned up to stripe-size
- Caped to maximum-IO-sizes round down to stripe_size.
  (Max sizes are governed by max bio size that fits a page times
   # of devices)

This is a regression since the 2.6.37 Kernel.
2.6.36 was fine

CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/exofs/exofs.h |    2 ++
 fs/exofs/inode.c |   17 +++++++++++++----
 fs/exofs/super.c |   18 ++++++++++++++++++
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
index 2dc925f..99fcb91 100644
--- a/fs/exofs/exofs.h
+++ b/fs/exofs/exofs.h
@@ -256,6 +256,8 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
 }
 
 /* inode.c               */
+unsigned exofs_max_io_pages(struct exofs_layout *layout,
+			    unsigned expected_pages);
 int exofs_setattr(struct dentry *, struct iattr *);
 int exofs_write_begin(struct file *file, struct address_space *mapping,
 		loff_t pos, unsigned len, unsigned flags,
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 4268542..deede10 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -43,6 +43,17 @@ enum { BIO_MAX_PAGES_KMALLOC =
 		PAGE_SIZE / sizeof(struct page *),
 };
 
+unsigned exofs_max_io_pages(struct exofs_layout *layout,
+			    unsigned expected_pages)
+{
+	unsigned pages = min_t(unsigned, expected_pages, MAX_PAGES_KMALLOC);
+
+	/* TODO: easily support bio chaining */
+	pages =  min_t(unsigned, pages,
+		       layout->group_width * BIO_MAX_PAGES_KMALLOC);
+	return pages;
+}
+
 struct page_collect {
 	struct exofs_sb_info *sbi;
 	struct inode *inode;
@@ -97,8 +108,7 @@ static void _pcol_reset(struct page_collect *pcol)
 
 static int pcol_try_alloc(struct page_collect *pcol)
 {
-	unsigned pages = min_t(unsigned, pcol->expected_pages,
-			  MAX_PAGES_KMALLOC);
+	unsigned pages;
 
 	if (!pcol->ios) { /* First time allocate io_state */
 		int ret = exofs_get_io_state(&pcol->sbi->layout, &pcol->ios);
@@ -108,8 +118,7 @@ static int pcol_try_alloc(struct page_collect *pcol)
 	}
 
 	/* TODO: easily support bio chaining */
-	pages =  min_t(unsigned, pages,
-		       pcol->sbi->layout.group_width * BIO_MAX_PAGES_KMALLOC);
+	pages =  exofs_max_io_pages(&pcol->sbi->layout, pcol->expected_pages);
 
 	for (; pages; pages >>= 1) {
 		pcol->pages = kmalloc(pages * sizeof(struct page *),
diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 79c3ae6..75d42ac 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -383,6 +383,23 @@ static int _read_and_match_data_map(struct exofs_sb_info *sbi, unsigned numdevs,
 	return 0;
 }
 
+static unsigned __ra_pages(struct exofs_layout *layout)
+{
+	const unsigned _MIN_RA = 32; /* min 128K read-ahead */
+	unsigned ra_pages = layout->group_width * layout->stripe_unit /
+				PAGE_SIZE;
+	unsigned max_io_pages = exofs_max_io_pages(layout, ~0);
+
+	ra_pages *= 2; /* two stripes */
+	if (ra_pages < _MIN_RA)
+		ra_pages = roundup(_MIN_RA, ra_pages / 2);
+
+	if (ra_pages > max_io_pages)
+		ra_pages = max_io_pages;
+
+	return ra_pages;
+}
+
 /* @odi is valid only as long as @fscb_dev is valid */
 static int exofs_devs_2_odi(struct exofs_dt_device_info *dt_dev,
 			     struct osd_dev_info *odi)
@@ -616,6 +633,7 @@ static int exofs_fill_super(struct super_block *sb, void *data, int silent)
 	}
 
 	/* set up operation vectors */
+	sbi->bdi.ra_pages = __ra_pages(&sbi->layout);
 	sb->s_bdi = &sbi->bdi;
 	sb->s_fs_info = sbi;
 	sb->s_op = &exofs_sops;
-- 
1.7.2.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi
  2011-02-03 11:22 ` [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi Boaz Harrosh
@ 2011-02-03 12:12   ` Boaz Harrosh
  0 siblings, 0 replies; 5+ messages in thread
From: Boaz Harrosh @ 2011-02-03 12:12 UTC (permalink / raw)
  To: linux-fsdevel, Stable Tree, Greg KH
  Cc: Jens Axboe, open-osd, Chris Mason, Marc Dionne

On 02/03/2011 01:22 PM, Boaz Harrosh wrote:
> 
> commit: [115e19c] exofs: Set i_mapping->backing_dev_info anyway
> 
> Has caused a regression in read-ahead because bdi_init() does
> not set bdi->ra_pages to anything and it was left as zero.
> So when moving all inodes to point to the super-block's
> bdi, they all had ra_pages disabled.
> 
> Should bdi_init() be patched to some sane default value?
> 
> I fix that by calculating a read_ahead that is:
> - preferable 2 stripes long (I'll add a mount option to next Kernel)
> - Minimum 128K aligned up to stripe-size
> - Caped to maximum-IO-sizes round down to stripe_size.
>   (Max sizes are governed by max bio size that fits a page times
>    # of devices)
> 
> This is a regression since the 2.6.37 Kernel.
> 2.6.36 was fine
> 

Dear Greg, Stable Tree.

I see that Linus has already merged the revert of the offending
patch, as a BUG fix for above. That's fine. PLEASE forget about
this patch. (Sorry for the noise)

I will submit this patch as a regular enhancement for the
next merge window.

Thanks
Boaz

> CC: Stable Tree <stable@kernel.org>
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  fs/exofs/exofs.h |    2 ++
>  fs/exofs/inode.c |   17 +++++++++++++----
>  fs/exofs/super.c |   18 ++++++++++++++++++
>  3 files changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
> index 2dc925f..99fcb91 100644
> --- a/fs/exofs/exofs.h
> +++ b/fs/exofs/exofs.h
> @@ -256,6 +256,8 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
>  }
>  
>  /* inode.c               */
> +unsigned exofs_max_io_pages(struct exofs_layout *layout,
> +			    unsigned expected_pages);
>  int exofs_setattr(struct dentry *, struct iattr *);
>  int exofs_write_begin(struct file *file, struct address_space *mapping,
>  		loff_t pos, unsigned len, unsigned flags,
> diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
> index 4268542..deede10 100644
> --- a/fs/exofs/inode.c
> +++ b/fs/exofs/inode.c
> @@ -43,6 +43,17 @@ enum { BIO_MAX_PAGES_KMALLOC =
>  		PAGE_SIZE / sizeof(struct page *),
>  };
>  
> +unsigned exofs_max_io_pages(struct exofs_layout *layout,
> +			    unsigned expected_pages)
> +{
> +	unsigned pages = min_t(unsigned, expected_pages, MAX_PAGES_KMALLOC);
> +
> +	/* TODO: easily support bio chaining */
> +	pages =  min_t(unsigned, pages,
> +		       layout->group_width * BIO_MAX_PAGES_KMALLOC);
> +	return pages;
> +}
> +
>  struct page_collect {
>  	struct exofs_sb_info *sbi;
>  	struct inode *inode;
> @@ -97,8 +108,7 @@ static void _pcol_reset(struct page_collect *pcol)
>  
>  static int pcol_try_alloc(struct page_collect *pcol)
>  {
> -	unsigned pages = min_t(unsigned, pcol->expected_pages,
> -			  MAX_PAGES_KMALLOC);
> +	unsigned pages;
>  
>  	if (!pcol->ios) { /* First time allocate io_state */
>  		int ret = exofs_get_io_state(&pcol->sbi->layout, &pcol->ios);
> @@ -108,8 +118,7 @@ static int pcol_try_alloc(struct page_collect *pcol)
>  	}
>  
>  	/* TODO: easily support bio chaining */
> -	pages =  min_t(unsigned, pages,
> -		       pcol->sbi->layout.group_width * BIO_MAX_PAGES_KMALLOC);
> +	pages =  exofs_max_io_pages(&pcol->sbi->layout, pcol->expected_pages);
>  
>  	for (; pages; pages >>= 1) {
>  		pcol->pages = kmalloc(pages * sizeof(struct page *),
> diff --git a/fs/exofs/super.c b/fs/exofs/super.c
> index 79c3ae6..75d42ac 100644
> --- a/fs/exofs/super.c
> +++ b/fs/exofs/super.c
> @@ -383,6 +383,23 @@ static int _read_and_match_data_map(struct exofs_sb_info *sbi, unsigned numdevs,
>  	return 0;
>  }
>  
> +static unsigned __ra_pages(struct exofs_layout *layout)
> +{
> +	const unsigned _MIN_RA = 32; /* min 128K read-ahead */
> +	unsigned ra_pages = layout->group_width * layout->stripe_unit /
> +				PAGE_SIZE;
> +	unsigned max_io_pages = exofs_max_io_pages(layout, ~0);
> +
> +	ra_pages *= 2; /* two stripes */
> +	if (ra_pages < _MIN_RA)
> +		ra_pages = roundup(_MIN_RA, ra_pages / 2);
> +
> +	if (ra_pages > max_io_pages)
> +		ra_pages = max_io_pages;
> +
> +	return ra_pages;
> +}
> +
>  /* @odi is valid only as long as @fscb_dev is valid */
>  static int exofs_devs_2_odi(struct exofs_dt_device_info *dt_dev,
>  			     struct osd_dev_info *odi)
> @@ -616,6 +633,7 @@ static int exofs_fill_super(struct super_block *sb, void *data, int silent)
>  	}
>  
>  	/* set up operation vectors */
> +	sbi->bdi.ra_pages = __ra_pages(&sbi->layout);
>  	sb->s_bdi = &sbi->bdi;
>  	sb->s_fs_info = sbi;
>  	sb->s_op = &exofs_sops;


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-02-03 12:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-02 16:48 Please Help: 2.6.37 a_ops->readpages is never called, 2.6.36 was fine Boaz Harrosh
2011-02-02 18:46 ` Problem found, but why??? Boaz Harrosh
2011-02-02 21:51   ` Chris Mason
2011-02-03 11:22 ` [PATCH] exofs: Fix read ahead BUG, caused by moving inodes to sb->s_bdi Boaz Harrosh
2011-02-03 12:12   ` Boaz Harrosh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.