All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 12:49 ` David Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 12:49 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, linux-mm


On Christoph's suggestion, take the guts of the proposed
xfs_vm_page_mkwrite function and implement it as a generic
core function as it used no specific XFS code at all.

This allows any filesystem to easily hook the ->page_mkwrite()
VM callout to allow them to set up pages dirtied by mmap
writes correctly for later writeout.

Signed-Off-By: Dave Chinner <dgc@sgi.com>

---
 fs/buffer.c                 |   30 ++++++++++++++++++++++++++++++
 include/linux/buffer_head.h |    2 ++
 2 files changed, 32 insertions(+)

Index: 2.6.x-xfs-new/fs/buffer.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/buffer.c	2007-02-07 23:00:05.000000000 +1100
+++ 2.6.x-xfs-new/fs/buffer.c	2007-02-07 23:09:47.642356116 +1100
@@ -2194,6 +2194,36 @@ int generic_commit_write(struct file *fi
 	return 0;
 }
 
+/*
+ * block_page_mkwrite() is not allowed to change the file size as
+ * it gets called from a page fault handler when a page is first
+ * dirtied. Hence we must be careful to check for EOF conditions
+ * here. We set the page up correctly for a written page which means
+ * we get ENOSPC checking when writing into holes and correct
+ * delalloc and unwritten extent mapping on filesystems that support
+ * these features.
+ */
+int
+block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+		   get_block_t get_block)
+{
+	struct inode	*inode = vma->vm_file->f_path.dentry->d_inode;
+	unsigned long	end;
+	int		ret = 0;
+
+	if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
+		end = i_size_read(inode) & ~PAGE_CACHE_MASK;
+	else
+		end = PAGE_CACHE_SIZE;
+
+	lock_page(page);
+	ret = block_prepare_write(page, 0, end, get_block);
+	if (!ret)
+		ret = block_commit_write(page, 0, end);
+	unlock_page(page);
+
+	return ret;
+}
 
 /*
  * nobh_prepare_write()'s prereads are special: the buffer_heads are freed
Index: 2.6.x-xfs-new/include/linux/buffer_head.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/buffer_head.h	2007-02-07 23:00:02.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/buffer_head.h	2007-02-07 23:12:33.156749344 +1100
@@ -206,6 +206,8 @@ int cont_prepare_write(struct page*, uns
 int generic_cont_expand(struct inode *inode, loff_t size);
 int generic_cont_expand_simple(struct inode *inode, loff_t size);
 int block_commit_write(struct page *page, unsigned from, unsigned to);
+int block_page_mkwrite(struct vma_area_struct *vma, struct page *page,
+				get_block_t get_block);
 void block_sync_page(struct page *);
 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
 int generic_commit_write(struct file *, struct page *, unsigned, unsigned);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 12:49 ` David Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 12:49 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, linux-mm

On Christoph's suggestion, take the guts of the proposed
xfs_vm_page_mkwrite function and implement it as a generic
core function as it used no specific XFS code at all.

This allows any filesystem to easily hook the ->page_mkwrite()
VM callout to allow them to set up pages dirtied by mmap
writes correctly for later writeout.

Signed-Off-By: Dave Chinner <dgc@sgi.com>

---
 fs/buffer.c                 |   30 ++++++++++++++++++++++++++++++
 include/linux/buffer_head.h |    2 ++
 2 files changed, 32 insertions(+)

Index: 2.6.x-xfs-new/fs/buffer.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/buffer.c	2007-02-07 23:00:05.000000000 +1100
+++ 2.6.x-xfs-new/fs/buffer.c	2007-02-07 23:09:47.642356116 +1100
@@ -2194,6 +2194,36 @@ int generic_commit_write(struct file *fi
 	return 0;
 }
 
+/*
+ * block_page_mkwrite() is not allowed to change the file size as
+ * it gets called from a page fault handler when a page is first
+ * dirtied. Hence we must be careful to check for EOF conditions
+ * here. We set the page up correctly for a written page which means
+ * we get ENOSPC checking when writing into holes and correct
+ * delalloc and unwritten extent mapping on filesystems that support
+ * these features.
+ */
+int
+block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+		   get_block_t get_block)
+{
+	struct inode	*inode = vma->vm_file->f_path.dentry->d_inode;
+	unsigned long	end;
+	int		ret = 0;
+
+	if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
+		end = i_size_read(inode) & ~PAGE_CACHE_MASK;
+	else
+		end = PAGE_CACHE_SIZE;
+
+	lock_page(page);
+	ret = block_prepare_write(page, 0, end, get_block);
+	if (!ret)
+		ret = block_commit_write(page, 0, end);
+	unlock_page(page);
+
+	return ret;
+}
 
 /*
  * nobh_prepare_write()'s prereads are special: the buffer_heads are freed
Index: 2.6.x-xfs-new/include/linux/buffer_head.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/buffer_head.h	2007-02-07 23:00:02.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/buffer_head.h	2007-02-07 23:12:33.156749344 +1100
@@ -206,6 +206,8 @@ int cont_prepare_write(struct page*, uns
 int generic_cont_expand(struct inode *inode, loff_t size);
 int generic_cont_expand_simple(struct inode *inode, loff_t size);
 int block_commit_write(struct page *page, unsigned from, unsigned to);
+int block_page_mkwrite(struct vma_area_struct *vma, struct page *page,
+				get_block_t get_block);
 void block_sync_page(struct page *);
 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
 int generic_commit_write(struct file *, struct page *, unsigned, unsigned);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 12:49 ` David Chinner
@ 2007-02-07 12:55   ` David Chinner
  -1 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 12:55 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, linux-mm


Mental Note: must remember to refresh patch after fixing compile
errors. New patch attached.

-

On Christoph's suggestion, take the guts of the proposed
xfs_vm_page_mkwrite function and implement it as a generic
core function as it used no specific XFS code at all.

This allows any filesystem to easily hook the ->page_mkwrite()
VM callout to allow them to set up pages dirtied by mmap
writes correctly for later writeout.

Signed-Off-By: Dave Chinner <dgc@sgi.com>


---
 fs/buffer.c                 |   30 ++++++++++++++++++++++++++++++
 include/linux/buffer_head.h |    2 ++
 2 files changed, 32 insertions(+)

Index: 2.6.x-xfs-new/fs/buffer.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/buffer.c	2007-02-07 23:00:05.000000000 +1100
+++ 2.6.x-xfs-new/fs/buffer.c	2007-02-07 23:09:47.642356116 +1100
@@ -2194,6 +2194,36 @@ int generic_commit_write(struct file *fi
 	return 0;
 }
 
+/*
+ * block_page_mkwrite() is not allowed to change the file size as
+ * it gets called from a page fault handler when a page is first
+ * dirtied. Hence we must be careful to check for EOF conditions
+ * here. We set the page up correctly for a written page which means
+ * we get ENOSPC checking when writing into holes and correct
+ * delalloc and unwritten extent mapping on filesystems that support
+ * these features.
+ */
+int
+block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+		   get_block_t get_block)
+{
+	struct inode	*inode = vma->vm_file->f_path.dentry->d_inode;
+	unsigned long	end;
+	int		ret = 0;
+
+	if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
+		end = i_size_read(inode) & ~PAGE_CACHE_MASK;
+	else
+		end = PAGE_CACHE_SIZE;
+
+	lock_page(page);
+	ret = block_prepare_write(page, 0, end, get_block);
+	if (!ret)
+		ret = block_commit_write(page, 0, end);
+	unlock_page(page);
+
+	return ret;
+}
 
 /*
  * nobh_prepare_write()'s prereads are special: the buffer_heads are freed
Index: 2.6.x-xfs-new/include/linux/buffer_head.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/buffer_head.h	2007-02-07 23:00:02.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/buffer_head.h	2007-02-07 23:19:25.110816993 +1100
@@ -206,6 +206,8 @@ int cont_prepare_write(struct page*, uns
 int generic_cont_expand(struct inode *inode, loff_t size);
 int generic_cont_expand_simple(struct inode *inode, loff_t size);
 int block_commit_write(struct page *page, unsigned from, unsigned to);
+int block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+				get_block_t get_block);
 void block_sync_page(struct page *);
 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
 int generic_commit_write(struct file *, struct page *, unsigned, unsigned);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 12:55   ` David Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 12:55 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, linux-mm

Mental Note: must remember to refresh patch after fixing compile
errors. New patch attached.

-

On Christoph's suggestion, take the guts of the proposed
xfs_vm_page_mkwrite function and implement it as a generic
core function as it used no specific XFS code at all.

This allows any filesystem to easily hook the ->page_mkwrite()
VM callout to allow them to set up pages dirtied by mmap
writes correctly for later writeout.

Signed-Off-By: Dave Chinner <dgc@sgi.com>


---
 fs/buffer.c                 |   30 ++++++++++++++++++++++++++++++
 include/linux/buffer_head.h |    2 ++
 2 files changed, 32 insertions(+)

Index: 2.6.x-xfs-new/fs/buffer.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/buffer.c	2007-02-07 23:00:05.000000000 +1100
+++ 2.6.x-xfs-new/fs/buffer.c	2007-02-07 23:09:47.642356116 +1100
@@ -2194,6 +2194,36 @@ int generic_commit_write(struct file *fi
 	return 0;
 }
 
+/*
+ * block_page_mkwrite() is not allowed to change the file size as
+ * it gets called from a page fault handler when a page is first
+ * dirtied. Hence we must be careful to check for EOF conditions
+ * here. We set the page up correctly for a written page which means
+ * we get ENOSPC checking when writing into holes and correct
+ * delalloc and unwritten extent mapping on filesystems that support
+ * these features.
+ */
+int
+block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+		   get_block_t get_block)
+{
+	struct inode	*inode = vma->vm_file->f_path.dentry->d_inode;
+	unsigned long	end;
+	int		ret = 0;
+
+	if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
+		end = i_size_read(inode) & ~PAGE_CACHE_MASK;
+	else
+		end = PAGE_CACHE_SIZE;
+
+	lock_page(page);
+	ret = block_prepare_write(page, 0, end, get_block);
+	if (!ret)
+		ret = block_commit_write(page, 0, end);
+	unlock_page(page);
+
+	return ret;
+}
 
 /*
  * nobh_prepare_write()'s prereads are special: the buffer_heads are freed
Index: 2.6.x-xfs-new/include/linux/buffer_head.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/buffer_head.h	2007-02-07 23:00:02.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/buffer_head.h	2007-02-07 23:19:25.110816993 +1100
@@ -206,6 +206,8 @@ int cont_prepare_write(struct page*, uns
 int generic_cont_expand(struct inode *inode, loff_t size);
 int generic_cont_expand_simple(struct inode *inode, loff_t size);
 int block_commit_write(struct page *page, unsigned from, unsigned to);
+int block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+				get_block_t get_block);
 void block_sync_page(struct page *);
 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
 int generic_commit_write(struct file *, struct page *, unsigned, unsigned);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 12:49 ` David Chinner
@ 2007-02-07 13:00   ` Hugh Dickins
  -1 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2007-02-07 13:00 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs, linux-fsdevel, linux-mm

On Wed, 7 Feb 2007, David Chinner wrote:

> On Christoph's suggestion, take the guts of the proposed
> xfs_vm_page_mkwrite function and implement it as a generic
> core function as it used no specific XFS code at all.
> 
> This allows any filesystem to easily hook the ->page_mkwrite()
> VM callout to allow them to set up pages dirtied by mmap
> writes correctly for later writeout.
> 
> Signed-Off-By: Dave Chinner <dgc@sgi.com>

I'm worried about concurrent truncation.  Isn't it the case that
i_mutex is held when prepare_write and commit_write are normally
called?  But not here when page_mkwrite is called.

Hugh

> 
> ---
>  fs/buffer.c                 |   30 ++++++++++++++++++++++++++++++
>  include/linux/buffer_head.h |    2 ++
>  2 files changed, 32 insertions(+)
> 
> Index: 2.6.x-xfs-new/fs/buffer.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/buffer.c	2007-02-07 23:00:05.000000000 +1100
> +++ 2.6.x-xfs-new/fs/buffer.c	2007-02-07 23:09:47.642356116 +1100
> @@ -2194,6 +2194,36 @@ int generic_commit_write(struct file *fi
>  	return 0;
>  }
>  
> +/*
> + * block_page_mkwrite() is not allowed to change the file size as
> + * it gets called from a page fault handler when a page is first
> + * dirtied. Hence we must be careful to check for EOF conditions
> + * here. We set the page up correctly for a written page which means
> + * we get ENOSPC checking when writing into holes and correct
> + * delalloc and unwritten extent mapping on filesystems that support
> + * these features.
> + */
> +int
> +block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> +		   get_block_t get_block)
> +{
> +	struct inode	*inode = vma->vm_file->f_path.dentry->d_inode;
> +	unsigned long	end;
> +	int		ret = 0;
> +
> +	if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
> +		end = i_size_read(inode) & ~PAGE_CACHE_MASK;
> +	else
> +		end = PAGE_CACHE_SIZE;
> +
> +	lock_page(page);
> +	ret = block_prepare_write(page, 0, end, get_block);
> +	if (!ret)
> +		ret = block_commit_write(page, 0, end);
> +	unlock_page(page);
> +
> +	return ret;
> +}
>  
>  /*
>   * nobh_prepare_write()'s prereads are special: the buffer_heads are freed
> Index: 2.6.x-xfs-new/include/linux/buffer_head.h
> ===================================================================
> --- 2.6.x-xfs-new.orig/include/linux/buffer_head.h	2007-02-07 23:00:02.000000000 +1100
> +++ 2.6.x-xfs-new/include/linux/buffer_head.h	2007-02-07 23:12:33.156749344 +1100
> @@ -206,6 +206,8 @@ int cont_prepare_write(struct page*, uns
>  int generic_cont_expand(struct inode *inode, loff_t size);
>  int generic_cont_expand_simple(struct inode *inode, loff_t size);
>  int block_commit_write(struct page *page, unsigned from, unsigned to);
> +int block_page_mkwrite(struct vma_area_struct *vma, struct page *page,
> +				get_block_t get_block);
>  void block_sync_page(struct page *);
>  sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
>  int generic_commit_write(struct file *, struct page *, unsigned, unsigned);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 13:00   ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2007-02-07 13:00 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs, linux-fsdevel, linux-mm

On Wed, 7 Feb 2007, David Chinner wrote:

> On Christoph's suggestion, take the guts of the proposed
> xfs_vm_page_mkwrite function and implement it as a generic
> core function as it used no specific XFS code at all.
> 
> This allows any filesystem to easily hook the ->page_mkwrite()
> VM callout to allow them to set up pages dirtied by mmap
> writes correctly for later writeout.
> 
> Signed-Off-By: Dave Chinner <dgc@sgi.com>

I'm worried about concurrent truncation.  Isn't it the case that
i_mutex is held when prepare_write and commit_write are normally
called?  But not here when page_mkwrite is called.

Hugh

> 
> ---
>  fs/buffer.c                 |   30 ++++++++++++++++++++++++++++++
>  include/linux/buffer_head.h |    2 ++
>  2 files changed, 32 insertions(+)
> 
> Index: 2.6.x-xfs-new/fs/buffer.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/buffer.c	2007-02-07 23:00:05.000000000 +1100
> +++ 2.6.x-xfs-new/fs/buffer.c	2007-02-07 23:09:47.642356116 +1100
> @@ -2194,6 +2194,36 @@ int generic_commit_write(struct file *fi
>  	return 0;
>  }
>  
> +/*
> + * block_page_mkwrite() is not allowed to change the file size as
> + * it gets called from a page fault handler when a page is first
> + * dirtied. Hence we must be careful to check for EOF conditions
> + * here. We set the page up correctly for a written page which means
> + * we get ENOSPC checking when writing into holes and correct
> + * delalloc and unwritten extent mapping on filesystems that support
> + * these features.
> + */
> +int
> +block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> +		   get_block_t get_block)
> +{
> +	struct inode	*inode = vma->vm_file->f_path.dentry->d_inode;
> +	unsigned long	end;
> +	int		ret = 0;
> +
> +	if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
> +		end = i_size_read(inode) & ~PAGE_CACHE_MASK;
> +	else
> +		end = PAGE_CACHE_SIZE;
> +
> +	lock_page(page);
> +	ret = block_prepare_write(page, 0, end, get_block);
> +	if (!ret)
> +		ret = block_commit_write(page, 0, end);
> +	unlock_page(page);
> +
> +	return ret;
> +}
>  
>  /*
>   * nobh_prepare_write()'s prereads are special: the buffer_heads are freed
> Index: 2.6.x-xfs-new/include/linux/buffer_head.h
> ===================================================================
> --- 2.6.x-xfs-new.orig/include/linux/buffer_head.h	2007-02-07 23:00:02.000000000 +1100
> +++ 2.6.x-xfs-new/include/linux/buffer_head.h	2007-02-07 23:12:33.156749344 +1100
> @@ -206,6 +206,8 @@ int cont_prepare_write(struct page*, uns
>  int generic_cont_expand(struct inode *inode, loff_t size);
>  int generic_cont_expand_simple(struct inode *inode, loff_t size);
>  int block_commit_write(struct page *page, unsigned from, unsigned to);
> +int block_page_mkwrite(struct vma_area_struct *vma, struct page *page,
> +				get_block_t get_block);
>  void block_sync_page(struct page *);
>  sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
>  int generic_commit_write(struct file *, struct page *, unsigned, unsigned);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 13:00   ` Hugh Dickins
@ 2007-02-07 14:44     ` David Chinner
  -1 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 14:44 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: David Chinner, xfs, linux-fsdevel, linux-mm

On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> On Wed, 7 Feb 2007, David Chinner wrote:
> 
> > On Christoph's suggestion, take the guts of the proposed
> > xfs_vm_page_mkwrite function and implement it as a generic
> > core function as it used no specific XFS code at all.
> > 
> > This allows any filesystem to easily hook the ->page_mkwrite()
> > VM callout to allow them to set up pages dirtied by mmap
> > writes correctly for later writeout.
> > 
> > Signed-Off-By: Dave Chinner <dgc@sgi.com>
> 
> I'm worried about concurrent truncation.  Isn't it the case that
> i_mutex is held when prepare_write and commit_write are normally
> called?  But not here when page_mkwrite is called.

I'm not holding i_mutex. I assumed that it was probably safe to do
because we are likely to be reading the page off disk just before we
call mkwrite and that has to be synchronised with truncate in some
manner....

So, do I need to grab the i_mutex here? Is that safe to do that in
the middle of a page fault? If we do race with a truncate and the
page is now beyond EOF, what am I supposed to return?

I'm fishing for what I'm supposed to be doing here because there's
zero implementations of this callout in the kernel and the comments
in the code explaining the interface constraints are
non-existant....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 14:44     ` David Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 14:44 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: David Chinner, xfs, linux-fsdevel, linux-mm

On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> On Wed, 7 Feb 2007, David Chinner wrote:
> 
> > On Christoph's suggestion, take the guts of the proposed
> > xfs_vm_page_mkwrite function and implement it as a generic
> > core function as it used no specific XFS code at all.
> > 
> > This allows any filesystem to easily hook the ->page_mkwrite()
> > VM callout to allow them to set up pages dirtied by mmap
> > writes correctly for later writeout.
> > 
> > Signed-Off-By: Dave Chinner <dgc@sgi.com>
> 
> I'm worried about concurrent truncation.  Isn't it the case that
> i_mutex is held when prepare_write and commit_write are normally
> called?  But not here when page_mkwrite is called.

I'm not holding i_mutex. I assumed that it was probably safe to do
because we are likely to be reading the page off disk just before we
call mkwrite and that has to be synchronised with truncate in some
manner....

So, do I need to grab the i_mutex here? Is that safe to do that in
the middle of a page fault? If we do race with a truncate and the
page is now beyond EOF, what am I supposed to return?

I'm fishing for what I'm supposed to be doing here because there's
zero implementations of this callout in the kernel and the comments
in the code explaining the interface constraints are
non-existant....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 14:44     ` David Chinner
@ 2007-02-07 15:52       ` Chris Mason
  -1 siblings, 0 replies; 20+ messages in thread
From: Chris Mason @ 2007-02-07 15:52 UTC (permalink / raw)
  To: David Chinner; +Cc: Hugh Dickins, xfs, linux-fsdevel, linux-mm

On Thu, Feb 08, 2007 at 01:44:15AM +1100, David Chinner wrote:
> On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> > On Wed, 7 Feb 2007, David Chinner wrote:
> > 
> > > On Christoph's suggestion, take the guts of the proposed
> > > xfs_vm_page_mkwrite function and implement it as a generic
> > > core function as it used no specific XFS code at all.
> > > 
> > > This allows any filesystem to easily hook the ->page_mkwrite()
> > > VM callout to allow them to set up pages dirtied by mmap
> > > writes correctly for later writeout.
> > > 
> > > Signed-Off-By: Dave Chinner <dgc@sgi.com>
> > 
> > I'm worried about concurrent truncation.  Isn't it the case that
> > i_mutex is held when prepare_write and commit_write are normally
> > called?  But not here when page_mkwrite is called.
> 
> I'm not holding i_mutex. I assumed that it was probably safe to do
> because we are likely to be reading the page off disk just before we
> call mkwrite and that has to be synchronised with truncate in some
> manner....

In general, commit_write is allowed to update i_size, and prepare/commit
are called with i_mutex.  block_prepare_write and block_commit_write
both look safe to me for calling with only the page lock held.  It more
or less translates to: call get_block in a sane fashion and zero out the
parts of the page past eof.

But, if someone copies the code and puts their own fancy
prepare/commit_write in there, they will get in trouble in a hurry...

> 
> So, do I need to grab the i_mutex here? Is that safe to do that in
> the middle of a page fault? If we do race with a truncate and the
> page is now beyond EOF, what am I supposed to return?

Should it check to make sure the page is still in the address space
after locking it?

-chris


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 15:52       ` Chris Mason
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Mason @ 2007-02-07 15:52 UTC (permalink / raw)
  To: David Chinner; +Cc: Hugh Dickins, xfs, linux-fsdevel, linux-mm

On Thu, Feb 08, 2007 at 01:44:15AM +1100, David Chinner wrote:
> On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> > On Wed, 7 Feb 2007, David Chinner wrote:
> > 
> > > On Christoph's suggestion, take the guts of the proposed
> > > xfs_vm_page_mkwrite function and implement it as a generic
> > > core function as it used no specific XFS code at all.
> > > 
> > > This allows any filesystem to easily hook the ->page_mkwrite()
> > > VM callout to allow them to set up pages dirtied by mmap
> > > writes correctly for later writeout.
> > > 
> > > Signed-Off-By: Dave Chinner <dgc@sgi.com>
> > 
> > I'm worried about concurrent truncation.  Isn't it the case that
> > i_mutex is held when prepare_write and commit_write are normally
> > called?  But not here when page_mkwrite is called.
> 
> I'm not holding i_mutex. I assumed that it was probably safe to do
> because we are likely to be reading the page off disk just before we
> call mkwrite and that has to be synchronised with truncate in some
> manner....

In general, commit_write is allowed to update i_size, and prepare/commit
are called with i_mutex.  block_prepare_write and block_commit_write
both look safe to me for calling with only the page lock held.  It more
or less translates to: call get_block in a sane fashion and zero out the
parts of the page past eof.

But, if someone copies the code and puts their own fancy
prepare/commit_write in there, they will get in trouble in a hurry...

> 
> So, do I need to grab the i_mutex here? Is that safe to do that in
> the middle of a page fault? If we do race with a truncate and the
> page is now beyond EOF, what am I supposed to return?

Should it check to make sure the page is still in the address space
after locking it?

-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 14:44     ` David Chinner
@ 2007-02-07 15:56       ` Hugh Dickins
  -1 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2007-02-07 15:56 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs, linux-fsdevel, linux-mm

On Thu, 8 Feb 2007, David Chinner wrote:
> On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> > 
> > I'm worried about concurrent truncation.  Isn't it the case that
> > i_mutex is held when prepare_write and commit_write are normally
> > called?  But not here when page_mkwrite is called.
> 
> I'm not holding i_mutex. I assumed that it was probably safe to do
> because we are likely to be reading the page off disk just before we
> call mkwrite and that has to be synchronised with truncate in some
> manner....

"assumed"..."probably"..."likely"..."just before"..."in some manner"
doesn't sound very safe, does it :-?

The well-established paths are almost safe against truncation (I insert
"almost" there because although we like to think they're entirely safe,
from time to time a hole is discovered, and Nick has been wrestling
with filling them for some while now).

But page_mkwrite is something new: so far, it's up to the implementor
(the filesystem) to work out how to guard against truncation.

> 
> So, do I need to grab the i_mutex here? Is that safe to do that in
> the middle of a page fault?

It's certainly easier to think about if you don't grab i_mutex there:
sys_msync used to take i_mutex within down_read of mmap_sem, but we
were quite happy to get rid of that, because usually it's down_read
of mmap_sem within i_mutex (page fault when writing from userspace
to file).  I can't at this moment put my finger on an actual deadlock
if you take i_mutex in page_mkwrite, but it feels wrong: hmm, if you
add another thread waiting to down_write the mmap_sem, I think you
would be able to deadlock.

You don't need to lock out all truncation, but you do need to lock
out truncation of the page in question.  Instead of your i_size
checks, check page->mapping isn't NULL after the lock_page?

But aside from the truncation issue, if prepare_write and commit_write
are always called with i_mutex held at present, I'm doubtful whether
you can provide a generic default page_mkwrite which calls them without.
Which would take us back to grabbing i_mutex within page_mkwrite.  Ugh.

> If we do race with a truncate and the
> page is now beyond EOF, what am I supposed to return?

Something negative.  Nothing presently reports the error code in
question, it just does SIGBUS; but it would be better for the
truncation case to avoid -ENOMEM and -ENOSPC, which could easily
have meanings here.  I don't see a good choice, so maybe -EINVAL.

> 
> I'm fishing for what I'm supposed to be doing here because there's
> zero implementations of this callout in the kernel and the comments
> in the code explaining the interface constraints are
> non-existant....

Well, you seem to be the first to implement it.  Hmm, that's not true,
David was: what magic saved him from addressing the truncation issue?

Don't be surprised if it turns out page_mkwrite needs more thought.

Hugh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 15:56       ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2007-02-07 15:56 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs, linux-fsdevel, linux-mm

On Thu, 8 Feb 2007, David Chinner wrote:
> On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> > 
> > I'm worried about concurrent truncation.  Isn't it the case that
> > i_mutex is held when prepare_write and commit_write are normally
> > called?  But not here when page_mkwrite is called.
> 
> I'm not holding i_mutex. I assumed that it was probably safe to do
> because we are likely to be reading the page off disk just before we
> call mkwrite and that has to be synchronised with truncate in some
> manner....

"assumed"..."probably"..."likely"..."just before"..."in some manner"
doesn't sound very safe, does it :-?

The well-established paths are almost safe against truncation (I insert
"almost" there because although we like to think they're entirely safe,
from time to time a hole is discovered, and Nick has been wrestling
with filling them for some while now).

But page_mkwrite is something new: so far, it's up to the implementor
(the filesystem) to work out how to guard against truncation.

> 
> So, do I need to grab the i_mutex here? Is that safe to do that in
> the middle of a page fault?

It's certainly easier to think about if you don't grab i_mutex there:
sys_msync used to take i_mutex within down_read of mmap_sem, but we
were quite happy to get rid of that, because usually it's down_read
of mmap_sem within i_mutex (page fault when writing from userspace
to file).  I can't at this moment put my finger on an actual deadlock
if you take i_mutex in page_mkwrite, but it feels wrong: hmm, if you
add another thread waiting to down_write the mmap_sem, I think you
would be able to deadlock.

You don't need to lock out all truncation, but you do need to lock
out truncation of the page in question.  Instead of your i_size
checks, check page->mapping isn't NULL after the lock_page?

But aside from the truncation issue, if prepare_write and commit_write
are always called with i_mutex held at present, I'm doubtful whether
you can provide a generic default page_mkwrite which calls them without.
Which would take us back to grabbing i_mutex within page_mkwrite.  Ugh.

> If we do race with a truncate and the
> page is now beyond EOF, what am I supposed to return?

Something negative.  Nothing presently reports the error code in
question, it just does SIGBUS; but it would be better for the
truncation case to avoid -ENOMEM and -ENOSPC, which could easily
have meanings here.  I don't see a good choice, so maybe -EINVAL.

> 
> I'm fishing for what I'm supposed to be doing here because there's
> zero implementations of this callout in the kernel and the comments
> in the code explaining the interface constraints are
> non-existant....

Well, you seem to be the first to implement it.  Hmm, that's not true,
David was: what magic saved him from addressing the truncation issue?

Don't be surprised if it turns out page_mkwrite needs more thought.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 15:56       ` Hugh Dickins
@ 2007-02-07 22:50         ` David Chinner
  -1 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 22:50 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: xfs, linux-fsdevel, linux-mm

On Wed, Feb 07, 2007 at 03:56:15PM +0000, Hugh Dickins wrote:
> On Thu, 8 Feb 2007, David Chinner wrote:
> > On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> > > 
> > > I'm worried about concurrent truncation.  Isn't it the case that
> > > i_mutex is held when prepare_write and commit_write are normally
> > > called?  But not here when page_mkwrite is called.
> > 
> > I'm not holding i_mutex. I assumed that it was probably safe to do
> > because we are likely to be reading the page off disk just before we
> > call mkwrite and that has to be synchronised with truncate in some
> > manner....
> 
> "assumed"..."probably"..."likely"..."just before"..."in some manner"
> doesn't sound very safe, does it :-?

You're right, it doesn't sound safe, does it? Why do you think I
posted the patches for comment?

> But page_mkwrite is something new: so far, it's up to the implementor
> (the filesystem) to work out how to guard against truncation.

Ok. I can do that now I know I have to.

> > So, do I need to grab the i_mutex here? Is that safe to do that in
> > the middle of a page fault?
> 
> It's certainly easier to think about if you don't grab i_mutex there:
> sys_msync used to take i_mutex within down_read of mmap_sem, but we
> were quite happy to get rid of that, because usually it's down_read
> of mmap_sem within i_mutex (page fault when writing from userspace
> to file).  I can't at this moment put my finger on an actual deadlock
> if you take i_mutex in page_mkwrite, but it feels wrong: hmm, if you
> add another thread waiting to down_write the mmap_sem, I think you
> would be able to deadlock.

Right, so i_mutex is out. That needs to be commented in big flashing
neon lights somewhere in the code.

> You don't need to lock out all truncation, but you do need to lock
> out truncation of the page in question.  Instead of your i_size
> checks, check page->mapping isn't NULL after the lock_page?

Yes, that can be done, but we still need to know if part of
the page is beyond EOF for when we call block_commit_write()
and mark buffers dirty. Hence we need to check the inode size.

I guess if we block the truncate with the page lock, then the
inode size is not going to change until we unlock the page.
If the inode size has already been changed but the page not yet
removed from the mapping we'll be beyond EOF.

So it seems to me that we can get away with not using the i_mutex
in the generic code here.

> But aside from the truncation issue, if prepare_write and commit_write
> are always called with i_mutex held at present, I'm doubtful whether
> you can provide a generic default page_mkwrite which calls them without.
> Which would take us back to grabbing i_mutex within page_mkwrite.  Ugh.

The only thing that is asserted as a requirement for
block_prepare_write is that the page is locked. Apart fromteh page
truncation issue, it is safe to do this at least on XFS because it
has internal locks that ensure sanity even when the i_mutex is not
held.  If a particular filesystem has different locking
requirements, they can be met in the filesystem wrapper function
(e.g.  xfs_vm_page_mkwrite()) which calls block_page_mkwrite().

> > If we do race with a truncate and the
> > page is now beyond EOF, what am I supposed to return?
> 
> Something negative.  Nothing presently reports the error code in
> question, it just does SIGBUS; but it would be better for the
> truncation case to avoid -ENOMEM and -ENOSPC, which could easily
> have meanings here.  I don't see a good choice, so maybe -EINVAL.

Ok.

> > I'm fishing for what I'm supposed to be doing here because there's
> > zero implementations of this callout in the kernel and the comments
> > in the code explaining the interface constraints are
> > non-existant....
> 
> Well, you seem to be the first to implement it.  Hmm, that's not true,
> David was: what magic saved him from addressing the truncation issue?

No idea. His code is not in mainline....

> Don't be surprised if it turns out page_mkwrite needs more thought.

I'll add a patch to the series adding some comments on the restrictions
placed on implementers of this function.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-07 22:50         ` David Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-07 22:50 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: xfs, linux-fsdevel, linux-mm

On Wed, Feb 07, 2007 at 03:56:15PM +0000, Hugh Dickins wrote:
> On Thu, 8 Feb 2007, David Chinner wrote:
> > On Wed, Feb 07, 2007 at 01:00:28PM +0000, Hugh Dickins wrote:
> > > 
> > > I'm worried about concurrent truncation.  Isn't it the case that
> > > i_mutex is held when prepare_write and commit_write are normally
> > > called?  But not here when page_mkwrite is called.
> > 
> > I'm not holding i_mutex. I assumed that it was probably safe to do
> > because we are likely to be reading the page off disk just before we
> > call mkwrite and that has to be synchronised with truncate in some
> > manner....
> 
> "assumed"..."probably"..."likely"..."just before"..."in some manner"
> doesn't sound very safe, does it :-?

You're right, it doesn't sound safe, does it? Why do you think I
posted the patches for comment?

> But page_mkwrite is something new: so far, it's up to the implementor
> (the filesystem) to work out how to guard against truncation.

Ok. I can do that now I know I have to.

> > So, do I need to grab the i_mutex here? Is that safe to do that in
> > the middle of a page fault?
> 
> It's certainly easier to think about if you don't grab i_mutex there:
> sys_msync used to take i_mutex within down_read of mmap_sem, but we
> were quite happy to get rid of that, because usually it's down_read
> of mmap_sem within i_mutex (page fault when writing from userspace
> to file).  I can't at this moment put my finger on an actual deadlock
> if you take i_mutex in page_mkwrite, but it feels wrong: hmm, if you
> add another thread waiting to down_write the mmap_sem, I think you
> would be able to deadlock.

Right, so i_mutex is out. That needs to be commented in big flashing
neon lights somewhere in the code.

> You don't need to lock out all truncation, but you do need to lock
> out truncation of the page in question.  Instead of your i_size
> checks, check page->mapping isn't NULL after the lock_page?

Yes, that can be done, but we still need to know if part of
the page is beyond EOF for when we call block_commit_write()
and mark buffers dirty. Hence we need to check the inode size.

I guess if we block the truncate with the page lock, then the
inode size is not going to change until we unlock the page.
If the inode size has already been changed but the page not yet
removed from the mapping we'll be beyond EOF.

So it seems to me that we can get away with not using the i_mutex
in the generic code here.

> But aside from the truncation issue, if prepare_write and commit_write
> are always called with i_mutex held at present, I'm doubtful whether
> you can provide a generic default page_mkwrite which calls them without.
> Which would take us back to grabbing i_mutex within page_mkwrite.  Ugh.

The only thing that is asserted as a requirement for
block_prepare_write is that the page is locked. Apart fromteh page
truncation issue, it is safe to do this at least on XFS because it
has internal locks that ensure sanity even when the i_mutex is not
held.  If a particular filesystem has different locking
requirements, they can be met in the filesystem wrapper function
(e.g.  xfs_vm_page_mkwrite()) which calls block_page_mkwrite().

> > If we do race with a truncate and the
> > page is now beyond EOF, what am I supposed to return?
> 
> Something negative.  Nothing presently reports the error code in
> question, it just does SIGBUS; but it would be better for the
> truncation case to avoid -ENOMEM and -ENOSPC, which could easily
> have meanings here.  I don't see a good choice, so maybe -EINVAL.

Ok.

> > I'm fishing for what I'm supposed to be doing here because there's
> > zero implementations of this callout in the kernel and the comments
> > in the code explaining the interface constraints are
> > non-existant....
> 
> Well, you seem to be the first to implement it.  Hmm, that's not true,
> David was: what magic saved him from addressing the truncation issue?

No idea. His code is not in mainline....

> Don't be surprised if it turns out page_mkwrite needs more thought.

I'll add a patch to the series adding some comments on the restrictions
placed on implementers of this function.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 15:52       ` Chris Mason
@ 2007-02-08  2:34         ` Nick Piggin
  -1 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2007-02-08  2:34 UTC (permalink / raw)
  To: Chris Mason; +Cc: David Chinner, Hugh Dickins, xfs, linux-fsdevel, linux-mm

Chris Mason wrote:
> On Thu, Feb 08, 2007 at 01:44:15AM +1100, David Chinner wrote:

>>So, do I need to grab the i_mutex here? Is that safe to do that in
>>the middle of a page fault? If we do race with a truncate and the
>>page is now beyond EOF, what am I supposed to return?
> 
> 
> Should it check to make sure the page is still in the address space
> after locking it?

Yes. If the page was truncated/invalidated, then you can just return
and the pagefault handler should notice that it has been removed from
the page tables.

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-08  2:34         ` Nick Piggin
  0 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2007-02-08  2:34 UTC (permalink / raw)
  To: Chris Mason; +Cc: David Chinner, Hugh Dickins, xfs, linux-fsdevel, linux-mm

Chris Mason wrote:
> On Thu, Feb 08, 2007 at 01:44:15AM +1100, David Chinner wrote:

>>So, do I need to grab the i_mutex here? Is that safe to do that in
>>the middle of a page fault? If we do race with a truncate and the
>>page is now beyond EOF, what am I supposed to return?
> 
> 
> Should it check to make sure the page is still in the address space
> after locking it?

Yes. If the page was truncated/invalidated, then you can just return
and the pagefault handler should notice that it has been removed from
the page tables.

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-07 22:50         ` David Chinner
@ 2007-02-08 13:11           ` Chris Mason
  -1 siblings, 0 replies; 20+ messages in thread
From: Chris Mason @ 2007-02-08 13:11 UTC (permalink / raw)
  To: David Chinner; +Cc: Hugh Dickins, xfs, linux-fsdevel, linux-mm

On Thu, Feb 08, 2007 at 09:50:13AM +1100, David Chinner wrote:
> > You don't need to lock out all truncation, but you do need to lock
> > out truncation of the page in question.  Instead of your i_size
> > checks, check page->mapping isn't NULL after the lock_page?
> 
> Yes, that can be done, but we still need to know if part of
> the page is beyond EOF for when we call block_commit_write()
> and mark buffers dirty. Hence we need to check the inode size.
> 
> I guess if we block the truncate with the page lock, then the
> inode size is not going to change until we unlock the page.
> If the inode size has already been changed but the page not yet
> removed from the mapping we'll be beyond EOF.
> 
> So it seems to me that we can get away with not using the i_mutex
> in the generic code here.

vmtruncate changes the inode size before waiting on any pages.  So,
i_size could change any time during page_mkwrite.

Since the patch does:
       if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
               end = i_size_read(inode) & ~PAGE_CACHE_MASK;
       else
               end = PAGE_CACHE_SIZE;

It would be a good idea to read i_size once and put it in a local var
instead.

The FS truncate op should be locking the last page in the file to make
sure it is properly zero filled.  The worst case should be that we zero
too many bytes in page_mkwrite (expanding truncate past our current
i_size), but at least it won't expose stale data.

-chris


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-08 13:11           ` Chris Mason
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Mason @ 2007-02-08 13:11 UTC (permalink / raw)
  To: David Chinner; +Cc: Hugh Dickins, xfs, linux-fsdevel, linux-mm

On Thu, Feb 08, 2007 at 09:50:13AM +1100, David Chinner wrote:
> > You don't need to lock out all truncation, but you do need to lock
> > out truncation of the page in question.  Instead of your i_size
> > checks, check page->mapping isn't NULL after the lock_page?
> 
> Yes, that can be done, but we still need to know if part of
> the page is beyond EOF for when we call block_commit_write()
> and mark buffers dirty. Hence we need to check the inode size.
> 
> I guess if we block the truncate with the page lock, then the
> inode size is not going to change until we unlock the page.
> If the inode size has already been changed but the page not yet
> removed from the mapping we'll be beyond EOF.
> 
> So it seems to me that we can get away with not using the i_mutex
> in the generic code here.

vmtruncate changes the inode size before waiting on any pages.  So,
i_size could change any time during page_mkwrite.

Since the patch does:
       if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode))
               end = i_size_read(inode) & ~PAGE_CACHE_MASK;
       else
               end = PAGE_CACHE_SIZE;

It would be a good idea to read i_size once and put it in a local var
instead.

The FS truncate op should be locking the last page in the file to make
sure it is properly zero filled.  The worst case should be that we zero
too many bytes in page_mkwrite (expanding truncate past our current
i_size), but at least it won't expose stale data.

-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
  2007-02-08 13:11           ` Chris Mason
@ 2007-02-08 22:30             ` David Chinner
  -1 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-08 22:30 UTC (permalink / raw)
  To: Chris Mason; +Cc: David Chinner, Hugh Dickins, xfs, linux-fsdevel, linux-mm

On Thu, Feb 08, 2007 at 08:11:00AM -0500, Chris Mason wrote:
> On Thu, Feb 08, 2007 at 09:50:13AM +1100, David Chinner wrote:
> > > You don't need to lock out all truncation, but you do need to lock
> > > out truncation of the page in question.  Instead of your i_size
> > > checks, check page->mapping isn't NULL after the lock_page?
> > 
> > Yes, that can be done, but we still need to know if part of
> > the page is beyond EOF for when we call block_commit_write()
> > and mark buffers dirty. Hence we need to check the inode size.
> > 
> > I guess if we block the truncate with the page lock, then the
> > inode size is not going to change until we unlock the page.
> > If the inode size has already been changed but the page not yet
> > removed from the mapping we'll be beyond EOF.
> > 
> > So it seems to me that we can get away with not using the i_mutex
> > in the generic code here.
> 
> vmtruncate changes the inode size before waiting on any pages.  So,
> i_size could change any time during page_mkwrite.

Which would put us beyond EOF. Ok.

> It would be a good idea to read i_size once and put it in a local var
> instead.

Will do - I'll snap it once the page is locked....

Thanks Chris.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality
@ 2007-02-08 22:30             ` David Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-08 22:30 UTC (permalink / raw)
  To: Chris Mason; +Cc: David Chinner, Hugh Dickins, xfs, linux-fsdevel, linux-mm

On Thu, Feb 08, 2007 at 08:11:00AM -0500, Chris Mason wrote:
> On Thu, Feb 08, 2007 at 09:50:13AM +1100, David Chinner wrote:
> > > You don't need to lock out all truncation, but you do need to lock
> > > out truncation of the page in question.  Instead of your i_size
> > > checks, check page->mapping isn't NULL after the lock_page?
> > 
> > Yes, that can be done, but we still need to know if part of
> > the page is beyond EOF for when we call block_commit_write()
> > and mark buffers dirty. Hence we need to check the inode size.
> > 
> > I guess if we block the truncate with the page lock, then the
> > inode size is not going to change until we unlock the page.
> > If the inode size has already been changed but the page not yet
> > removed from the mapping we'll be beyond EOF.
> > 
> > So it seems to me that we can get away with not using the i_mutex
> > in the generic code here.
> 
> vmtruncate changes the inode size before waiting on any pages.  So,
> i_size could change any time during page_mkwrite.

Which would put us beyond EOF. Ok.

> It would be a good idea to read i_size once and put it in a local var
> instead.

Will do - I'll snap it once the page is locked....

Thanks Chris.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-02-08 22:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-07 12:49 [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality David Chinner
2007-02-07 12:49 ` David Chinner
2007-02-07 12:55 ` David Chinner
2007-02-07 12:55   ` David Chinner
2007-02-07 13:00 ` Hugh Dickins
2007-02-07 13:00   ` Hugh Dickins
2007-02-07 14:44   ` David Chinner
2007-02-07 14:44     ` David Chinner
2007-02-07 15:52     ` Chris Mason
2007-02-07 15:52       ` Chris Mason
2007-02-08  2:34       ` Nick Piggin
2007-02-08  2:34         ` Nick Piggin
2007-02-07 15:56     ` Hugh Dickins
2007-02-07 15:56       ` Hugh Dickins
2007-02-07 22:50       ` David Chinner
2007-02-07 22:50         ` David Chinner
2007-02-08 13:11         ` Chris Mason
2007-02-08 13:11           ` Chris Mason
2007-02-08 22:30           ` David Chinner
2007-02-08 22:30             ` David Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.