* [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 4) @ 2007-02-15 7:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management Various little cleanups and commenting fixes. Fixed up the patchset so each one, incrementally, should give a properly compiling and running kernel. I'd still like Hugh to ack the anon/swap changes when he can find the time. It would be desirable to get at least one ack as to the overall problem and design of the fix (Martin's ack is just for the s390 changes at this stage). Meanwhile, can it go into -mm for wider testing, if it isn't too much trouble? Thanks, Nick -- SuSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 4) @ 2007-02-15 7:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management Various little cleanups and commenting fixes. Fixed up the patchset so each one, incrementally, should give a properly compiling and running kernel. I'd still like Hugh to ack the anon/swap changes when he can find the time. It would be desirable to get at least one ack as to the overall problem and design of the fix (Martin's ack is just for the s390 changes at this stage). Meanwhile, can it go into -mm for wider testing, if it isn't too much trouble? Thanks, Nick -- SuSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 1/3] mm: make read_cache_page synchronous 2007-02-15 7:31 ` Nick Piggin @ 2007-02-15 7:31 ` Nick Piggin -1 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls. I didn't have a great look down the call chains, but this appears to fixes 7 possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd. All depending on whether the filler is async and/or can return with a !uptodate page. Signed-off-by: Nick Piggin <npiggin@suse.de> drivers/mtd/devices/block2mtd.c | 3 -- fs/afs/dir.c | 3 -- fs/afs/mntpt.c | 11 +++----- fs/cramfs/inode.c | 3 +- fs/ecryptfs/mmap.c | 11 -------- fs/ext2/dir.c | 3 -- fs/freevxfs/vxfs_subr.c | 3 -- fs/minix/dir.c | 1 fs/namei.c | 12 --------- fs/nfs/dir.c | 5 ---- fs/nfs/symlink.c | 6 ---- fs/ntfs/aops.h | 3 -- fs/ntfs/attrib.c | 18 +------------- fs/ntfs/file.c | 3 -- fs/ntfs/super.c | 30 +++--------------------- fs/ocfs2/symlink.c | 7 ----- fs/partitions/check.c | 3 -- fs/reiserfs/xattr.c | 4 --- fs/sysv/dir.c | 10 -------- fs/ufs/dir.c | 6 ---- fs/ufs/util.c | 6 +--- include/linux/pagemap.h | 11 ++++++++ mm/filemap.c | 49 +++++++++++++++++++++++++++++++--------- mm/swapfile.c | 3 -- 24 files changed, 70 insertions(+), 144 deletions(-) Index: linux-2.6/fs/afs/dir.c =================================================================== --- linux-2.6.orig/fs/afs/dir.c +++ linux-2.6/fs/afs/dir.c @@ -187,10 +187,7 @@ static struct page *afs_dir_get_page(str page = read_mapping_page(dir->i_mapping, index, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; if (!PageChecked(page)) afs_dir_check_page(dir, page); if (PageError(page)) Index: linux-2.6/fs/afs/mntpt.c =================================================================== --- linux-2.6.orig/fs/afs/mntpt.c +++ linux-2.6/fs/afs/mntpt.c @@ -77,13 +77,11 @@ int afs_mntpt_check_symlink(struct afs_v } ret = -EIO; - wait_on_page_locked(page); - buf = kmap(page); - if (!PageUptodate(page)) - goto out_free; if (PageError(page)) goto out_free; + buf = kmap(page); + /* examine the symlink's contents */ size = vnode->status.size; _debug("symlink to %*.*s", size, (int) size, buf); @@ -100,8 +98,8 @@ int afs_mntpt_check_symlink(struct afs_v ret = 0; - out_free: kunmap(page); + out_free: page_cache_release(page); out: _leave(" = %d", ret); @@ -184,8 +182,7 @@ static struct vfsmount *afs_mntpt_do_aut } ret = -EIO; - wait_on_page_locked(page); - if (!PageUptodate(page) || PageError(page)) + if (PageError(page)) goto error; buf = kmap(page); Index: linux-2.6/fs/cramfs/inode.c =================================================================== --- linux-2.6.orig/fs/cramfs/inode.c +++ linux-2.6/fs/cramfs/inode.c @@ -180,7 +180,8 @@ static void *cramfs_read(struct super_bl struct page *page = NULL; if (blocknr + i < devsize) { - page = read_mapping_page(mapping, blocknr + i, NULL); + page = read_mapping_page_async(mapping, blocknr + i, + NULL); /* synchronous error? */ if (IS_ERR(page)) page = NULL; Index: linux-2.6/fs/ext2/dir.c =================================================================== --- linux-2.6.orig/fs/ext2/dir.c +++ linux-2.6/fs/ext2/dir.c @@ -161,10 +161,7 @@ static struct page * ext2_get_page(struc struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; if (!PageChecked(page)) ext2_check_page(page); if (PageError(page)) Index: linux-2.6/fs/freevxfs/vxfs_subr.c =================================================================== --- linux-2.6.orig/fs/freevxfs/vxfs_subr.c +++ linux-2.6/fs/freevxfs/vxfs_subr.c @@ -74,10 +74,7 @@ vxfs_get_page(struct address_space *mapp pp = read_mapping_page(mapping, n, NULL); if (!IS_ERR(pp)) { - wait_on_page_locked(pp); kmap(pp); - if (!PageUptodate(pp)) - goto fail; /** if (!PageChecked(pp)) **/ /** vxfs_check_page(pp); **/ if (PageError(pp)) Index: linux-2.6/mm/filemap.c =================================================================== --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -1726,7 +1726,7 @@ int generic_file_readonly_mmap(struct fi EXPORT_SYMBOL(generic_file_mmap); EXPORT_SYMBOL(generic_file_readonly_mmap); -static inline struct page *__read_cache_page(struct address_space *mapping, +static struct page *__read_cache_page(struct address_space *mapping, unsigned long index, int (*filler)(void *,struct page*), void *data) @@ -1763,17 +1763,11 @@ repeat: return page; } -/** - * read_cache_page - read into page cache, fill it if needed - * @mapping: the page's address_space - * @index: the page index - * @filler: function to perform the read - * @data: destination for read data - * - * Read into the page cache. If a page already exists, - * and PageUptodate() is not set, try to fill the page. +/* + * Same as read_cache_page, but don't wait for page to become unlocked + * after submitting it to the filler. */ -struct page *read_cache_page(struct address_space *mapping, +struct page *read_cache_page_async(struct address_space *mapping, unsigned long index, int (*filler)(void *,struct page*), void *data) @@ -1805,6 +1799,39 @@ retry: page = ERR_PTR(err); } out: + mark_page_accessed(page); + return page; +} +EXPORT_SYMBOL(read_cache_page_async); + +/** + * read_cache_page - read into page cache, fill it if needed + * @mapping: the page's address_space + * @index: the page index + * @filler: function to perform the read + * @data: destination for read data + * + * Read into the page cache. If a page already exists, and PageUptodate() is + * not set, try to fill the page then wait for it to become unlocked. + * + * If the page does not get brought uptodate, return -EIO. + */ +struct page *read_cache_page(struct address_space *mapping, + unsigned long index, + int (*filler)(void *,struct page*), + void *data) +{ + struct page *page; + + page = read_cache_page_async(mapping, index, filler, data); + if (IS_ERR(page)) + goto out; + wait_on_page_locked(page); + if (!PageUptodate(page)) { + page_cache_release(page); + page = ERR_PTR(-EIO); + } + out: return page; } EXPORT_SYMBOL(read_cache_page); Index: linux-2.6/fs/minix/dir.c =================================================================== --- linux-2.6.orig/fs/minix/dir.c +++ linux-2.6/fs/minix/dir.c @@ -62,7 +62,6 @@ static struct page * dir_get_page(struct struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); if (!PageUptodate(page)) goto fail; Index: linux-2.6/fs/namei.c =================================================================== --- linux-2.6.orig/fs/namei.c +++ linux-2.6/fs/namei.c @@ -2639,19 +2639,9 @@ static char *page_getlink(struct dentry struct address_space *mapping = dentry->d_inode->i_mapping; page = read_mapping_page(mapping, 0, NULL); if (IS_ERR(page)) - goto sync_fail; - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto async_fail; + return (char*)page; *ppage = page; return kmap(page); - -async_fail: - page_cache_release(page); - return ERR_PTR(-EIO); - -sync_fail: - return (char*)page; } int page_readlink(struct dentry *dentry, char __user *buffer, int buflen) Index: linux-2.6/fs/ntfs/aops.h =================================================================== --- linux-2.6.orig/fs/ntfs/aops.h +++ linux-2.6/fs/ntfs/aops.h @@ -89,9 +89,8 @@ static inline struct page *ntfs_map_page struct page *page = read_mapping_page(mapping, index, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (PageUptodate(page) && !PageError(page)) + if (!PageError(page)) return page; ntfs_unmap_page(page); return ERR_PTR(-EIO); Index: linux-2.6/fs/ntfs/attrib.c =================================================================== --- linux-2.6.orig/fs/ntfs/attrib.c +++ linux-2.6/fs/ntfs/attrib.c @@ -2532,14 +2532,7 @@ int ntfs_attr_set(ntfs_inode *ni, const page = read_mapping_page(mapping, idx, NULL); if (IS_ERR(page)) { ntfs_error(vol->sb, "Failed to read first partial " - "page (sync error, index 0x%lx).", idx); - return PTR_ERR(page); - } - wait_on_page_locked(page); - if (unlikely(!PageUptodate(page))) { - ntfs_error(vol->sb, "Failed to read first partial page " - "(async error, index 0x%lx).", idx); - page_cache_release(page); + "page (error, index 0x%lx).", idx); return PTR_ERR(page); } /* @@ -2602,14 +2595,7 @@ int ntfs_attr_set(ntfs_inode *ni, const page = read_mapping_page(mapping, idx, NULL); if (IS_ERR(page)) { ntfs_error(vol->sb, "Failed to read last partial page " - "(sync error, index 0x%lx).", idx); - return PTR_ERR(page); - } - wait_on_page_locked(page); - if (unlikely(!PageUptodate(page))) { - ntfs_error(vol->sb, "Failed to read last partial page " - "(async error, index 0x%lx).", idx); - page_cache_release(page); + "(error, index 0x%lx).", idx); return PTR_ERR(page); } kaddr = kmap_atomic(page, KM_USER0); Index: linux-2.6/fs/ntfs/file.c =================================================================== --- linux-2.6.orig/fs/ntfs/file.c +++ linux-2.6/fs/ntfs/file.c @@ -236,8 +236,7 @@ do_non_resident_extend: err = PTR_ERR(page); goto init_err_out; } - wait_on_page_locked(page); - if (unlikely(!PageUptodate(page) || PageError(page))) { + if (unlikely(PageError(page))) { page_cache_release(page); err = -EIO; goto init_err_out; Index: linux-2.6/fs/ocfs2/symlink.c =================================================================== --- linux-2.6.orig/fs/ocfs2/symlink.c +++ linux-2.6/fs/ocfs2/symlink.c @@ -67,16 +67,9 @@ static char *ocfs2_page_getlink(struct d page = read_mapping_page(mapping, 0, NULL); if (IS_ERR(page)) goto sync_fail; - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto async_fail; *ppage = page; return kmap(page); -async_fail: - page_cache_release(page); - return ERR_PTR(-EIO); - sync_fail: return (char*)page; } Index: linux-2.6/fs/partitions/check.c =================================================================== --- linux-2.6.orig/fs/partitions/check.c +++ linux-2.6/fs/partitions/check.c @@ -561,9 +561,6 @@ unsigned char *read_dev_sector(struct bl page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)), NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto fail; if (PageError(page)) goto fail; p->v = page; Index: linux-2.6/fs/reiserfs/xattr.c =================================================================== --- linux-2.6.orig/fs/reiserfs/xattr.c +++ linux-2.6/fs/reiserfs/xattr.c @@ -454,11 +454,7 @@ static struct page *reiserfs_get_page(st mapping_set_gfp_mask(mapping, GFP_NOFS); page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; - if (PageError(page)) goto fail; } Index: linux-2.6/fs/sysv/dir.c =================================================================== --- linux-2.6.orig/fs/sysv/dir.c +++ linux-2.6/fs/sysv/dir.c @@ -54,17 +54,9 @@ static struct page * dir_get_page(struct { struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); - if (!IS_ERR(page)) { - wait_on_page_locked(page); + if (!IS_ERR(page)) kmap(page); - if (!PageUptodate(page)) - goto fail; - } return page; - -fail: - dir_put_page(page); - return ERR_PTR(-EIO); } static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir) Index: linux-2.6/mm/swapfile.c =================================================================== --- linux-2.6.orig/mm/swapfile.c +++ linux-2.6/mm/swapfile.c @@ -1531,9 +1531,6 @@ asmlinkage long sys_swapon(const char __ error = PTR_ERR(page); goto bad_swap; } - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto bad_swap; kmap(page); swap_header = page_address(page); Index: linux-2.6/drivers/mtd/devices/block2mtd.c =================================================================== --- linux-2.6.orig/drivers/mtd/devices/block2mtd.c +++ linux-2.6/drivers/mtd/devices/block2mtd.c @@ -88,9 +88,8 @@ static void cache_readahead(struct addre static struct page* page_readahead(struct address_space *mapping, int index) { - filler_t *filler = (filler_t*)mapping->a_ops->readpage; cache_readahead(mapping, index); - return read_cache_page(mapping, index, filler, NULL); + return read_mapping_page(mapping, index, NULL); } Index: linux-2.6/fs/ecryptfs/mmap.c =================================================================== --- linux-2.6.orig/fs/ecryptfs/mmap.c +++ linux-2.6/fs/ecryptfs/mmap.c @@ -46,7 +46,6 @@ struct kmem_cache *ecryptfs_lower_page_c */ static struct page *ecryptfs_get1page(struct file *file, int index) { - struct page *page; struct dentry *dentry; struct inode *inode; struct address_space *mapping; @@ -54,14 +53,7 @@ static struct page *ecryptfs_get1page(st dentry = file->f_path.dentry; inode = dentry->d_inode; mapping = inode->i_mapping; - page = read_cache_page(mapping, index, - (filler_t *)mapping->a_ops->readpage, - (void *)file); - if (IS_ERR(page)) - goto out; - wait_on_page_locked(page); -out: - return page; + return read_mapping_page(mapping, index, (void *)file); } static @@ -233,7 +225,6 @@ int ecryptfs_do_readpage(struct file *fi ecryptfs_printk(KERN_ERR, "Error reading from page cache\n"); goto out; } - wait_on_page_locked(lower_page); page_data = (char *)kmap(page); if (!page_data) { rc = -ENOMEM; Index: linux-2.6/fs/nfs/dir.c =================================================================== --- linux-2.6.orig/fs/nfs/dir.c +++ linux-2.6/fs/nfs/dir.c @@ -322,8 +322,6 @@ int find_dirent_page(nfs_readdir_descrip status = PTR_ERR(page); goto out; } - if (!PageUptodate(page)) - goto read_error; /* NOTE: Someone else may have changed the READDIRPLUS flag */ desc->page = page; @@ -337,9 +335,6 @@ int find_dirent_page(nfs_readdir_descrip out: dfprintk(DIRCACHE, "NFS: %s: returns %d\n", __FUNCTION__, status); return status; - read_error: - page_cache_release(page); - return -EIO; } /* Index: linux-2.6/fs/nfs/symlink.c =================================================================== --- linux-2.6.orig/fs/nfs/symlink.c +++ linux-2.6/fs/nfs/symlink.c @@ -61,15 +61,9 @@ static void *nfs_follow_link(struct dent err = page; goto read_failed; } - if (!PageUptodate(page)) { - err = ERR_PTR(-EIO); - goto getlink_read_error; - } nd_set_link(nd, kmap(page)); return page; -getlink_read_error: - page_cache_release(page); read_failed: nd_set_link(nd, err); return NULL; Index: linux-2.6/fs/ntfs/super.c =================================================================== --- linux-2.6.orig/fs/ntfs/super.c +++ linux-2.6/fs/ntfs/super.c @@ -2471,7 +2471,6 @@ static s64 get_nr_free_clusters(ntfs_vol s64 nr_free = vol->nr_clusters; u32 *kaddr; struct address_space *mapping = vol->lcnbmp_ino->i_mapping; - filler_t *readpage = (filler_t*)mapping->a_ops->readpage; struct page *page; pgoff_t index, max_index; @@ -2494,24 +2493,14 @@ static s64 get_nr_free_clusters(ntfs_vol * Read the page from page cache, getting it from backing store * if necessary, and increment the use count. */ - page = read_cache_page(mapping, index, (filler_t*)readpage, - NULL); + page = read_mapping_page(mapping, index, NULL); /* Ignore pages which errored synchronously. */ if (IS_ERR(page)) { - ntfs_debug("Sync read_cache_page() error. Skipping " + ntfs_debug("read_mapping_page() error. Skipping " "page (index 0x%lx).", index); nr_free -= PAGE_CACHE_SIZE * 8; continue; } - wait_on_page_locked(page); - /* Ignore pages which errored asynchronously. */ - if (!PageUptodate(page)) { - ntfs_debug("Async read_cache_page() error. Skipping " - "page (index 0x%lx).", index); - page_cache_release(page); - nr_free -= PAGE_CACHE_SIZE * 8; - continue; - } kaddr = (u32*)kmap_atomic(page, KM_USER0); /* * For each 4 bytes, subtract the number of set bits. If this @@ -2562,7 +2551,6 @@ static unsigned long __get_nr_free_mft_r { u32 *kaddr; struct address_space *mapping = vol->mftbmp_ino->i_mapping; - filler_t *readpage = (filler_t*)mapping->a_ops->readpage; struct page *page; pgoff_t index; @@ -2576,21 +2564,11 @@ static unsigned long __get_nr_free_mft_r * Read the page from page cache, getting it from backing store * if necessary, and increment the use count. */ - page = read_cache_page(mapping, index, (filler_t*)readpage, - NULL); + page = read_mapping_page(mapping, index, NULL); /* Ignore pages which errored synchronously. */ if (IS_ERR(page)) { - ntfs_debug("Sync read_cache_page() error. Skipping " - "page (index 0x%lx).", index); - nr_free -= PAGE_CACHE_SIZE * 8; - continue; - } - wait_on_page_locked(page); - /* Ignore pages which errored asynchronously. */ - if (!PageUptodate(page)) { - ntfs_debug("Async read_cache_page() error. Skipping " + ntfs_debug("read_mapping_page() error. Skipping " "page (index 0x%lx).", index); - page_cache_release(page); nr_free -= PAGE_CACHE_SIZE * 8; continue; } Index: linux-2.6/fs/ufs/dir.c =================================================================== --- linux-2.6.orig/fs/ufs/dir.c +++ linux-2.6/fs/ufs/dir.c @@ -180,13 +180,9 @@ fail: static struct page *ufs_get_page(struct inode *dir, unsigned long n) { struct address_space *mapping = dir->i_mapping; - struct page *page = read_cache_page(mapping, n, - (filler_t*)mapping->a_ops->readpage, NULL); + struct page *page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; if (!PageChecked(page)) ufs_check_page(page); if (PageError(page)) Index: linux-2.6/fs/ufs/util.c =================================================================== --- linux-2.6.orig/fs/ufs/util.c +++ linux-2.6/fs/ufs/util.c @@ -251,13 +251,11 @@ struct page *ufs_get_locked_page(struct page = find_lock_page(mapping, index); if (!page) { - page = read_cache_page(mapping, index, - (filler_t*)mapping->a_ops->readpage, - NULL); + page = read_mapping_page(mapping, index, NULL); if (IS_ERR(page)) { printk(KERN_ERR "ufs_change_blocknr: " - "read_cache_page error: ino %lu, index: %lu\n", + "read_mapping_page error: ino %lu, index: %lu\n", mapping->host->i_ino, index); goto out; } Index: linux-2.6/include/linux/pagemap.h =================================================================== --- linux-2.6.orig/include/linux/pagemap.h +++ linux-2.6/include/linux/pagemap.h @@ -95,12 +95,23 @@ static inline struct page *grab_cache_pa extern struct page * grab_cache_page_nowait(struct address_space *mapping, unsigned long index); +extern struct page * read_cache_page_async(struct address_space *mapping, + unsigned long index, filler_t *filler, + void *data); extern struct page * read_cache_page(struct address_space *mapping, unsigned long index, filler_t *filler, void *data); extern int read_cache_pages(struct address_space *mapping, struct list_head *pages, filler_t *filler, void *data); +static inline struct page *read_mapping_page_async( + struct address_space *mapping, + unsigned long index, void *data) +{ + filler_t *filler = (filler_t *)mapping->a_ops->readpage; + return read_cache_page_async(mapping, index, filler, data); +} + static inline struct page *read_mapping_page(struct address_space *mapping, unsigned long index, void *data) { ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 1/3] mm: make read_cache_page synchronous @ 2007-02-15 7:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls. I didn't have a great look down the call chains, but this appears to fixes 7 possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd. All depending on whether the filler is async and/or can return with a !uptodate page. Signed-off-by: Nick Piggin <npiggin@suse.de> drivers/mtd/devices/block2mtd.c | 3 -- fs/afs/dir.c | 3 -- fs/afs/mntpt.c | 11 +++----- fs/cramfs/inode.c | 3 +- fs/ecryptfs/mmap.c | 11 -------- fs/ext2/dir.c | 3 -- fs/freevxfs/vxfs_subr.c | 3 -- fs/minix/dir.c | 1 fs/namei.c | 12 --------- fs/nfs/dir.c | 5 ---- fs/nfs/symlink.c | 6 ---- fs/ntfs/aops.h | 3 -- fs/ntfs/attrib.c | 18 +------------- fs/ntfs/file.c | 3 -- fs/ntfs/super.c | 30 +++--------------------- fs/ocfs2/symlink.c | 7 ----- fs/partitions/check.c | 3 -- fs/reiserfs/xattr.c | 4 --- fs/sysv/dir.c | 10 -------- fs/ufs/dir.c | 6 ---- fs/ufs/util.c | 6 +--- include/linux/pagemap.h | 11 ++++++++ mm/filemap.c | 49 +++++++++++++++++++++++++++++++--------- mm/swapfile.c | 3 -- 24 files changed, 70 insertions(+), 144 deletions(-) Index: linux-2.6/fs/afs/dir.c =================================================================== --- linux-2.6.orig/fs/afs/dir.c +++ linux-2.6/fs/afs/dir.c @@ -187,10 +187,7 @@ static struct page *afs_dir_get_page(str page = read_mapping_page(dir->i_mapping, index, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; if (!PageChecked(page)) afs_dir_check_page(dir, page); if (PageError(page)) Index: linux-2.6/fs/afs/mntpt.c =================================================================== --- linux-2.6.orig/fs/afs/mntpt.c +++ linux-2.6/fs/afs/mntpt.c @@ -77,13 +77,11 @@ int afs_mntpt_check_symlink(struct afs_v } ret = -EIO; - wait_on_page_locked(page); - buf = kmap(page); - if (!PageUptodate(page)) - goto out_free; if (PageError(page)) goto out_free; + buf = kmap(page); + /* examine the symlink's contents */ size = vnode->status.size; _debug("symlink to %*.*s", size, (int) size, buf); @@ -100,8 +98,8 @@ int afs_mntpt_check_symlink(struct afs_v ret = 0; - out_free: kunmap(page); + out_free: page_cache_release(page); out: _leave(" = %d", ret); @@ -184,8 +182,7 @@ static struct vfsmount *afs_mntpt_do_aut } ret = -EIO; - wait_on_page_locked(page); - if (!PageUptodate(page) || PageError(page)) + if (PageError(page)) goto error; buf = kmap(page); Index: linux-2.6/fs/cramfs/inode.c =================================================================== --- linux-2.6.orig/fs/cramfs/inode.c +++ linux-2.6/fs/cramfs/inode.c @@ -180,7 +180,8 @@ static void *cramfs_read(struct super_bl struct page *page = NULL; if (blocknr + i < devsize) { - page = read_mapping_page(mapping, blocknr + i, NULL); + page = read_mapping_page_async(mapping, blocknr + i, + NULL); /* synchronous error? */ if (IS_ERR(page)) page = NULL; Index: linux-2.6/fs/ext2/dir.c =================================================================== --- linux-2.6.orig/fs/ext2/dir.c +++ linux-2.6/fs/ext2/dir.c @@ -161,10 +161,7 @@ static struct page * ext2_get_page(struc struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; if (!PageChecked(page)) ext2_check_page(page); if (PageError(page)) Index: linux-2.6/fs/freevxfs/vxfs_subr.c =================================================================== --- linux-2.6.orig/fs/freevxfs/vxfs_subr.c +++ linux-2.6/fs/freevxfs/vxfs_subr.c @@ -74,10 +74,7 @@ vxfs_get_page(struct address_space *mapp pp = read_mapping_page(mapping, n, NULL); if (!IS_ERR(pp)) { - wait_on_page_locked(pp); kmap(pp); - if (!PageUptodate(pp)) - goto fail; /** if (!PageChecked(pp)) **/ /** vxfs_check_page(pp); **/ if (PageError(pp)) Index: linux-2.6/mm/filemap.c =================================================================== --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -1726,7 +1726,7 @@ int generic_file_readonly_mmap(struct fi EXPORT_SYMBOL(generic_file_mmap); EXPORT_SYMBOL(generic_file_readonly_mmap); -static inline struct page *__read_cache_page(struct address_space *mapping, +static struct page *__read_cache_page(struct address_space *mapping, unsigned long index, int (*filler)(void *,struct page*), void *data) @@ -1763,17 +1763,11 @@ repeat: return page; } -/** - * read_cache_page - read into page cache, fill it if needed - * @mapping: the page's address_space - * @index: the page index - * @filler: function to perform the read - * @data: destination for read data - * - * Read into the page cache. If a page already exists, - * and PageUptodate() is not set, try to fill the page. +/* + * Same as read_cache_page, but don't wait for page to become unlocked + * after submitting it to the filler. */ -struct page *read_cache_page(struct address_space *mapping, +struct page *read_cache_page_async(struct address_space *mapping, unsigned long index, int (*filler)(void *,struct page*), void *data) @@ -1805,6 +1799,39 @@ retry: page = ERR_PTR(err); } out: + mark_page_accessed(page); + return page; +} +EXPORT_SYMBOL(read_cache_page_async); + +/** + * read_cache_page - read into page cache, fill it if needed + * @mapping: the page's address_space + * @index: the page index + * @filler: function to perform the read + * @data: destination for read data + * + * Read into the page cache. If a page already exists, and PageUptodate() is + * not set, try to fill the page then wait for it to become unlocked. + * + * If the page does not get brought uptodate, return -EIO. + */ +struct page *read_cache_page(struct address_space *mapping, + unsigned long index, + int (*filler)(void *,struct page*), + void *data) +{ + struct page *page; + + page = read_cache_page_async(mapping, index, filler, data); + if (IS_ERR(page)) + goto out; + wait_on_page_locked(page); + if (!PageUptodate(page)) { + page_cache_release(page); + page = ERR_PTR(-EIO); + } + out: return page; } EXPORT_SYMBOL(read_cache_page); Index: linux-2.6/fs/minix/dir.c =================================================================== --- linux-2.6.orig/fs/minix/dir.c +++ linux-2.6/fs/minix/dir.c @@ -62,7 +62,6 @@ static struct page * dir_get_page(struct struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); if (!PageUptodate(page)) goto fail; Index: linux-2.6/fs/namei.c =================================================================== --- linux-2.6.orig/fs/namei.c +++ linux-2.6/fs/namei.c @@ -2639,19 +2639,9 @@ static char *page_getlink(struct dentry struct address_space *mapping = dentry->d_inode->i_mapping; page = read_mapping_page(mapping, 0, NULL); if (IS_ERR(page)) - goto sync_fail; - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto async_fail; + return (char*)page; *ppage = page; return kmap(page); - -async_fail: - page_cache_release(page); - return ERR_PTR(-EIO); - -sync_fail: - return (char*)page; } int page_readlink(struct dentry *dentry, char __user *buffer, int buflen) Index: linux-2.6/fs/ntfs/aops.h =================================================================== --- linux-2.6.orig/fs/ntfs/aops.h +++ linux-2.6/fs/ntfs/aops.h @@ -89,9 +89,8 @@ static inline struct page *ntfs_map_page struct page *page = read_mapping_page(mapping, index, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (PageUptodate(page) && !PageError(page)) + if (!PageError(page)) return page; ntfs_unmap_page(page); return ERR_PTR(-EIO); Index: linux-2.6/fs/ntfs/attrib.c =================================================================== --- linux-2.6.orig/fs/ntfs/attrib.c +++ linux-2.6/fs/ntfs/attrib.c @@ -2532,14 +2532,7 @@ int ntfs_attr_set(ntfs_inode *ni, const page = read_mapping_page(mapping, idx, NULL); if (IS_ERR(page)) { ntfs_error(vol->sb, "Failed to read first partial " - "page (sync error, index 0x%lx).", idx); - return PTR_ERR(page); - } - wait_on_page_locked(page); - if (unlikely(!PageUptodate(page))) { - ntfs_error(vol->sb, "Failed to read first partial page " - "(async error, index 0x%lx).", idx); - page_cache_release(page); + "page (error, index 0x%lx).", idx); return PTR_ERR(page); } /* @@ -2602,14 +2595,7 @@ int ntfs_attr_set(ntfs_inode *ni, const page = read_mapping_page(mapping, idx, NULL); if (IS_ERR(page)) { ntfs_error(vol->sb, "Failed to read last partial page " - "(sync error, index 0x%lx).", idx); - return PTR_ERR(page); - } - wait_on_page_locked(page); - if (unlikely(!PageUptodate(page))) { - ntfs_error(vol->sb, "Failed to read last partial page " - "(async error, index 0x%lx).", idx); - page_cache_release(page); + "(error, index 0x%lx).", idx); return PTR_ERR(page); } kaddr = kmap_atomic(page, KM_USER0); Index: linux-2.6/fs/ntfs/file.c =================================================================== --- linux-2.6.orig/fs/ntfs/file.c +++ linux-2.6/fs/ntfs/file.c @@ -236,8 +236,7 @@ do_non_resident_extend: err = PTR_ERR(page); goto init_err_out; } - wait_on_page_locked(page); - if (unlikely(!PageUptodate(page) || PageError(page))) { + if (unlikely(PageError(page))) { page_cache_release(page); err = -EIO; goto init_err_out; Index: linux-2.6/fs/ocfs2/symlink.c =================================================================== --- linux-2.6.orig/fs/ocfs2/symlink.c +++ linux-2.6/fs/ocfs2/symlink.c @@ -67,16 +67,9 @@ static char *ocfs2_page_getlink(struct d page = read_mapping_page(mapping, 0, NULL); if (IS_ERR(page)) goto sync_fail; - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto async_fail; *ppage = page; return kmap(page); -async_fail: - page_cache_release(page); - return ERR_PTR(-EIO); - sync_fail: return (char*)page; } Index: linux-2.6/fs/partitions/check.c =================================================================== --- linux-2.6.orig/fs/partitions/check.c +++ linux-2.6/fs/partitions/check.c @@ -561,9 +561,6 @@ unsigned char *read_dev_sector(struct bl page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)), NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto fail; if (PageError(page)) goto fail; p->v = page; Index: linux-2.6/fs/reiserfs/xattr.c =================================================================== --- linux-2.6.orig/fs/reiserfs/xattr.c +++ linux-2.6/fs/reiserfs/xattr.c @@ -454,11 +454,7 @@ static struct page *reiserfs_get_page(st mapping_set_gfp_mask(mapping, GFP_NOFS); page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; - if (PageError(page)) goto fail; } Index: linux-2.6/fs/sysv/dir.c =================================================================== --- linux-2.6.orig/fs/sysv/dir.c +++ linux-2.6/fs/sysv/dir.c @@ -54,17 +54,9 @@ static struct page * dir_get_page(struct { struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); - if (!IS_ERR(page)) { - wait_on_page_locked(page); + if (!IS_ERR(page)) kmap(page); - if (!PageUptodate(page)) - goto fail; - } return page; - -fail: - dir_put_page(page); - return ERR_PTR(-EIO); } static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir) Index: linux-2.6/mm/swapfile.c =================================================================== --- linux-2.6.orig/mm/swapfile.c +++ linux-2.6/mm/swapfile.c @@ -1531,9 +1531,6 @@ asmlinkage long sys_swapon(const char __ error = PTR_ERR(page); goto bad_swap; } - wait_on_page_locked(page); - if (!PageUptodate(page)) - goto bad_swap; kmap(page); swap_header = page_address(page); Index: linux-2.6/drivers/mtd/devices/block2mtd.c =================================================================== --- linux-2.6.orig/drivers/mtd/devices/block2mtd.c +++ linux-2.6/drivers/mtd/devices/block2mtd.c @@ -88,9 +88,8 @@ static void cache_readahead(struct addre static struct page* page_readahead(struct address_space *mapping, int index) { - filler_t *filler = (filler_t*)mapping->a_ops->readpage; cache_readahead(mapping, index); - return read_cache_page(mapping, index, filler, NULL); + return read_mapping_page(mapping, index, NULL); } Index: linux-2.6/fs/ecryptfs/mmap.c =================================================================== --- linux-2.6.orig/fs/ecryptfs/mmap.c +++ linux-2.6/fs/ecryptfs/mmap.c @@ -46,7 +46,6 @@ struct kmem_cache *ecryptfs_lower_page_c */ static struct page *ecryptfs_get1page(struct file *file, int index) { - struct page *page; struct dentry *dentry; struct inode *inode; struct address_space *mapping; @@ -54,14 +53,7 @@ static struct page *ecryptfs_get1page(st dentry = file->f_path.dentry; inode = dentry->d_inode; mapping = inode->i_mapping; - page = read_cache_page(mapping, index, - (filler_t *)mapping->a_ops->readpage, - (void *)file); - if (IS_ERR(page)) - goto out; - wait_on_page_locked(page); -out: - return page; + return read_mapping_page(mapping, index, (void *)file); } static @@ -233,7 +225,6 @@ int ecryptfs_do_readpage(struct file *fi ecryptfs_printk(KERN_ERR, "Error reading from page cache\n"); goto out; } - wait_on_page_locked(lower_page); page_data = (char *)kmap(page); if (!page_data) { rc = -ENOMEM; Index: linux-2.6/fs/nfs/dir.c =================================================================== --- linux-2.6.orig/fs/nfs/dir.c +++ linux-2.6/fs/nfs/dir.c @@ -322,8 +322,6 @@ int find_dirent_page(nfs_readdir_descrip status = PTR_ERR(page); goto out; } - if (!PageUptodate(page)) - goto read_error; /* NOTE: Someone else may have changed the READDIRPLUS flag */ desc->page = page; @@ -337,9 +335,6 @@ int find_dirent_page(nfs_readdir_descrip out: dfprintk(DIRCACHE, "NFS: %s: returns %d\n", __FUNCTION__, status); return status; - read_error: - page_cache_release(page); - return -EIO; } /* Index: linux-2.6/fs/nfs/symlink.c =================================================================== --- linux-2.6.orig/fs/nfs/symlink.c +++ linux-2.6/fs/nfs/symlink.c @@ -61,15 +61,9 @@ static void *nfs_follow_link(struct dent err = page; goto read_failed; } - if (!PageUptodate(page)) { - err = ERR_PTR(-EIO); - goto getlink_read_error; - } nd_set_link(nd, kmap(page)); return page; -getlink_read_error: - page_cache_release(page); read_failed: nd_set_link(nd, err); return NULL; Index: linux-2.6/fs/ntfs/super.c =================================================================== --- linux-2.6.orig/fs/ntfs/super.c +++ linux-2.6/fs/ntfs/super.c @@ -2471,7 +2471,6 @@ static s64 get_nr_free_clusters(ntfs_vol s64 nr_free = vol->nr_clusters; u32 *kaddr; struct address_space *mapping = vol->lcnbmp_ino->i_mapping; - filler_t *readpage = (filler_t*)mapping->a_ops->readpage; struct page *page; pgoff_t index, max_index; @@ -2494,24 +2493,14 @@ static s64 get_nr_free_clusters(ntfs_vol * Read the page from page cache, getting it from backing store * if necessary, and increment the use count. */ - page = read_cache_page(mapping, index, (filler_t*)readpage, - NULL); + page = read_mapping_page(mapping, index, NULL); /* Ignore pages which errored synchronously. */ if (IS_ERR(page)) { - ntfs_debug("Sync read_cache_page() error. Skipping " + ntfs_debug("read_mapping_page() error. Skipping " "page (index 0x%lx).", index); nr_free -= PAGE_CACHE_SIZE * 8; continue; } - wait_on_page_locked(page); - /* Ignore pages which errored asynchronously. */ - if (!PageUptodate(page)) { - ntfs_debug("Async read_cache_page() error. Skipping " - "page (index 0x%lx).", index); - page_cache_release(page); - nr_free -= PAGE_CACHE_SIZE * 8; - continue; - } kaddr = (u32*)kmap_atomic(page, KM_USER0); /* * For each 4 bytes, subtract the number of set bits. If this @@ -2562,7 +2551,6 @@ static unsigned long __get_nr_free_mft_r { u32 *kaddr; struct address_space *mapping = vol->mftbmp_ino->i_mapping; - filler_t *readpage = (filler_t*)mapping->a_ops->readpage; struct page *page; pgoff_t index; @@ -2576,21 +2564,11 @@ static unsigned long __get_nr_free_mft_r * Read the page from page cache, getting it from backing store * if necessary, and increment the use count. */ - page = read_cache_page(mapping, index, (filler_t*)readpage, - NULL); + page = read_mapping_page(mapping, index, NULL); /* Ignore pages which errored synchronously. */ if (IS_ERR(page)) { - ntfs_debug("Sync read_cache_page() error. Skipping " - "page (index 0x%lx).", index); - nr_free -= PAGE_CACHE_SIZE * 8; - continue; - } - wait_on_page_locked(page); - /* Ignore pages which errored asynchronously. */ - if (!PageUptodate(page)) { - ntfs_debug("Async read_cache_page() error. Skipping " + ntfs_debug("read_mapping_page() error. Skipping " "page (index 0x%lx).", index); - page_cache_release(page); nr_free -= PAGE_CACHE_SIZE * 8; continue; } Index: linux-2.6/fs/ufs/dir.c =================================================================== --- linux-2.6.orig/fs/ufs/dir.c +++ linux-2.6/fs/ufs/dir.c @@ -180,13 +180,9 @@ fail: static struct page *ufs_get_page(struct inode *dir, unsigned long n) { struct address_space *mapping = dir->i_mapping; - struct page *page = read_cache_page(mapping, n, - (filler_t*)mapping->a_ops->readpage, NULL); + struct page *page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - wait_on_page_locked(page); kmap(page); - if (!PageUptodate(page)) - goto fail; if (!PageChecked(page)) ufs_check_page(page); if (PageError(page)) Index: linux-2.6/fs/ufs/util.c =================================================================== --- linux-2.6.orig/fs/ufs/util.c +++ linux-2.6/fs/ufs/util.c @@ -251,13 +251,11 @@ struct page *ufs_get_locked_page(struct page = find_lock_page(mapping, index); if (!page) { - page = read_cache_page(mapping, index, - (filler_t*)mapping->a_ops->readpage, - NULL); + page = read_mapping_page(mapping, index, NULL); if (IS_ERR(page)) { printk(KERN_ERR "ufs_change_blocknr: " - "read_cache_page error: ino %lu, index: %lu\n", + "read_mapping_page error: ino %lu, index: %lu\n", mapping->host->i_ino, index); goto out; } Index: linux-2.6/include/linux/pagemap.h =================================================================== --- linux-2.6.orig/include/linux/pagemap.h +++ linux-2.6/include/linux/pagemap.h @@ -95,12 +95,23 @@ static inline struct page *grab_cache_pa extern struct page * grab_cache_page_nowait(struct address_space *mapping, unsigned long index); +extern struct page * read_cache_page_async(struct address_space *mapping, + unsigned long index, filler_t *filler, + void *data); extern struct page * read_cache_page(struct address_space *mapping, unsigned long index, filler_t *filler, void *data); extern int read_cache_pages(struct address_space *mapping, struct list_head *pages, filler_t *filler, void *data); +static inline struct page *read_mapping_page_async( + struct address_space *mapping, + unsigned long index, void *data) +{ + filler_t *filler = (filler_t *)mapping->a_ops->readpage; + return read_cache_page_async(mapping, index, filler, data); +} + static inline struct page *read_mapping_page(struct address_space *mapping, unsigned long index, void *data) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked 2007-02-15 7:31 ` Nick Piggin @ 2007-02-15 7:31 ` Nick Piggin -1 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However the next patch will require that SetPageUptodate always be called with the page locked. Simply don't bother setting the page uptodate in this case (it is unusual that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> fs/buffer.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked @ 2007-02-15 7:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However the next patch will require that SetPageUptodate always be called with the page locked. Simply don't bother setting the page uptodate in this case (it is unusual that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> fs/buffer.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 3/3] mm: fix PageUptodate memorder 2007-02-15 7:31 ` Nick Piggin @ 2007-02-15 7:31 ` Nick Piggin -1 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management After running SetPageUptodate, preceeding stores to the page contents to actually bring it uptodate may not be ordered with the store to set the page uptodate. Therefore, another CPU which checks PageUptodate is true, then reads the page contents can get stale data. Fix this by ensuring SetPageUptodate is always called with the page locked (except in the case of a new page that cannot be visible to other CPUs), and requiring PageUptodate be checked only when the page is locked. To facilitate lockless checks, SetPageUptodate contains an smp_wmb to order preceeding stores before the store to page flags, and a new PageUptodate_NoLock is introduced, which issues a smp_rmb after the page flags are loaded for the test. DMA memory barrier is not required, because the driver / IO subsystem must bring that into order before telling the core kernel that the read has completed. One thing I like about it is that it unifies the anonymous page handling with the rest of the page management, by marking anon pages as uptodate when they _are_ uptodate, rather than when our implementation requires that they be marked as such. Doing this let me get rid of the smp_wmb's in the page copying functions which, specially added for anonymous pages for a closely related issue, didn't quite match file backed page handling. Convert core code to use PageUptodate_NoLock. Filesystems are unaffected thanks to the change to read_cache_page. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> fs/splice.c | 4 +-- include/linux/highmem.h | 4 --- include/linux/page-flags.h | 57 +++++++++++++++++++++++++++++++++++++++++---- mm/filemap.c | 20 +++++++-------- mm/hugetlb.c | 2 + mm/memory.c | 9 +++---- mm/page_io.c | 2 - mm/swap_state.c | 2 - 8 files changed, 74 insertions(+), 26 deletions(-) Index: linux-2.6/include/linux/highmem.h =================================================================== --- linux-2.6.orig/include/linux/highmem.h +++ linux-2.6/include/linux/highmem.h @@ -57,8 +57,6 @@ static inline void clear_user_highpage(s void *addr = kmap_atomic(page, KM_USER0); clear_user_page(addr, vaddr, page); kunmap_atomic(addr, KM_USER0); - /* Make sure this page is cleared on other CPU's too before using it */ - smp_wmb(); } #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE @@ -108,8 +106,6 @@ static inline void copy_user_highpage(st copy_user_page(vto, vfrom, vaddr, to); kunmap_atomic(vfrom, KM_USER0); kunmap_atomic(vto, KM_USER1); - /* Make sure this page is cleared on other CPU's too before using it */ - smp_wmb(); } #endif Index: linux-2.6/include/linux/page-flags.h =================================================================== --- linux-2.6.orig/include/linux/page-flags.h +++ linux-2.6/include/linux/page-flags.h @@ -126,16 +126,65 @@ #define ClearPageReferenced(page) clear_bit(PG_referenced, &(page)->flags) #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags) -#define PageUptodate(page) test_bit(PG_uptodate, &(page)->flags) -#ifdef CONFIG_S390 +static inline int PageUptodate(struct page *page) +{ + WARN_ON(!PageLocked(page)); + return test_bit(PG_uptodate, &(page)->flags); +} + +/* + * PageUptodate to be used when not holding the page lock. + */ +static inline int PageUptodate_NoLock(struct page *page) +{ + int ret = test_bit(PG_uptodate, &(page)->flags); + + /* + * Must ensure that the data we read out of the page is loaded + * _after_ we've loaded page->flags and found that it is uptodate. + * See SetPageUptodate() for the other side of the story. + */ + if (ret) + smp_rmb(); + + return ret; +} + static inline void SetPageUptodate(struct page *page) { + WARN_ON(!PageLocked(page)); +#ifdef CONFIG_S390 if (!test_and_set_bit(PG_uptodate, &page->flags)) page_test_and_clear_dirty(page); -} #else -#define SetPageUptodate(page) set_bit(PG_uptodate, &(page)->flags) + /* + * Memory barrier must be issued before setting the PG_uptodate bit, + * so all previous writes that served to bring the page uptodate are + * visible before PageUptodate becomes true. + * + * S390 is guaranteed to have a barrier in the test_and_set operation + * (see Documentation/atomic_ops.txt). + * + * This memory barrier should not need to provide ordering against + * DMA writes into the page, because the IO completion should really + * be doing that. + */ + smp_wmb(); + set_bit(PG_uptodate, &(page)->flags); #endif +} + +static inline void SetNewPageUptodate(struct page *page) +{ + /* + * S390 sets page dirty bit on IO operations, which is why it is + * cleared in SetPageUptodate. This is not an issue for newly + * allocated pages that are brought uptodate by zeroing memory. + */ + smp_wmb(); + __set_bit(PG_uptodate, &(page)->flags); +} + #define ClearPageUptodate(page) clear_bit(PG_uptodate, &(page)->flags) #define PageDirty(page) test_bit(PG_dirty, &(page)->flags) Index: linux-2.6/mm/hugetlb.c =================================================================== --- linux-2.6.orig/mm/hugetlb.c +++ linux-2.6/mm/hugetlb.c @@ -443,6 +443,7 @@ static int hugetlb_cow(struct mm_struct spin_unlock(&mm->page_table_lock); copy_huge_page(new_page, old_page, address, vma); + SetNewPageUptodate(new_page); spin_lock(&mm->page_table_lock); ptep = huge_pte_offset(mm, address & HPAGE_MASK); @@ -506,6 +507,7 @@ retry: } else lock_page(page); } + SetNewPageUptodate(page); spin_lock(&mm->page_table_lock); size = i_size_read(mapping->host) >> HPAGE_SHIFT; Index: linux-2.6/mm/memory.c =================================================================== --- linux-2.6.orig/mm/memory.c +++ linux-2.6/mm/memory.c @@ -1463,10 +1463,8 @@ static inline void cow_user_page(struct memset(kaddr, 0, PAGE_SIZE); kunmap_atomic(kaddr, KM_USER0); flush_dcache_page(dst); - return; - - } - copy_user_highpage(dst, src, va, vma); + } else + copy_user_highpage(dst, src, va, vma); } /* @@ -1579,6 +1577,7 @@ gotten: goto oom; cow_user_page(new_page, old_page, address, vma); } + SetNewPageUptodate(new_page); /* * Re-check the pte - we dropped the lock @@ -2097,6 +2096,7 @@ static int do_anonymous_page(struct mm_s page = alloc_zeroed_user_highpage(vma, address); if (!page) goto oom; + SetNewPageUptodate(page); entry = mk_pte(page, vma->vm_page_prot); entry = maybe_mkwrite(pte_mkdirty(entry), vma); @@ -2203,6 +2203,7 @@ retry: copy_user_highpage(page, new_page, address, vma); page_cache_release(new_page); new_page = page; + SetNewPageUptodate(new_page); anon = 1; } else { Index: linux-2.6/mm/filemap.c =================================================================== --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -912,7 +912,7 @@ find_page: handle_ra_miss(mapping, &ra, index); goto no_cached_page; } - if (!PageUptodate(page)) + if (!PageUptodate_NoLock(page)) goto page_not_up_to_date; page_ok: @@ -980,7 +980,7 @@ readpage: goto readpage_error; } - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { lock_page(page); if (!PageUptodate(page)) { if (page->mapping == NULL) { @@ -1397,7 +1397,7 @@ retry_find: * Ok, found a page in the page cache, now we need to check * that it's up-to-date. */ - if (!PageUptodate(page)) + if (!PageUptodate_NoLock(page)) goto page_not_uptodate; success: @@ -1464,7 +1464,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1495,7 +1495,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1534,7 +1534,7 @@ retry_find: * Ok, found a page in the page cache, now we need to check * that it's up-to-date. */ - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { if (nonblock) { page_cache_release(page); return NULL; @@ -1585,7 +1585,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1615,7 +1615,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1780,7 +1780,7 @@ retry: if (IS_ERR(page)) goto out; mark_page_accessed(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto out; lock_page(page); @@ -1827,7 +1827,7 @@ struct page *read_cache_page(struct addr if (IS_ERR(page)) goto out; wait_on_page_locked(page); - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { page_cache_release(page); page = ERR_PTR(-EIO); } Index: linux-2.6/mm/page_io.c =================================================================== --- linux-2.6.orig/mm/page_io.c +++ linux-2.6/mm/page_io.c @@ -134,7 +134,7 @@ int swap_readpage(struct file *file, str int ret = 0; BUG_ON(!PageLocked(page)); - ClearPageUptodate(page); + BUG_ON(PageUptodate(page)); bio = get_swap_bio(GFP_KERNEL, page_private(page), page, end_swap_bio_read); if (bio == NULL) { Index: linux-2.6/mm/swap_state.c =================================================================== --- linux-2.6.orig/mm/swap_state.c +++ linux-2.6/mm/swap_state.c @@ -149,6 +149,7 @@ int add_to_swap(struct page * page, gfp_ int err; BUG_ON(!PageLocked(page)); + BUG_ON(!PageUptodate(page)); for (;;) { entry = get_swap_page(); @@ -171,7 +172,6 @@ int add_to_swap(struct page * page, gfp_ switch (err) { case 0: /* Success */ - SetPageUptodate(page); SetPageDirty(page); INC_CACHE_INFO(add_total); return 1; Index: linux-2.6/fs/splice.c =================================================================== --- linux-2.6.orig/fs/splice.c +++ linux-2.6/fs/splice.c @@ -107,7 +107,7 @@ static int page_cache_pipe_buf_pin(struc struct page *page = buf->page; int err; - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { lock_page(page); /* @@ -373,7 +373,7 @@ __generic_file_splice_read(struct file * /* * If the page isn't uptodate, we may need to start io on it */ - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { /* * If in nonblock mode then dont block on waiting * for an in-flight io page ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 3/3] mm: fix PageUptodate memorder @ 2007-02-15 7:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-15 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management After running SetPageUptodate, preceeding stores to the page contents to actually bring it uptodate may not be ordered with the store to set the page uptodate. Therefore, another CPU which checks PageUptodate is true, then reads the page contents can get stale data. Fix this by ensuring SetPageUptodate is always called with the page locked (except in the case of a new page that cannot be visible to other CPUs), and requiring PageUptodate be checked only when the page is locked. To facilitate lockless checks, SetPageUptodate contains an smp_wmb to order preceeding stores before the store to page flags, and a new PageUptodate_NoLock is introduced, which issues a smp_rmb after the page flags are loaded for the test. DMA memory barrier is not required, because the driver / IO subsystem must bring that into order before telling the core kernel that the read has completed. One thing I like about it is that it unifies the anonymous page handling with the rest of the page management, by marking anon pages as uptodate when they _are_ uptodate, rather than when our implementation requires that they be marked as such. Doing this let me get rid of the smp_wmb's in the page copying functions which, specially added for anonymous pages for a closely related issue, didn't quite match file backed page handling. Convert core code to use PageUptodate_NoLock. Filesystems are unaffected thanks to the change to read_cache_page. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> fs/splice.c | 4 +-- include/linux/highmem.h | 4 --- include/linux/page-flags.h | 57 +++++++++++++++++++++++++++++++++++++++++---- mm/filemap.c | 20 +++++++-------- mm/hugetlb.c | 2 + mm/memory.c | 9 +++---- mm/page_io.c | 2 - mm/swap_state.c | 2 - 8 files changed, 74 insertions(+), 26 deletions(-) Index: linux-2.6/include/linux/highmem.h =================================================================== --- linux-2.6.orig/include/linux/highmem.h +++ linux-2.6/include/linux/highmem.h @@ -57,8 +57,6 @@ static inline void clear_user_highpage(s void *addr = kmap_atomic(page, KM_USER0); clear_user_page(addr, vaddr, page); kunmap_atomic(addr, KM_USER0); - /* Make sure this page is cleared on other CPU's too before using it */ - smp_wmb(); } #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE @@ -108,8 +106,6 @@ static inline void copy_user_highpage(st copy_user_page(vto, vfrom, vaddr, to); kunmap_atomic(vfrom, KM_USER0); kunmap_atomic(vto, KM_USER1); - /* Make sure this page is cleared on other CPU's too before using it */ - smp_wmb(); } #endif Index: linux-2.6/include/linux/page-flags.h =================================================================== --- linux-2.6.orig/include/linux/page-flags.h +++ linux-2.6/include/linux/page-flags.h @@ -126,16 +126,65 @@ #define ClearPageReferenced(page) clear_bit(PG_referenced, &(page)->flags) #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags) -#define PageUptodate(page) test_bit(PG_uptodate, &(page)->flags) -#ifdef CONFIG_S390 +static inline int PageUptodate(struct page *page) +{ + WARN_ON(!PageLocked(page)); + return test_bit(PG_uptodate, &(page)->flags); +} + +/* + * PageUptodate to be used when not holding the page lock. + */ +static inline int PageUptodate_NoLock(struct page *page) +{ + int ret = test_bit(PG_uptodate, &(page)->flags); + + /* + * Must ensure that the data we read out of the page is loaded + * _after_ we've loaded page->flags and found that it is uptodate. + * See SetPageUptodate() for the other side of the story. + */ + if (ret) + smp_rmb(); + + return ret; +} + static inline void SetPageUptodate(struct page *page) { + WARN_ON(!PageLocked(page)); +#ifdef CONFIG_S390 if (!test_and_set_bit(PG_uptodate, &page->flags)) page_test_and_clear_dirty(page); -} #else -#define SetPageUptodate(page) set_bit(PG_uptodate, &(page)->flags) + /* + * Memory barrier must be issued before setting the PG_uptodate bit, + * so all previous writes that served to bring the page uptodate are + * visible before PageUptodate becomes true. + * + * S390 is guaranteed to have a barrier in the test_and_set operation + * (see Documentation/atomic_ops.txt). + * + * This memory barrier should not need to provide ordering against + * DMA writes into the page, because the IO completion should really + * be doing that. + */ + smp_wmb(); + set_bit(PG_uptodate, &(page)->flags); #endif +} + +static inline void SetNewPageUptodate(struct page *page) +{ + /* + * S390 sets page dirty bit on IO operations, which is why it is + * cleared in SetPageUptodate. This is not an issue for newly + * allocated pages that are brought uptodate by zeroing memory. + */ + smp_wmb(); + __set_bit(PG_uptodate, &(page)->flags); +} + #define ClearPageUptodate(page) clear_bit(PG_uptodate, &(page)->flags) #define PageDirty(page) test_bit(PG_dirty, &(page)->flags) Index: linux-2.6/mm/hugetlb.c =================================================================== --- linux-2.6.orig/mm/hugetlb.c +++ linux-2.6/mm/hugetlb.c @@ -443,6 +443,7 @@ static int hugetlb_cow(struct mm_struct spin_unlock(&mm->page_table_lock); copy_huge_page(new_page, old_page, address, vma); + SetNewPageUptodate(new_page); spin_lock(&mm->page_table_lock); ptep = huge_pte_offset(mm, address & HPAGE_MASK); @@ -506,6 +507,7 @@ retry: } else lock_page(page); } + SetNewPageUptodate(page); spin_lock(&mm->page_table_lock); size = i_size_read(mapping->host) >> HPAGE_SHIFT; Index: linux-2.6/mm/memory.c =================================================================== --- linux-2.6.orig/mm/memory.c +++ linux-2.6/mm/memory.c @@ -1463,10 +1463,8 @@ static inline void cow_user_page(struct memset(kaddr, 0, PAGE_SIZE); kunmap_atomic(kaddr, KM_USER0); flush_dcache_page(dst); - return; - - } - copy_user_highpage(dst, src, va, vma); + } else + copy_user_highpage(dst, src, va, vma); } /* @@ -1579,6 +1577,7 @@ gotten: goto oom; cow_user_page(new_page, old_page, address, vma); } + SetNewPageUptodate(new_page); /* * Re-check the pte - we dropped the lock @@ -2097,6 +2096,7 @@ static int do_anonymous_page(struct mm_s page = alloc_zeroed_user_highpage(vma, address); if (!page) goto oom; + SetNewPageUptodate(page); entry = mk_pte(page, vma->vm_page_prot); entry = maybe_mkwrite(pte_mkdirty(entry), vma); @@ -2203,6 +2203,7 @@ retry: copy_user_highpage(page, new_page, address, vma); page_cache_release(new_page); new_page = page; + SetNewPageUptodate(new_page); anon = 1; } else { Index: linux-2.6/mm/filemap.c =================================================================== --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -912,7 +912,7 @@ find_page: handle_ra_miss(mapping, &ra, index); goto no_cached_page; } - if (!PageUptodate(page)) + if (!PageUptodate_NoLock(page)) goto page_not_up_to_date; page_ok: @@ -980,7 +980,7 @@ readpage: goto readpage_error; } - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { lock_page(page); if (!PageUptodate(page)) { if (page->mapping == NULL) { @@ -1397,7 +1397,7 @@ retry_find: * Ok, found a page in the page cache, now we need to check * that it's up-to-date. */ - if (!PageUptodate(page)) + if (!PageUptodate_NoLock(page)) goto page_not_uptodate; success: @@ -1464,7 +1464,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1495,7 +1495,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1534,7 +1534,7 @@ retry_find: * Ok, found a page in the page cache, now we need to check * that it's up-to-date. */ - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { if (nonblock) { page_cache_release(page); return NULL; @@ -1585,7 +1585,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1615,7 +1615,7 @@ page_not_uptodate: error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto success; } else if (error == AOP_TRUNCATED_PAGE) { page_cache_release(page); @@ -1780,7 +1780,7 @@ retry: if (IS_ERR(page)) goto out; mark_page_accessed(page); - if (PageUptodate(page)) + if (PageUptodate_NoLock(page)) goto out; lock_page(page); @@ -1827,7 +1827,7 @@ struct page *read_cache_page(struct addr if (IS_ERR(page)) goto out; wait_on_page_locked(page); - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { page_cache_release(page); page = ERR_PTR(-EIO); } Index: linux-2.6/mm/page_io.c =================================================================== --- linux-2.6.orig/mm/page_io.c +++ linux-2.6/mm/page_io.c @@ -134,7 +134,7 @@ int swap_readpage(struct file *file, str int ret = 0; BUG_ON(!PageLocked(page)); - ClearPageUptodate(page); + BUG_ON(PageUptodate(page)); bio = get_swap_bio(GFP_KERNEL, page_private(page), page, end_swap_bio_read); if (bio == NULL) { Index: linux-2.6/mm/swap_state.c =================================================================== --- linux-2.6.orig/mm/swap_state.c +++ linux-2.6/mm/swap_state.c @@ -149,6 +149,7 @@ int add_to_swap(struct page * page, gfp_ int err; BUG_ON(!PageLocked(page)); + BUG_ON(!PageUptodate(page)); for (;;) { entry = get_swap_page(); @@ -171,7 +172,6 @@ int add_to_swap(struct page * page, gfp_ switch (err) { case 0: /* Success */ - SetPageUptodate(page); SetPageDirty(page); INC_CACHE_INFO(add_total); return 1; Index: linux-2.6/fs/splice.c =================================================================== --- linux-2.6.orig/fs/splice.c +++ linux-2.6/fs/splice.c @@ -107,7 +107,7 @@ static int page_cache_pipe_buf_pin(struc struct page *page = buf->page; int err; - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { lock_page(page); /* @@ -373,7 +373,7 @@ __generic_file_splice_read(struct file * /* * If the page isn't uptodate, we may need to start io on it */ - if (!PageUptodate(page)) { + if (!PageUptodate_NoLock(page)) { /* * If in nonblock mode then dont block on waiting * for an in-flight io page -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 3/3] mm: fix PageUptodate memorder 2007-02-15 7:31 ` Nick Piggin @ 2007-02-25 12:06 ` Andrew Morton -1 siblings, 0 replies; 22+ messages in thread From: Andrew Morton @ 2007-02-25 12:06 UTC (permalink / raw) To: Nick Piggin; +Cc: linux-kernel, npiggin, linux-mm What an unpleasing patchset. I really really hope we really have a bug in there, and that all this crap isn't pointless uglification. We _do_ need a flush_dcaceh_page() in all cases which you're concerned about. Perhaps we should stick the appropriate barriers in there. > On Thu, 15 Feb 2007 08:31:31 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > +static inline void SetNewPageUptodate(struct page *page) > +{ > + /* > + * S390 sets page dirty bit on IO operations, which is why it is > + * cleared in SetPageUptodate. This is not an issue for newly > + * allocated pages that are brought uptodate by zeroing memory. > + */ > + smp_wmb(); > + __set_bit(PG_uptodate, &(page)->flags); > +} __SetPageUptodate() might be more conventional. Boy we'd better get the callers of this little handgrenade right. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 3/3] mm: fix PageUptodate memorder @ 2007-02-25 12:06 ` Andrew Morton 0 siblings, 0 replies; 22+ messages in thread From: Andrew Morton @ 2007-02-25 12:06 UTC (permalink / raw) To: Nick Piggin; +Cc: linux-kernel, linux-mm What an unpleasing patchset. I really really hope we really have a bug in there, and that all this crap isn't pointless uglification. We _do_ need a flush_dcaceh_page() in all cases which you're concerned about. Perhaps we should stick the appropriate barriers in there. > On Thu, 15 Feb 2007 08:31:31 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > +static inline void SetNewPageUptodate(struct page *page) > +{ > + /* > + * S390 sets page dirty bit on IO operations, which is why it is > + * cleared in SetPageUptodate. This is not an issue for newly > + * allocated pages that are brought uptodate by zeroing memory. > + */ > + smp_wmb(); > + __set_bit(PG_uptodate, &(page)->flags); > +} __SetPageUptodate() might be more conventional. Boy we'd better get the callers of this little handgrenade right. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 3/3] mm: fix PageUptodate memorder 2007-02-25 12:06 ` Andrew Morton @ 2007-02-26 2:37 ` Nick Piggin -1 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-26 2:37 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, linux-mm On Sun, Feb 25, 2007 at 04:06:57AM -0800, Andrew Morton wrote: > > What an unpleasing patchset. I really really hope we really have a bug in > there, and that all this crap isn't pointless uglification. It's the same bug for file pages as we had for anonymous pages, which the POWER guys actually hit. Do you disagree? I like this patch much better than the smp_wmb that we currently do just for anon pages. > We _do_ need a flush_dcaceh_page() in all cases which you're concerned > about. Perhaps we should stick the appropriate barriers in there. I think the memorder problem is conceptually a page data vs PG_uptodate one, because the read-side assumes that the data will be initialised before PG_uptodate is set. After the page is uptodate, you don't need subsequent barriers (that you would get via flush_dcache_page), because we've never really tried to impose any synchronisation on parallel read vs write. A memory barrier in flush_dcache_page would do the trick as well, I think, but it is not really any better. It is misleading because it is not the canonical fix. And we'd still need the smp_rmb in the PageUptodate read-side. > > On Thu, 15 Feb 2007 08:31:31 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > > +static inline void SetNewPageUptodate(struct page *page) > > +{ > > + /* > > + * S390 sets page dirty bit on IO operations, which is why it is > > + * cleared in SetPageUptodate. This is not an issue for newly > > + * allocated pages that are brought uptodate by zeroing memory. > > + */ > > + smp_wmb(); > > + __set_bit(PG_uptodate, &(page)->flags); > > +} > > __SetPageUptodate() might be more conventional. I guess so. I guess that the __ variants *can* only be used on new pages anyway. I wanted to make it clear that it wasn't a non-atomic version of exactly the same operation, but __SetPageUptodate probably would be fine. > Boy we'd better get the callers of this little handgrenade right. Newly initialised pages, before they become visible to anyone else. We could put a BUG_ON(page_count(page) != 1); in there? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 3/3] mm: fix PageUptodate memorder @ 2007-02-26 2:37 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-26 2:37 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, linux-mm On Sun, Feb 25, 2007 at 04:06:57AM -0800, Andrew Morton wrote: > > What an unpleasing patchset. I really really hope we really have a bug in > there, and that all this crap isn't pointless uglification. It's the same bug for file pages as we had for anonymous pages, which the POWER guys actually hit. Do you disagree? I like this patch much better than the smp_wmb that we currently do just for anon pages. > We _do_ need a flush_dcaceh_page() in all cases which you're concerned > about. Perhaps we should stick the appropriate barriers in there. I think the memorder problem is conceptually a page data vs PG_uptodate one, because the read-side assumes that the data will be initialised before PG_uptodate is set. After the page is uptodate, you don't need subsequent barriers (that you would get via flush_dcache_page), because we've never really tried to impose any synchronisation on parallel read vs write. A memory barrier in flush_dcache_page would do the trick as well, I think, but it is not really any better. It is misleading because it is not the canonical fix. And we'd still need the smp_rmb in the PageUptodate read-side. > > On Thu, 15 Feb 2007 08:31:31 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > > +static inline void SetNewPageUptodate(struct page *page) > > +{ > > + /* > > + * S390 sets page dirty bit on IO operations, which is why it is > > + * cleared in SetPageUptodate. This is not an issue for newly > > + * allocated pages that are brought uptodate by zeroing memory. > > + */ > > + smp_wmb(); > > + __set_bit(PG_uptodate, &(page)->flags); > > +} > > __SetPageUptodate() might be more conventional. I guess so. I guess that the __ variants *can* only be used on new pages anyway. I wanted to make it clear that it wasn't a non-atomic version of exactly the same operation, but __SetPageUptodate probably would be fine. > Boy we'd better get the callers of this little handgrenade right. Newly initialised pages, before they become visible to anyone else. We could put a BUG_ON(page_count(page) != 1); in there? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 3) @ 2007-02-10 2:31 Nick Piggin 2007-02-10 2:31 ` Nick Piggin 0 siblings, 1 reply; 22+ messages in thread From: Nick Piggin @ 2007-02-10 2:31 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Nick Piggin, Linux Memory Management, Martin Schwidefsky, Linux Kernel OK, I have got rid of SetPageUptodate_nowarn, and removed the atomic op from SetNewPageUptodate. Made PageUptodate_NoLock only issue the memory barrier is the page was uptodate (hopefully the compiler can thread the branch into the caller's branch). SetNewPageUptodate does not do the S390 page_test_and_clear_dirty, so I'd like to make sure that's OK. Rearranged the patch series so we don't have the first patch introducing a lot of WARN_ONs that are solved in the next two patches (rather, solve those issues first). Thanks, Nick -- SuSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked 2007-02-10 2:31 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 3) Nick Piggin @ 2007-02-10 2:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-10 2:31 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Nick Piggin, Linux Memory Management, Martin Schwidefsky, Linux Kernel __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However the next patch will require that SetPageUptodate always be called with the page locked. Simply don't bother setting the page uptodate in this case (it is unusual that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> fs/buffer.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked @ 2007-02-10 2:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-10 2:31 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Nick Piggin, Linux Memory Management, Martin Schwidefsky, Linux Kernel __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However the next patch will require that SetPageUptodate always be called with the page locked. Simply don't bother setting the page uptodate in this case (it is unusual that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> fs/buffer.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) @ 2007-02-08 13:26 Nick Piggin 2007-02-08 13:27 ` Nick Piggin 0 siblings, 1 reply; 22+ messages in thread From: Nick Piggin @ 2007-02-08 13:26 UTC (permalink / raw) To: Andrew Morton Cc: Linux Memory Management, Hugh Dickins, Nick Piggin, Benjamin Herrenschmidt, Linux Kernel Still no independent confirmation as to whether this is a problem or not. Updated some comments, added diffstats to patches, don't use __SetPageUptodate as an internal page-flags.h private function. I would like to eventually get an ack from Hugh regarding the anon memory and especially swap side of the equation, and a glance from whoever put the smp_wmb()s into the copy functions (Was it Ben H or Anton maybe?) Thanks, Nick -- SuSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked 2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin @ 2007-02-08 13:27 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-08 13:27 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Linux Memory Management, Benjamin Herrenschmidt, Nick Piggin, Linux Kernel __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However with the previous patch, this is now a problem: so don't bother setting the page uptodate in this case (it is weird that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> fs/buffer.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked @ 2007-02-08 13:27 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-08 13:27 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Linux Memory Management, Benjamin Herrenschmidt, Nick Piggin, Linux Kernel __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However with the previous patch, this is now a problem: so don't bother setting the page uptodate in this case (it is weird that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> fs/buffer.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 0/3] 2.6.20 fix for PageUptodate memorder problem @ 2007-02-06 8:02 Nick Piggin 2007-02-06 8:02 ` Nick Piggin 0 siblings, 1 reply; 22+ messages in thread From: Nick Piggin @ 2007-02-06 8:02 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Andrew Morton, Linux Memory Management, Linux Kernel, Nick Piggin, Linux Filesystems Still no independent confirmation as to whether this is a problem or not. I think it is, so I'll propose this patchset to fix it. Patch 1/3 has a reasonable description of the problem. Thanks, Nick -- SuSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked 2007-02-06 8:02 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem Nick Piggin @ 2007-02-06 8:02 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-06 8:02 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Andrew Morton, Nick Piggin, Linux Kernel, Linux Memory Management, Linux Filesystems __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However with the previous patch, this is now a problem: so don't bother setting the page uptodate in this case (it is weird that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc */ BUG_ON(PageWriteback(page)); set_page_writeback(page); + unlock_page(page); do { struct buffer_head *next = bh->b_this_page; @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc } bh = next; } while (bh != head); - unlock_page(page); err = 0; done: @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch 2/3] fs: buffer don't PageUptodate without page locked @ 2007-02-06 8:02 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-06 8:02 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Andrew Morton, Nick Piggin, Linux Kernel, Linux Memory Management, Linux Filesystems __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However with the previous patch, this is now a problem: so don't bother setting the page uptodate in this case (it is weird that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc */ BUG_ON(PageWriteback(page)); set_page_writeback(page); + unlock_page(page); do { struct buffer_head *next = bh->b_this_page; @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc } bh = next; } while (bh != head); - unlock_page(page); err = 0; done: @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 2/3] fs: buffer don't PageUptodate without page locked 2007-02-06 8:02 ` Nick Piggin @ 2007-02-06 8:21 ` Andrew Morton -1 siblings, 0 replies; 22+ messages in thread From: Andrew Morton @ 2007-02-06 8:21 UTC (permalink / raw) To: Nick Piggin Cc: Linus Torvalds, Hugh Dickins, Linux Kernel, Linux Memory Management, Linux Filesystems On Tue, 6 Feb 2007 09:02:23 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > __block_write_full_page is calling SetPageUptodate without the page locked. > This is unusual, but not incorrect, as PG_writeback is still set. > > However with the previous patch, this is now a problem: so don't bother > setting the page uptodate in this case (it is weird that the write path > does such a thing anyway). Instead just leave it to the read side to bring > the page uptodate when it notices that all buffers are uptodate. > > Signed-off-by: Nick Piggin <npiggin@suse.de> > > Index: linux-2.6/fs/buffer.c > =================================================================== > --- linux-2.6.orig/fs/buffer.c > +++ linux-2.6/fs/buffer.c > @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc > */ > BUG_ON(PageWriteback(page)); > set_page_writeback(page); > + unlock_page(page); > > do { > struct buffer_head *next = bh->b_this_page; > @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc > } > bh = next; > } while (bh != head); > - unlock_page(page); > > err = 0; > done: Why this change? Without looking at it too hard, it seems that if submit_bh() completes synchronously, this thread can end up playing with the buffers on a non-locked, non-PageWriteback page. Someone else could whip the buffers away and oops? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 2/3] fs: buffer don't PageUptodate without page locked @ 2007-02-06 8:21 ` Andrew Morton 0 siblings, 0 replies; 22+ messages in thread From: Andrew Morton @ 2007-02-06 8:21 UTC (permalink / raw) To: Nick Piggin Cc: Linus Torvalds, Hugh Dickins, Linux Kernel, Linux Memory Management, Linux Filesystems On Tue, 6 Feb 2007 09:02:23 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > __block_write_full_page is calling SetPageUptodate without the page locked. > This is unusual, but not incorrect, as PG_writeback is still set. > > However with the previous patch, this is now a problem: so don't bother > setting the page uptodate in this case (it is weird that the write path > does such a thing anyway). Instead just leave it to the read side to bring > the page uptodate when it notices that all buffers are uptodate. > > Signed-off-by: Nick Piggin <npiggin@suse.de> > > Index: linux-2.6/fs/buffer.c > =================================================================== > --- linux-2.6.orig/fs/buffer.c > +++ linux-2.6/fs/buffer.c > @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc > */ > BUG_ON(PageWriteback(page)); > set_page_writeback(page); > + unlock_page(page); > > do { > struct buffer_head *next = bh->b_this_page; > @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc > } > bh = next; > } while (bh != head); > - unlock_page(page); > > err = 0; > done: Why this change? Without looking at it too hard, it seems that if submit_bh() completes synchronously, this thread can end up playing with the buffers on a non-locked, non-PageWriteback page. Someone else could whip the buffers away and oops? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 2/3] fs: buffer don't PageUptodate without page locked 2007-02-06 8:21 ` Andrew Morton @ 2007-02-06 8:31 ` Nick Piggin -1 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-06 8:31 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Hugh Dickins, Linux Kernel, Linux Memory Management, Linux Filesystems On Tue, Feb 06, 2007 at 12:21:40AM -0800, Andrew Morton wrote: > On Tue, 6 Feb 2007 09:02:23 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > > > __block_write_full_page is calling SetPageUptodate without the page locked. > > This is unusual, but not incorrect, as PG_writeback is still set. > > > > However with the previous patch, this is now a problem: so don't bother > > setting the page uptodate in this case (it is weird that the write path > > does such a thing anyway). Instead just leave it to the read side to bring > > the page uptodate when it notices that all buffers are uptodate. > > > > Signed-off-by: Nick Piggin <npiggin@suse.de> > > > > Index: linux-2.6/fs/buffer.c > > =================================================================== > > --- linux-2.6.orig/fs/buffer.c > > +++ linux-2.6/fs/buffer.c > > @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc > > */ > > BUG_ON(PageWriteback(page)); > > set_page_writeback(page); > > + unlock_page(page); > > > > do { > > struct buffer_head *next = bh->b_this_page; > > @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc > > } > > bh = next; > > } while (bh != head); > > - unlock_page(page); > > > > err = 0; > > done: > > Why this change? Without looking at it too hard, it seems that if > submit_bh() completes synchronously, this thread can end up playing with > the buffers on a non-locked, non-PageWriteback page. Someone else could > whip the buffers away and oops? Hmm, it definitely shouldn't be there, it leaked in from another patch to bring partiy with the error handling... Here is an updated patch. -- __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However with the previous patch, this is now a problem: so don't bother setting the page uptodate in this case (it is weird that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch 2/3] fs: buffer don't PageUptodate without page locked @ 2007-02-06 8:31 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2007-02-06 8:31 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Hugh Dickins, Linux Kernel, Linux Memory Management, Linux Filesystems On Tue, Feb 06, 2007 at 12:21:40AM -0800, Andrew Morton wrote: > On Tue, 6 Feb 2007 09:02:23 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote: > > > __block_write_full_page is calling SetPageUptodate without the page locked. > > This is unusual, but not incorrect, as PG_writeback is still set. > > > > However with the previous patch, this is now a problem: so don't bother > > setting the page uptodate in this case (it is weird that the write path > > does such a thing anyway). Instead just leave it to the read side to bring > > the page uptodate when it notices that all buffers are uptodate. > > > > Signed-off-by: Nick Piggin <npiggin@suse.de> > > > > Index: linux-2.6/fs/buffer.c > > =================================================================== > > --- linux-2.6.orig/fs/buffer.c > > +++ linux-2.6/fs/buffer.c > > @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc > > */ > > BUG_ON(PageWriteback(page)); > > set_page_writeback(page); > > + unlock_page(page); > > > > do { > > struct buffer_head *next = bh->b_this_page; > > @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc > > } > > bh = next; > > } while (bh != head); > > - unlock_page(page); > > > > err = 0; > > done: > > Why this change? Without looking at it too hard, it seems that if > submit_bh() completes synchronously, this thread can end up playing with > the buffers on a non-locked, non-PageWriteback page. Someone else could > whip the buffers away and oops? Hmm, it definitely shouldn't be there, it leaked in from another patch to bring partiy with the error handling... Here is an updated patch. -- __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However with the previous patch, this is now a problem: so don't bother setting the page uptodate in this case (it is weird that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1698,17 +1698,8 @@ done: * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ - int uptodate = 1; - do { - if (!buffer_uptodate(bh)) { - uptodate = 0; - break; - } - bh = bh->b_this_page; - } while (bh != head); - if (uptodate) - SetPageUptodate(page); end_page_writeback(page); + /* * The page and buffer_heads can be released at any time from * here on. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2007-02-26 2:37 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-02-15 7:31 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 4) Nick Piggin 2007-02-15 7:31 ` Nick Piggin 2007-02-15 7:31 ` [patch 1/3] mm: make read_cache_page synchronous Nick Piggin 2007-02-15 7:31 ` Nick Piggin 2007-02-15 7:31 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin 2007-02-15 7:31 ` Nick Piggin 2007-02-15 7:31 ` [patch 3/3] mm: fix PageUptodate memorder Nick Piggin 2007-02-15 7:31 ` Nick Piggin 2007-02-25 12:06 ` Andrew Morton 2007-02-25 12:06 ` Andrew Morton 2007-02-26 2:37 ` Nick Piggin 2007-02-26 2:37 ` Nick Piggin -- strict thread matches above, loose matches on Subject: below -- 2007-02-10 2:31 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 3) Nick Piggin 2007-02-10 2:31 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin 2007-02-10 2:31 ` Nick Piggin 2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin 2007-02-08 13:27 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin 2007-02-08 13:27 ` Nick Piggin 2007-02-06 8:02 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem Nick Piggin 2007-02-06 8:02 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin 2007-02-06 8:02 ` Nick Piggin 2007-02-06 8:21 ` Andrew Morton 2007-02-06 8:21 ` Andrew Morton 2007-02-06 8:31 ` Nick Piggin 2007-02-06 8:31 ` Nick Piggin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.