* [PATCH] swap 01/13 no SWAP_ERROR
@ 2003-03-25 22:10 Hugh Dickins
2003-03-25 22:12 ` [PATCH] swap 02/13 !CONFIG_SWAP try_to_unmap Hugh Dickins
` (11 more replies)
0 siblings, 12 replies; 22+ messages in thread
From: Hugh Dickins @ 2003-03-25 22:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
First of thirteen miscellaneous independent patches in the swap/rmap
area; several touch the same files, so they are numbered in sequence.
Many, but not all, of the fixes were extracted from the reverted
anobjrmap patch.
Based upon 2.5.65-mm4, the aggregate has diffstat:
include/linux/rmap-locking.h | 2
include/linux/swap.h | 18 ++----
mm/fremap.c | 12 ++--
mm/mmap.c | 8 --
mm/rmap.c | 68 ++++++++++++++-----------
mm/shmem.c | 26 ++++++---
mm/swap_state.c | 4 -
mm/swapfile.c | 115 ++++++++++++++++++++++++++-----------------
mm/vmscan.c | 14 ++---
9 files changed, 151 insertions(+), 116 deletions(-)
swap 01/13 no SWAP_ERROR
Delete unused SWAP_ERROR and non-existent page_over_rsslimit().
--- 2.5.65-mm4/include/linux/swap.h Wed Mar 5 07:26:34 2003
+++ swap01/include/linux/swap.h Tue Mar 25 20:42:56 2003
@@ -174,13 +174,11 @@
struct pte_chain *));
void FASTCALL(page_remove_rmap(struct page *, pte_t *));
int FASTCALL(try_to_unmap(struct page *));
-int FASTCALL(page_over_rsslimit(struct page *));
/* return values of try_to_unmap */
#define SWAP_SUCCESS 0
#define SWAP_AGAIN 1
#define SWAP_FAIL 2
-#define SWAP_ERROR 3
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
--- 2.5.65-mm4/mm/rmap.c Sun Mar 23 10:30:15 2003
+++ swap01/mm/rmap.c Tue Mar 25 20:42:56 2003
@@ -677,7 +677,6 @@
* SWAP_SUCCESS - we succeeded in removing all mappings
* SWAP_AGAIN - we missed a trylock, try again later
* SWAP_FAIL - the page is unswappable
- * SWAP_ERROR - an error occurred
*/
int try_to_unmap(struct page * page)
{
@@ -754,9 +753,6 @@
case SWAP_FAIL:
ret = SWAP_FAIL;
goto out;
- case SWAP_ERROR:
- ret = SWAP_ERROR;
- goto out;
}
}
}
--- 2.5.65-mm4/mm/vmscan.c Tue Feb 18 02:14:32 2003
+++ swap01/mm/vmscan.c Tue Mar 25 20:42:56 2003
@@ -284,7 +284,6 @@
*/
if (page_mapped(page) && mapping) {
switch (try_to_unmap(page)) {
- case SWAP_ERROR:
case SWAP_FAIL:
pte_chain_unlock(page);
goto activate_locked;
* [PATCH] swap 02/13 !CONFIG_SWAP try_to_unmap
From: Hugh Dickins @ 2003-03-25 22:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Raised the #endif CONFIG_SWAP in shrink_list: it was excluding
try_to_unmap of file pages. I suspect !CONFIG_MMU relied on that
to suppress try_to_unmap, so a SWAP_FAIL stub is added for it.
--- swap01/include/linux/swap.h Tue Mar 25 20:42:56 2003
+++ swap02/include/linux/swap.h Tue Mar 25 20:43:07 2003
@@ -175,19 +175,18 @@
void FASTCALL(page_remove_rmap(struct page *, pte_t *));
int FASTCALL(try_to_unmap(struct page *));
-/* return values of try_to_unmap */
-#define SWAP_SUCCESS 0
-#define SWAP_AGAIN 1
-#define SWAP_FAIL 2
-
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
-
#else
-#define page_referenced(page) \
- TestClearPageReferenced(page)
+#define page_referenced(page) TestClearPageReferenced(page)
+#define try_to_unmap(page) SWAP_FAIL
#endif /* CONFIG_MMU */
+/* return values of try_to_unmap */
+#define SWAP_SUCCESS 0
+#define SWAP_AGAIN 1
+#define SWAP_FAIL 2
+
#ifdef CONFIG_SWAP
/* linux/mm/page_io.c */
extern int swap_readpage(struct file *, struct page *);
--- swap01/mm/vmscan.c Tue Mar 25 20:42:56 2003
+++ swap02/mm/vmscan.c Tue Mar 25 20:43:07 2003
@@ -277,6 +277,7 @@
pte_chain_lock(page);
mapping = page->mapping;
}
+#endif /* CONFIG_SWAP */
/*
* The page is mapped into the page tables of one or more
@@ -294,7 +295,6 @@
; /* try to free the page below */
}
}
-#endif /* CONFIG_SWAP */
pte_chain_unlock(page);
/*
* [PATCH] swap 03/13 add_to_swap_cache
From: Hugh Dickins @ 2003-03-25 22:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Make add_to_swap_cache static: it's only used by read_swap_cache_async.
And since that has just done a GFP_HIGHUSER allocation, surely it's
better for add_to_swap_cache to use GFP_KERNEL than GFP_ATOMIC.
--- swap02/include/linux/swap.h Tue Mar 25 20:43:07 2003
+++ swap03/include/linux/swap.h Tue Mar 25 20:43:18 2003
@@ -197,7 +197,6 @@
extern struct address_space swapper_space;
#define total_swapcache_pages swapper_space.nrpages
extern void show_swap_cache_info(void);
-extern int add_to_swap_cache(struct page *, swp_entry_t);
extern int add_to_swap(struct page *);
extern void __delete_from_swap_cache(struct page *);
extern void delete_from_swap_cache(struct page *);
--- swap02/mm/swap_state.c Wed Mar 5 07:26:34 2003
+++ swap03/mm/swap_state.c Tue Mar 25 20:43:18 2003
@@ -68,7 +68,7 @@
swap_cache_info.noent_race, swap_cache_info.exist_race);
}
-int add_to_swap_cache(struct page *page, swp_entry_t entry)
+static int add_to_swap_cache(struct page *page, swp_entry_t entry)
{
int error;
@@ -78,7 +78,7 @@
INC_CACHE_INFO(noent_race);
return -ENOENT;
}
- error = add_to_page_cache(page, &swapper_space, entry.val, GFP_ATOMIC);
+ error = add_to_page_cache(page, &swapper_space, entry.val, GFP_KERNEL);
/*
* Anon pages are already on the LRU, we don't run lru_cache_add here.
*/
* [PATCH] swap 04/13 page_convert_anon -ENOMEM
From: Hugh Dickins @ 2003-03-25 22:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
objrmap's page_convert_anon tries to allocate pte_chains, so it must
allow for failure (even GFP_KERNEL can fail when OOM-killed); also move
its prototype to join the rest of the rmap.c prototypes in swap.h.
--- swap03/include/linux/rmap-locking.h Sun Mar 23 10:30:14 2003
+++ swap04/include/linux/rmap-locking.h Tue Mar 25 20:43:29 2003
@@ -45,5 +45,3 @@
if (pte_chain)
__pte_chain_free(pte_chain);
}
-
-void page_convert_anon(struct page *page);
--- swap03/include/linux/swap.h Tue Mar 25 20:43:18 2003
+++ swap04/include/linux/swap.h Tue Mar 25 20:43:29 2003
@@ -175,6 +175,8 @@
void FASTCALL(page_remove_rmap(struct page *, pte_t *));
int FASTCALL(try_to_unmap(struct page *));
+int page_convert_anon(struct page *);
+
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
#else
--- swap03/mm/fremap.c Sun Mar 23 10:30:15 2003
+++ swap04/mm/fremap.c Tue Mar 25 20:43:29 2003
@@ -69,8 +69,10 @@
pgidx = (addr - vma->vm_start) >> PAGE_SHIFT;
pgidx += vma->vm_pgoff;
pgidx >>= PAGE_CACHE_SHIFT - PAGE_SHIFT;
- if (!PageAnon(page) && (page->index != pgidx))
- page_convert_anon(page);
+ if (!PageAnon(page) && (page->index != pgidx)) {
+ if (page_convert_anon(page) < 0)
+ goto err_free;
+ }
pgd = pgd_offset(mm, addr);
spin_lock(&mm->page_table_lock);
@@ -95,12 +97,10 @@
if (flush)
flush_tlb_page(vma, addr);
- spin_unlock(&mm->page_table_lock);
- pte_chain_free(pte_chain);
- return 0;
-
+ err = 0;
err_unlock:
spin_unlock(&mm->page_table_lock);
+err_free:
pte_chain_free(pte_chain);
err:
return err;
--- swap03/mm/rmap.c Tue Mar 25 20:42:56 2003
+++ swap04/mm/rmap.c Tue Mar 25 20:43:29 2003
@@ -774,7 +774,7 @@
* of pte_chain structures to ensure that it can complete without releasing
* the lock.
*/
-void page_convert_anon(struct page *page)
+int page_convert_anon(struct page *page)
{
struct address_space *mapping = page->mapping;
struct vm_area_struct *vma;
@@ -783,6 +783,7 @@
pte_addr_t pte_paddr;
int mapcount;
int index = 0;
+ int err = 0;
if (PageAnon(page))
goto out;
@@ -795,6 +796,15 @@
if (mapcount > 1) {
for (; index < mapcount; index += NRPTE) {
ptec = pte_chain_alloc(GFP_KERNEL);
+ if (!ptec) {
+ while (pte_chain) {
+ ptec = pte_chain->next;
+ pte_chain_free(pte_chain);
+ pte_chain = ptec;
+ }
+ err = -ENOMEM;
+ goto out;
+ }
ptec->next = pte_chain;
pte_chain = ptec;
}
@@ -867,7 +877,7 @@
pte_chain_unlock(page);
up(&mapping->i_shared_sem);
out:
- return;
+ return err;
}
/**
* [PATCH] swap 05/13 page_convert_anon unlocking
From: Hugh Dickins @ 2003-03-25 22:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
objrmap's page_convert_anon was missing an unlock and an up of the
i_shared_sem semaphore on its exit paths.
--- swap04/mm/rmap.c Tue Mar 25 20:43:29 2003
+++ swap05/mm/rmap.c Tue Mar 25 20:43:40 2003
@@ -818,6 +818,7 @@
*/
if (mapcount < page->pte.mapcount) {
pte_chain_unlock(page);
+ up(&mapping->i_shared_sem);
goto retry;
} else if ((mapcount > page->pte.mapcount) && (mapcount > 1)) {
mapcount = page->pte.mapcount;
@@ -833,7 +834,7 @@
SetPageAnon(page);
if (mapcount == 0)
- goto out;
+ goto out_unlock;
else if (mapcount == 1) {
SetPageDirect(page);
page->pte.direct = 0;
* [PATCH] swap 06/13 wrap below vm_start
From: Hugh Dickins @ 2003-03-25 22:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
objrmap must check whether the computed address wrapped below
vma->vm_start.
--- swap05/mm/rmap.c Tue Mar 25 20:43:40 2003
+++ swap06/mm/rmap.c Tue Mar 25 20:43:51 2003
@@ -100,12 +100,8 @@
unsigned long address;
loffset = (page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT));
- if (loffset < vma->vm_pgoff)
- goto out;
-
address = vma->vm_start + ((loffset - vma->vm_pgoff) << PAGE_SHIFT);
-
- if (address >= vma->vm_end)
+ if (address < vma->vm_start || address >= vma->vm_end)
goto out;
pgd = pgd_offset(mm, address);
* [PATCH] swap 07/13 objrmap page_table_lock
From: Hugh Dickins @ 2003-03-25 22:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
A recent update removed the page_table_lock from objrmap. That was
simply wrong: it's necessary to guard mm->rss (in try_to_unmap), and
it's necessary to ensure the page tables don't get freed beneath us
(in which case any modification of the pte, by page_referenced or
by try_to_unmap, might be corrupting a freed and reused page).
Then restore the original SWAP_AGAIN semantics in try_to_unmap: it is
simplest for the inner levels to return SWAP_AGAIN or SWAP_FAIL, with
the outer level deciding on SWAP_SUCCESS once all mappings have been
removed. Stop searching as soon as the page is fully unmapped.
page_convert_anon is left for the moment with FIXMEs rather than
page_table_lock. I believe it can be done without page_table_lock
(which would be problematic there), but have noticed several other
bugs there; more patches will follow at a later date (though I'd
much rather have anobjrmap, which handles all this so much more cleanly).
--- swap06/mm/rmap.c Tue Mar 25 20:43:51 2003
+++ swap07/mm/rmap.c Tue Mar 25 20:44:02 2003
@@ -143,9 +143,13 @@
static int
page_referenced_obj_one(struct vm_area_struct *vma, struct page *page)
{
+ struct mm_struct *mm = vma->vm_mm;
pte_t *pte;
int referenced = 0;
+ if (!spin_trylock(&mm->page_table_lock))
+ return 1;
+
pte = find_pte(vma, page, NULL);
if (pte) {
if (ptep_test_and_clear_young(pte))
@@ -153,6 +157,7 @@
pte_unmap(pte);
}
+ spin_unlock(&mm->page_table_lock);
return referenced;
}
@@ -489,7 +494,10 @@
unsigned long address;
pte_t *pte;
pte_t pteval;
- int ret = SWAP_SUCCESS;
+ int ret = SWAP_AGAIN;
+
+ if (!spin_trylock(&mm->page_table_lock))
+ return ret;
pte = find_pte(vma, page, &address);
if (!pte)
@@ -518,6 +526,7 @@
pte_unmap(pte);
out:
+ spin_unlock(&mm->page_table_lock);
return ret;
}
@@ -538,7 +547,7 @@
{
struct address_space *mapping = page->mapping;
struct vm_area_struct *vma;
- int ret = SWAP_SUCCESS;
+ int ret = SWAP_AGAIN;
if (!mapping)
BUG();
@@ -547,23 +556,20 @@
BUG();
if (down_trylock(&mapping->i_shared_sem))
- return SWAP_AGAIN;
+ return ret;
list_for_each_entry(vma, &mapping->i_mmap, shared) {
ret = try_to_unmap_obj_one(vma, page);
- if (ret != SWAP_SUCCESS)
+ if (ret == SWAP_FAIL || !page->pte.mapcount)
goto out;
}
list_for_each_entry(vma, &mapping->i_mmap_shared, shared) {
ret = try_to_unmap_obj_one(vma, page);
- if (ret != SWAP_SUCCESS)
+ if (ret == SWAP_FAIL || !page->pte.mapcount)
goto out;
}
- if (page->pte.mapcount)
- BUG();
-
out:
up(&mapping->i_shared_sem);
return ret;
@@ -753,8 +759,10 @@
}
}
out:
- if (!page_mapped(page))
+ if (!page_mapped(page)) {
dec_page_state(nr_mapped);
+ ret = SWAP_SUCCESS;
+ }
return ret;
}
@@ -839,6 +847,7 @@
index = NRPTE-1;
list_for_each_entry(vma, &mapping->i_mmap, shared) {
+ /* FIXME: unsafe without page_table_lock */
pte = find_pte(vma, page, NULL);
if (pte) {
pte_paddr = ptep_to_paddr(pte);
@@ -855,6 +864,7 @@
}
}
list_for_each_entry(vma, &mapping->i_mmap_shared, shared) {
+ /* FIXME: unsafe without page_table_lock */
pte = find_pte(vma, page, NULL);
if (pte) {
pte_paddr = ptep_to_paddr(pte);
* [PATCH] swap 08/13 rmap comments
From: Hugh Dickins @ 2003-03-25 22:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Update a few locking comments in rmap.c.
--- swap07/mm/rmap.c Tue Mar 25 20:44:02 2003
+++ swap08/mm/rmap.c Tue Mar 25 20:44:13 2003
@@ -14,8 +14,8 @@
/*
* Locking:
* - the page->pte.chain is protected by the PG_chainlock bit,
- * which nests within the zone->lru_lock, then the
- * mm->page_table_lock, and then the page lock.
+ * which nests within the the mm->page_table_lock,
+ * which nests within the page lock.
* - because swapout locking is opposite to the locking order
* in the page fault path, the swapout path uses trylocks
* on the mm->page_table_lock
@@ -584,9 +584,8 @@
* table entry mapping a page. Because locking order here is opposite
* to the locking order used by the page fault path, we use trylocks.
* Locking:
- * zone->lru_lock page_launder()
- * page lock page_launder(), trylock
- * pte_chain_lock page_launder()
+ * page lock shrink_list(), trylock
+ * pte_chain_lock shrink_list()
* mm->page_table_lock try_to_unmap_one(), trylock
*/
static int FASTCALL(try_to_unmap_one(struct page *, pte_addr_t));
@@ -673,8 +672,8 @@
* @page: the page to get unmapped
*
* Tries to remove all the page table entries which are mapping this
- * page, used in the pageout path. Caller must hold zone->lru_lock
- * and the page lock. Return values are:
+ * page, used in the pageout path. Caller must hold the page lock
+ * and its pte chain lock. Return values are:
*
* SWAP_SUCCESS - we succeeded in removing all mappings
* SWAP_AGAIN - we missed a trylock, try again later
* [PATCH] swap 09/13 tmpfs truncation
From: Hugh Dickins @ 2003-03-25 22:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Recent testing has shown that swapoff can sneak a page back into the
tmpfs page cache after truncate_inode_pages has cleaned it, but before
shmem_truncate lowers next_index to stop that; the symptom is
BUG_ON(inode->i_blocks) in shmem_delete_inode. So call
truncate_inode_pages again to be safe.
--- swap08/mm/shmem.c Sun Mar 23 10:30:15 2003
+++ swap09/mm/shmem.c Tue Mar 25 20:44:24 2003
@@ -486,6 +486,16 @@
}
done2:
BUG_ON(info->swapped > info->next_index);
+ if (inode->i_mapping->nrpages) {
+ /*
+ * Call truncate_inode_pages again: racing shmem_unuse_inode
+ * may have swizzled a page in from swap since vmtruncate or
+ * generic_delete_inode did it, before we lowered next_index.
+ */
+ spin_unlock(&info->lock);
+ truncate_inode_pages(inode->i_mapping, inode->i_size);
+ spin_lock(&info->lock);
+ }
shmem_recalc_inode(inode);
spin_unlock(&info->lock);
}
* [PATCH] swap 10/13 tmpfs atomics
From: Hugh Dickins @ 2003-03-25 22:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
move_from_swap_cache and add_to_page_cache_lru are using GFP_ATOMIC,
which can easily fail in an intermittent way. Rude if shmem_getpage
then fails with -ENOMEM: cond_resched to let kswapd in, and repeat.
--- swap09/mm/shmem.c Tue Mar 25 20:44:24 2003
+++ swap10/mm/shmem.c Tue Mar 25 20:44:35 2003
@@ -838,8 +838,7 @@
SetPageUptodate(filepage);
set_page_dirty(filepage);
swap_free(swap);
- } else if (!(error = move_from_swap_cache(
- swappage, idx, mapping))) {
+ } else if (move_from_swap_cache(swappage, idx, mapping) == 0) {
shmem_swp_set(info, entry, 0);
shmem_swp_unmap(entry);
spin_unlock(&info->lock);
@@ -850,8 +849,8 @@
spin_unlock(&info->lock);
unlock_page(swappage);
page_cache_release(swappage);
- if (error != -EEXIST)
- goto failed;
+ /* let kswapd refresh zone for GFP_ATOMICs */
+ cond_resched();
goto repeat;
}
} else if (sgp == SGP_READ && !filepage) {
@@ -897,15 +896,16 @@
swap = *entry;
shmem_swp_unmap(entry);
}
- if (error || swap.val ||
- (error = add_to_page_cache_lru(
- filepage, mapping, idx, GFP_ATOMIC))) {
+ if (error || swap.val || 0 != add_to_page_cache_lru(
+ filepage, mapping, idx, GFP_ATOMIC)) {
spin_unlock(&info->lock);
page_cache_release(filepage);
shmem_free_block(inode);
filepage = NULL;
- if (error != -EEXIST)
+ if (error)
goto failed;
+ /* let kswapd refresh zone for GFP_ATOMICs */
+ cond_resched();
goto repeat;
}
}
* [PATCH] swap 11/13 fix unuse_pmd fixme
From: Hugh Dickins @ 2003-03-25 22:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
try_to_unuse now drops mmlist_lock across unuse_process (with a pretty
dance of atomic_incs and mmputs of the various mmlist markers, and a
polite new cond_resched there), so that unuse_process can
pte_chain_alloc(GFP_KERNEL) and pass it down and down and down and down
to unuse_pte: which cannot succeed more than once on a given mm (made
explicit by returning back up once it has succeeded). Preliminary checks
are moved up from unuse_pte to unuse_pmd, and done more efficiently
(avoiding the extra pte_file test added recently): swapoff spends far
too long in here. Also updated locking comments and references to
try_to_swap_out.
--- swap10/mm/swapfile.c Sun Mar 23 10:30:15 2003
+++ swap11/mm/swapfile.c Tue Mar 25 20:44:46 2003
@@ -377,42 +377,34 @@
* share this swap entry, so be cautious and let do_wp_page work out
* what to do if a write is requested later.
*/
-/* mmlist_lock and vma->vm_mm->page_table_lock are held */
+/* vma->vm_mm->page_table_lock is held */
static void
unuse_pte(struct vm_area_struct *vma, unsigned long address, pte_t *dir,
swp_entry_t entry, struct page *page, struct pte_chain **pte_chainp)
{
- pte_t pte = *dir;
-
- if (pte_file(pte))
- return;
- if (likely(pte_to_swp_entry(pte).val != entry.val))
- return;
- if (unlikely(pte_none(pte) || pte_present(pte)))
- return;
+ vma->vm_mm->rss++;
get_page(page);
set_pte(dir, pte_mkold(mk_pte(page, vma->vm_page_prot)));
SetPageAnon(page);
*pte_chainp = page_add_rmap(page, dir, *pte_chainp);
swap_free(entry);
- ++vma->vm_mm->rss;
}
-/* mmlist_lock and vma->vm_mm->page_table_lock are held */
-static void unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
+/* vma->vm_mm->page_table_lock is held */
+static int unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
unsigned long address, unsigned long size, unsigned long offset,
- swp_entry_t entry, struct page* page)
+ swp_entry_t entry, struct page *page, struct pte_chain **pte_chainp)
{
pte_t * pte;
unsigned long end;
- struct pte_chain *pte_chain = NULL;
+ pte_t swp_pte = swp_entry_to_pte(entry);
if (pmd_none(*dir))
- return;
+ return 0;
if (pmd_bad(*dir)) {
pmd_ERROR(*dir);
pmd_clear(dir);
- return;
+ return 0;
}
pte = pte_offset_map(dir, address);
offset += address & PMD_MASK;
@@ -422,33 +414,36 @@
end = PMD_SIZE;
do {
/*
- * FIXME: handle pte_chain_alloc() failures
+ * swapoff spends a _lot_ of time in this loop!
+ * Test inline before going to call unuse_pte.
*/
- if (pte_chain == NULL)
- pte_chain = pte_chain_alloc(GFP_ATOMIC);
- unuse_pte(vma, offset+address-vma->vm_start,
- pte, entry, page, &pte_chain);
+ if (unlikely(pte_same(*pte, swp_pte))) {
+ unuse_pte(vma, offset + address, pte,
+ entry, page, pte_chainp);
+ pte_unmap(pte);
+ return 1;
+ }
address += PAGE_SIZE;
pte++;
} while (address && (address < end));
pte_unmap(pte - 1);
- pte_chain_free(pte_chain);
+ return 0;
}
-/* mmlist_lock and vma->vm_mm->page_table_lock are held */
-static void unuse_pgd(struct vm_area_struct * vma, pgd_t *dir,
+/* vma->vm_mm->page_table_lock is held */
+static int unuse_pgd(struct vm_area_struct * vma, pgd_t *dir,
unsigned long address, unsigned long size,
- swp_entry_t entry, struct page* page)
+ swp_entry_t entry, struct page *page, struct pte_chain **pte_chainp)
{
pmd_t * pmd;
unsigned long offset, end;
if (pgd_none(*dir))
- return;
+ return 0;
if (pgd_bad(*dir)) {
pgd_ERROR(*dir);
pgd_clear(dir);
- return;
+ return 0;
}
pmd = pmd_offset(dir, address);
offset = address & PGDIR_MASK;
@@ -459,32 +454,42 @@
if (address >= end)
BUG();
do {
- unuse_pmd(vma, pmd, address, end - address, offset, entry,
- page);
+ if (unuse_pmd(vma, pmd, address, end - address,
+ offset, entry, page, pte_chainp))
+ return 1;
address = (address + PMD_SIZE) & PMD_MASK;
pmd++;
} while (address && (address < end));
+ return 0;
}
-/* mmlist_lock and vma->vm_mm->page_table_lock are held */
-static void unuse_vma(struct vm_area_struct * vma, pgd_t *pgdir,
- swp_entry_t entry, struct page* page)
+/* vma->vm_mm->page_table_lock is held */
+static int unuse_vma(struct vm_area_struct * vma, pgd_t *pgdir,
+ swp_entry_t entry, struct page *page, struct pte_chain **pte_chainp)
{
unsigned long start = vma->vm_start, end = vma->vm_end;
if (start >= end)
BUG();
do {
- unuse_pgd(vma, pgdir, start, end - start, entry, page);
+ if (unuse_pgd(vma, pgdir, start, end - start,
+ entry, page, pte_chainp))
+ return 1;
start = (start + PGDIR_SIZE) & PGDIR_MASK;
pgdir++;
} while (start && (start < end));
+ return 0;
}
-static void unuse_process(struct mm_struct * mm,
+static int unuse_process(struct mm_struct * mm,
swp_entry_t entry, struct page* page)
{
struct vm_area_struct* vma;
+ struct pte_chain *pte_chain;
+
+ pte_chain = pte_chain_alloc(GFP_KERNEL);
+ if (!pte_chain)
+ return -ENOMEM;
/*
* Go through process' page directory.
@@ -492,10 +497,12 @@
spin_lock(&mm->page_table_lock);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
pgd_t * pgd = pgd_offset(mm, vma->vm_start);
- unuse_vma(vma, pgd, entry, page);
+ if (unuse_vma(vma, pgd, entry, page, &pte_chain))
+ break;
}
spin_unlock(&mm->page_table_lock);
- return;
+ pte_chain_free(pte_chain);
+ return 0;
}
/*
@@ -639,36 +646,54 @@
if (start_mm == &init_mm)
shmem = shmem_unuse(entry, page);
else
- unuse_process(start_mm, entry, page);
+ retval = unuse_process(start_mm, entry, page);
}
if (*swap_map > 1) {
int set_start_mm = (*swap_map >= swcount);
struct list_head *p = &start_mm->mmlist;
struct mm_struct *new_start_mm = start_mm;
+ struct mm_struct *prev_mm = start_mm;
struct mm_struct *mm;
+ atomic_inc(&new_start_mm->mm_users);
+ atomic_inc(&prev_mm->mm_users);
spin_lock(&mmlist_lock);
- while (*swap_map > 1 &&
+ while (*swap_map > 1 && !retval &&
(p = p->next) != &start_mm->mmlist) {
mm = list_entry(p, struct mm_struct, mmlist);
+ atomic_inc(&mm->mm_users);
+ spin_unlock(&mmlist_lock);
+ mmput(prev_mm);
+ prev_mm = mm;
+
+ cond_resched();
+
swcount = *swap_map;
- if (mm == &init_mm) {
+ if (swcount <= 1)
+ ;
+ else if (mm == &init_mm) {
set_start_mm = 1;
- spin_unlock(&mmlist_lock);
shmem = shmem_unuse(entry, page);
- spin_lock(&mmlist_lock);
} else
- unuse_process(mm, entry, page);
+ retval = unuse_process(mm, entry, page);
if (set_start_mm && *swap_map < swcount) {
+ mmput(new_start_mm);
+ atomic_inc(&mm->mm_users);
new_start_mm = mm;
set_start_mm = 0;
}
+ spin_lock(&mmlist_lock);
}
- atomic_inc(&new_start_mm->mm_users);
spin_unlock(&mmlist_lock);
+ mmput(prev_mm);
mmput(start_mm);
start_mm = new_start_mm;
}
+ if (retval) {
+ unlock_page(page);
+ page_cache_release(page);
+ break;
+ }
/*
* How could swap count reach 0x7fff when the maximum
@@ -692,7 +717,7 @@
/*
* If a reference remains (rare), we would like to leave
- * the page in the swap cache; but try_to_swap_out could
+ * the page in the swap cache; but try_to_unmap could
* then re-duplicate the entry once we drop page lock,
* so we might loop indefinitely; also, that page could
* not be swapped out to other storage meanwhile. So:
@@ -728,7 +753,7 @@
/*
* So we could skip searching mms once swap count went
* to 1, we did not mark any present ptes as dirty: must
- * mark page dirty so try_to_swap_out will preserve it.
+ * mark page dirty so shrink_list will preserve it.
*/
SetPageDirty(page);
unlock_page(page);
* [PATCH] swap 12/13 vm_enough_memory double counts
From: Hugh Dickins @ 2003-03-25 22:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Stop vm_enough_memory double-counting total_swapcache_pages: it dates
from the days when we didn't free swap when freeing a swapcache page.
--- swap11/mm/mmap.c Sun Mar 23 10:30:15 2003
+++ swap12/mm/mmap.c Tue Mar 25 20:44:57 2003
@@ -82,14 +82,6 @@
free += nr_swap_pages;
/*
- * This double-counts: the nrpages are both in the
- * page-cache and in the swapper space. At the same time,
- * this compensates for the swap-space over-allocation
- * (ie "nr_swap_pages" being too small).
- */
- free += total_swapcache_pages;
-
- /*
* The code below doesn't account for free space in the
* inode and dentry slab cache, slab cache fragmentation,
* inodes and dentries which will become freeable under
* [PATCH] swap 13/13 may_enter_fs?
From: Hugh Dickins @ 2003-03-25 22:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
shrink_list's may_enter_fs (may_write_page would be a better name)
currently reflects that swapcache page I/O won't suffer from FS
complications, so it can be done if __GFP_IO is set even without
__GFP_FS; but the same is true of a tmpfs page (which is just one
stage away from being a swapcache page), so check bdi->memory_backed
instead.
--- swap12/mm/vmscan.c Tue Mar 25 20:43:07 2003
+++ swap13/mm/vmscan.c Tue Mar 25 20:45:08 2003
@@ -235,7 +235,6 @@
pagevec_init(&freed_pvec, 1);
while (!list_empty(page_list)) {
struct page *page;
- int may_enter_fs;
page = list_entry(page_list->prev, struct page, lru);
list_del(&page->lru);
@@ -248,8 +247,6 @@
(*nr_mapped)++;
BUG_ON(PageActive(page));
- may_enter_fs = (gfp_mask & __GFP_FS) ||
- (PageSwapCache(page) && (gfp_mask & __GFP_IO));
if (PageWriteback(page))
goto keep_locked;
@@ -315,15 +312,19 @@
* See swapfile.c:page_queue_congested().
*/
if (PageDirty(page)) {
+ struct backing_dev_info *bdi;
+
if (!is_page_cache_freeable(page))
goto keep_locked;
if (!mapping)
goto keep_locked;
if (mapping->a_ops->writepage == NULL)
goto activate_locked;
- if (!may_enter_fs)
+ bdi = mapping->backing_dev_info;
+ if (!(gfp_mask & (bdi->memory_backed?
+ __GFP_IO: __GFP_FS)))
goto keep_locked;
- if (!may_write_to_queue(mapping->backing_dev_info))
+ if (!may_write_to_queue(bdi))
goto keep_locked;
write_lock(&mapping->page_lock);
if (test_clear_page_dirty(page)) {
* Re: [PATCH] swap 11/13 fix unuse_pmd fixme
From: William Lee Irwin III @ 2003-03-25 22:38 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel
On Tue, Mar 25, 2003 at 10:21:51PM +0000, Hugh Dickins wrote:
> try_to_unuse drop mmlist_lock across unuse_process (with pretty dance
> of atomic_incs and mmputs of various mmlist markers, and a polite new
> cond_resched there), so unuse_process can pte_chain_alloc(GFP_KERNEL)
> and pass that down and down and down and down to unuse_pte: which
> cannot succeed more than once on a given mm (make that explicit by
> returning back up once succeeded). Preliminary checks moved up from
> unuse_pte to unuse_pmd, and done more efficiently (avoid that extra
> pte_file test added recently), swapoff spends far too long in here.
> Updated locking comments and references to try_to_swap_out.
Aha, the unlock_page() and the contorted loop breakout were wrong
in my patch. Well, that and the cleanup of fossilized comments dating
back to the Silurian is nice.
-- wli
* Re: [PATCH] swap 10/13 tmpfs atomics
2003-03-25 22:20 ` [PATCH] swap 10/13 tmpfs atomics Hugh Dickins
@ 2003-03-26 0:50 ` Andrew Morton
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2003-03-26 0:50 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-kernel
Hugh Dickins <hugh@veritas.com> wrote:
>
> move_from_swap_cache and add_to_page_cache_lru are using GFP_ATOMIC,
> which can easily fail in an intermittent way. Rude if shmem_getpage
> then fails with -ENOMEM: cond_resched to let kswapd in, and repeat.
I think the preferred way of waiting for the IO system and kswapd to catch up
is blk_congestion_wait(). Because it waits for the "right" amount of time.
I'll make that change.
And yes, the name is silly: "what if it's not block-backed"? It hasn't
caused any problems yet, but maybe one day we'll need to find a way for
network-backed filesystems to deliver a wakeup to sleepers there.
* Re: [PATCH] swap 13/13 may_enter_fs?
2003-03-25 22:23 ` [PATCH] swap 13/13 may_enter_fs? Hugh Dickins
@ 2003-03-26 1:12 ` Andrew Morton
2003-03-26 17:22 ` Hugh Dickins
2003-03-26 18:06 ` Bill Davidsen
0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2003-03-26 1:12 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-kernel
Hugh Dickins <hugh@veritas.com> wrote:
>
> shrink_list's may_enter_fs (may_write_page would be a better name)
> currently reflects that swapcache page I/O won't suffer from FS
> complications, so can be done if __GFP_IO without __GFP_FS; but
> the same is true of a tmpfs page (which is just this stage away
> from being a swapcache page), so check bdi->memory_backed instead.
>
> ...
>
> + if (!(gfp_mask & (bdi->memory_backed?
> + __GFP_IO: __GFP_FS)))
> goto keep_locked;
Barf. I haven't used a question mark operator in ten years, and this is a
fine demonstration of why ;)
I think a feasibly comprehensible transformation would be:
/*
* A comment goes here
*/
if (bdi->memory_backed)
may_enter_fs = gfp_mask & __GFP_IO;
else
may_enter_fs = gfp_mask & __GFP_FS;
That being said, this is a bit presumptuous. PF_MEMALLOC will protect us
from infinite recursion but there are other reasons for GFP_NOFS.
For example, a memory-backed filesystem may be trying to allocate GFP_NOFS
memory while holding filesystem locks which are taken by its writepage.
How about adding a new field to backing_dev_info for this case? Damned if I
can think of a name for it though.
* Re: [PATCH] swap 13/13 may_enter_fs?
2003-03-26 1:12 ` Andrew Morton
@ 2003-03-26 17:22 ` Hugh Dickins
2003-03-27 0:31 ` Andrew Morton
2003-03-26 18:06 ` Bill Davidsen
1 sibling, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2003-03-26 17:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Tue, 25 Mar 2003, Andrew Morton wrote:
>
> For example, a memory-backed filesystem may be trying to allocate GFP_NOFS
> memory while holding filesystem locks which are taken by its writepage.
>
> How about adding a new field to backing_dev_info for this case? Damned if I
> can think of a name for it though.
I think you're overcomplicating it. Of the existing memory_backed bdis
(ramdisk, hugetlbfs, ramfs, sysfs, tmpfs, swap) only two have non-NULL
writepage, and both of those two go (indirectly and directly) to swap
(and neither holds FS lock while waiting to allocate memory).
If we were looking for a correct solution, I don't think backing_dev_info
would be the right place: we're talking about GFP_ needed for writepage,
which should be specified in the struct address_space filled in by the
FS: I think it's more a limitation of the FS than its backing device.
But I don't really want to add another obscure gfp_mask initialization
all over the place. So how about this? As I say, "swap_backed" is
(for the foreseeable future) unnecessary, but does help to correct
mistaken impressions arising from the term "memory_backed".
I've conceded to your delicate sensibility by using if-else rather
than ?: (I used it rather too compactly there, but see nothing wrong
with the construct in general); but you'll be disgusted at my defiance
in not adding a comment in the place you indicated - what would I say?
Hugh
--- 2.5.66-mm1/mm/shmem.c Wed Mar 26 11:51:01 2003
+++ swap13/mm/shmem.c Wed Mar 26 16:35:08 2003
@@ -123,7 +123,8 @@
static struct backing_dev_info shmem_backing_dev_info = {
.ra_pages = 0, /* No readahead */
- .memory_backed = 1, /* Does not contribute to dirty memory */
+ .memory_backed = 1, /* Do not count its dirty pages in nr_dirty */
+ .swap_backed = 1, /* Its memory_backed writepage goes to swap */
};
LIST_HEAD (shmem_inodes);
--- 2.5.66-mm1/mm/swap_state.c Wed Mar 26 11:51:01 2003
+++ swap13/mm/swap_state.c Wed Mar 26 16:35:08 2003
@@ -27,7 +27,8 @@
static struct backing_dev_info swap_backing_dev_info = {
.ra_pages = 0, /* No readahead */
- .memory_backed = 1, /* Does not contribute to dirty memory */
+ .memory_backed = 1, /* Do not count its dirty pages in nr_dirty */
+ .swap_backed = 1, /* Its memory_backed writepage goes to swap */
};
extern struct address_space_operations swap_aops;
--- 2.5.66-mm1/mm/vmscan.c Wed Mar 26 11:51:01 2003
+++ swap13/mm/vmscan.c Wed Mar 26 16:35:08 2003
@@ -235,7 +235,6 @@
pagevec_init(&freed_pvec, 1);
while (!list_empty(page_list)) {
struct page *page;
- int may_enter_fs;
page = list_entry(page_list->prev, struct page, lru);
list_del(&page->lru);
@@ -248,8 +247,6 @@
(*nr_mapped)++;
BUG_ON(PageActive(page));
- may_enter_fs = (gfp_mask & __GFP_FS) ||
- (PageSwapCache(page) && (gfp_mask & __GFP_IO));
if (PageWriteback(page))
goto keep_locked;
@@ -315,15 +312,23 @@
* See swapfile.c:page_queue_congested().
*/
if (PageDirty(page)) {
+ struct backing_dev_info *bdi;
+ unsigned int gfp_needed_for_writepage;
+
if (!is_page_cache_freeable(page))
goto keep_locked;
if (!mapping)
goto keep_locked;
if (mapping->a_ops->writepage == NULL)
goto activate_locked;
- if (!may_enter_fs)
+ bdi = mapping->backing_dev_info;
+ if (bdi->swap_backed)
+ gfp_needed_for_writepage = __GFP_IO;
+ else
+ gfp_needed_for_writepage = __GFP_FS;
+ if (!(gfp_mask & gfp_needed_for_writepage))
goto keep_locked;
- if (!may_write_to_queue(mapping->backing_dev_info))
+ if (!may_write_to_queue(bdi))
goto keep_locked;
write_lock(&mapping->page_lock);
if (test_clear_page_dirty(page)) {
--- 2.5.66-mm1/include/linux/backing-dev.h Wed Mar 26 11:50:36 2003
+++ swap13/include/linux/backing-dev.h Wed Mar 26 16:35:08 2003
@@ -23,7 +23,9 @@
struct backing_dev_info {
unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */
unsigned long state; /* Always use atomic bitops on this */
- int memory_backed; /* Cannot clean pages with writepage */
+ unsigned int
+ memory_backed:1,/* Do not count its dirty pages in nr_dirty */
+ swap_backed:1; /* Its memory_backed writepage goes to swap */
};
extern struct backing_dev_info default_backing_dev_info;
* Re: [PATCH] swap 13/13 may_enter_fs?
2003-03-26 1:12 ` Andrew Morton
2003-03-26 17:22 ` Hugh Dickins
@ 2003-03-26 18:06 ` Bill Davidsen
2003-03-26 18:42 ` Hugh Dickins
1 sibling, 1 reply; 22+ messages in thread
From: Bill Davidsen @ 2003-03-26 18:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Hugh Dickins, linux-kernel
On Tue, 25 Mar 2003, Andrew Morton wrote:
> Hugh Dickins <hugh@veritas.com> wrote:
> >
> > shrink_list's may_enter_fs (may_write_page would be a better name)
> > currently reflects that swapcache page I/O won't suffer from FS
> > complications, so can be done if __GFP_IO without __GFP_FS; but
> > the same is true of a tmpfs page (which is just this stage away
> > from being a swapcache page), so check bdi->memory_backed instead.
> >
> > ...
> >
> > + if (!(gfp_mask & (bdi->memory_backed?
> > + __GFP_IO: __GFP_FS)))
> > goto keep_locked;
>
> Barf. I haven't used a question mark operator in ten years, and this is a
> fine demonstration of why ;)
>
> I think a feasibly comprehensible transformation would be:
>
> /*
> * A comment goes here
> */
> if (bdi->memory_backed)
> may_enter_fs = gfp_mask & __GFP_IO;
> else
> may_enter_fs = gfp_mask & __GFP_FS;
Unless there's a subtle difference in functionality here that I'm missing,
you are doing the same thing in a larger and slower way, and the logic is
less clear.
Is there some benefit I'm missing?
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [PATCH] swap 13/13 may_enter_fs?
2003-03-26 18:06 ` Bill Davidsen
@ 2003-03-26 18:42 ` Hugh Dickins
2003-03-27 14:55 ` Bill Davidsen
0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2003-03-26 18:42 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Andrew Morton, linux-kernel
On Wed, 26 Mar 2003, Bill Davidsen wrote:
>
> Unless there's a subtle difference in functionality here that I'm missing,
> you are doing the same thing in a larger and slower way, and the logic is
> less clear.
>
> Is there some benefit I'm missing?
No, it's just that Andrew finds the logic clearer when written out his way.
Hugh
* Re: [PATCH] swap 13/13 may_enter_fs?
2003-03-26 17:22 ` Hugh Dickins
@ 2003-03-27 0:31 ` Andrew Morton
2003-03-27 15:21 ` Hugh Dickins
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2003-03-27 0:31 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-kernel
Hugh Dickins <hugh@veritas.com> wrote:
>
> On Tue, 25 Mar 2003, Andrew Morton wrote:
> >
> > For example, a memory-backed filesystem may be trying to allocate GFP_NOFS
> > memory while holding filesystem locks which are taken by its writepage.
> >
> > How about adding a new field to backing_dev_info for this case? Damned if I
> > can think of a name for it though.
>
> I think you're overcomplicating it. Of the existing memory_backed bdis
> (ramdisk, hugetlbfs, ramfs, sysfs, tmpfs, swap) only two have non-NULL
> writepage, and both of those two go (indirectly and directly) to swap
> (and neither holds FS lock while waiting to allocate memory).
But this is a much nicer patch. Thanks for doing all this btw. I was
barfing at ?:, not your code ;)
> If we were looking for a correct solution, I don't think backing_dev_info
> would be the right place: we're talking about GFP_ needed for writepage,
> which should be specified in the struct address_space filled in by the
> FS: I think it's more a limitation of the FS than its backing device.
Good point.
> + bdi = mapping->backing_dev_info;
> + if (bdi->swap_backed)
> + gfp_needed_for_writepage = __GFP_IO;
> + else
> + gfp_needed_for_writepage = __GFP_FS;
> + if (!(gfp_mask & gfp_needed_for_writepage))
This is inaccurate? shmem_writepage() performs no IO and could/should be
called even for GFP_NOIO allocations.
It's probably not very important but if we're going to make a change it may
as well be the right one.
Could you live with
if (bdi->has_special_writepage)
gfp_needed_for_writepage = bdi->gfp_needed_for_writepage;
? So swap_backing_dev_info uses __GFP_IO and shmem_backing_dev_info() (which
is completely atomic) uses zero?
Yeah, it's a bit awkward. I'm OK with the special-casing. Both swap and
tmpfs _are_ special, and unique. Recognising that fact in vmscan.c is
reasonable. ->gfp_needed_for_writepage should probably be in the superblock,
but that's just too far away.
> - int memory_backed; /* Cannot clean pages with writepage */
> + unsigned int
> + memory_backed:1,/* Do not count its dirty pages in nr_dirty */
> + swap_backed:1; /* Its memory_backed writepage goes to swap */
> };
Hard call. It is a tradeoff between icache misses and dcache misses.
Obviously that is trivia in this case.
* Re: [PATCH] swap 13/13 may_enter_fs?
2003-03-26 18:42 ` Hugh Dickins
@ 2003-03-27 14:55 ` Bill Davidsen
0 siblings, 0 replies; 22+ messages in thread
From: Bill Davidsen @ 2003-03-27 14:55 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel
On Wed, 26 Mar 2003, Hugh Dickins wrote:
> On Wed, 26 Mar 2003, Bill Davidsen wrote:
> >
> > Unless there's a subtle difference in functionality here that I'm missing,
> > you are doing the same thing in a larger and slower way, and the logic is
> > less clear.
> >
> > Is there some benefit I'm missing?
>
> No, it's just that Andrew finds the logic clearer when written out his way.
Looking at code generated from the fragments, I don't think gcc shares that
opinion ;-) Actually I find it more obvious with the ? notation, and it
prevents someone in the future changing the logic in one line of code when
it needs to change in both.
Oh well, I expect the ? form to stay, since it uses less cache.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [PATCH] swap 13/13 may_enter_fs?
2003-03-27 0:31 ` Andrew Morton
@ 2003-03-27 15:21 ` Hugh Dickins
0 siblings, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2003-03-27 15:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Wed, 26 Mar 2003, Andrew Morton wrote:
> Hugh Dickins <hugh@veritas.com> wrote:
>
> But this is a much nicer patch. Thanks for doing all this btw. I was
> barfing at ?:, not your code ;)
Phew! I'll make more use of the ;) operator in future then.
> > If we were looking for a correct solution, I don't think backing_dev_info
> > would be the right place: we're talking about GFP_ needed for writepage,
> > which should be specified in the struct address_space filled in by the
> > FS: I think it's more a limitation of the FS than its backing device.
>
> Good point.
Well, I'm beginning to wonder.
> > + bdi = mapping->backing_dev_info;
> > + if (bdi->swap_backed)
> > + gfp_needed_for_writepage = __GFP_IO;
> > + else
> > + gfp_needed_for_writepage = __GFP_FS;
> > + if (!(gfp_mask & gfp_needed_for_writepage))
>
> This is inaccurate? shmem_writepage() performs no IO and could/should be
> called even for GFP_NOIO allocations.
Yes, you're right. I was going to quibble that we might sometime
change shmem_writepage to call swap_writepage directly, instead of
giving the page another spin round the lru. But that's not how it
is at present, and I think your GFP_NOIO shmem_writepage is better.
> It's probably not very important but if we're going to make a change it may
> as well be the right one.
>
> Could you live with
>
> if (bdi->has_special_writepage)
> gfp_needed_for_writepage = bdi->gfp_needed_for_writepage;
>
> ? So swap_backing_dev_info uses __GFP_IO and shmem_backing_dev_info() (which
> is completely atomic) uses zero?
>
> Yeah, it's a bit awkward. I'm OK with the special-casing. Both swap and
> tmpfs _are_ special, and unique. Recognising that fact in vmscan.c is
> reasonable. ->gfp_needed_for_writepage should probably be in the superblock,
> but that's just too far away.
As you say, it's not very important, and I could live with it.
But I'm feeling we've not yet arrived at the right (even contingently
right) answer. Let's hold off from making any such change for now.
Does a block device writepage need __GFP_FS? Oh, looks like it uses
bufferheads, that's a sure sign it does, isn't it?
Did mempools make __GFP_IO redundant? Perhaps for some devices and
not for others. (mempool_alloc doesn't try the waiting allocation
if __GFP_FS is not set.)
I've a feeling both filesystem and backing device might like to
contribute to the mask. And it's always worth considering where
loop might fit in with this too (I do have a bdi->over_loop in
one of my loop deadlock patches).
> > - int memory_backed; /* Cannot clean pages with writepage */
> > + unsigned int
> > + memory_backed:1,/* Do not count its dirty pages in nr_dirty */
> > + swap_backed:1; /* Its memory_backed writepage goes to swap */
> > };
>
> Hard call. It is a tradeoff between icache misses and dcache misses.
> Obviously that is trivia in this case.
Now you've lost me so completely, that I can't even tell if your
icache and dcache here are instruction and data or inode and dentry!
Hugh
Thread overview: 22+ messages
2003-03-25 22:10 [PATCH] swap 01/13 no SWAP_ERROR Hugh Dickins
2003-03-25 22:12 ` [PATCH] swap 02/13 !CONFIG_SWAP try_to_unmap Hugh Dickins
2003-03-25 22:13 ` [PATCH] swap 03/13 add_to_swap_cache Hugh Dickins
2003-03-25 22:13 ` [PATCH] swap 04/13 page_convert_anon -ENOMEM Hugh Dickins
2003-03-25 22:14 ` [PATCH] swap 05/13 page_convert_anon unlocking Hugh Dickins
2003-03-25 22:15 ` [PATCH] swap 06/13 wrap below vm_start Hugh Dickins
2003-03-25 22:17 ` [PATCH] swap 07/13 objrmap page_table_lock Hugh Dickins
2003-03-25 22:18 ` [PATCH] swap 08/13 rmap comments Hugh Dickins
2003-03-25 22:19 ` [PATCH] swap 09/13 tmpfs truncation Hugh Dickins
2003-03-25 22:20 ` [PATCH] swap 10/13 tmpfs atomics Hugh Dickins
2003-03-26 0:50 ` Andrew Morton
2003-03-25 22:21 ` [PATCH] swap 11/13 fix unuse_pmd fixme Hugh Dickins
2003-03-25 22:38 ` William Lee Irwin III
2003-03-25 22:22 ` [PATCH] swap 12/13 vm_enough_memory double counts Hugh Dickins
2003-03-25 22:23 ` [PATCH] swap 13/13 may_enter_fs? Hugh Dickins
2003-03-26 1:12 ` Andrew Morton
2003-03-26 17:22 ` Hugh Dickins
2003-03-27 0:31 ` Andrew Morton
2003-03-27 15:21 ` Hugh Dickins
2003-03-26 18:06 ` Bill Davidsen
2003-03-26 18:42 ` Hugh Dickins
2003-03-27 14:55 ` Bill Davidsen