* [PATCH v2 0/3] A few cleanup and fixup patches for swap @ 2022-06-08 14:40 Miaohe Lin 2022-06-08 14:40 ` [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected Miaohe Lin ` (3 more replies) 0 siblings, 4 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-08 14:40 UTC (permalink / raw) To: akpm; +Cc: david, linux-mm, linux-kernel, linmiaohe Hi everyone, This series contains a cleaup patch to remove unneeded swap_cache_info statistics, and two bugfix patches to avoid possible data races of inuse_pages and so on. More details can be found in the respective changelogs. Thanks! --- v2: collect Reviewed-by tag per David drop patch "mm/swapfile: avoid confusing swap cache statistics" add a new patch to remove swap_cache_info statistics per David Many thanks David for review and comment. --- Miaohe Lin (3): mm/swapfile: make security_vm_enough_memory_mm() work as expected mm/swapfile: fix possible data races of inuse_pages mm/swap: remove swap_cache_info statistics mm/swap_state.c | 17 ----------------- mm/swapfile.c | 14 +++++++++----- 2 files changed, 9 insertions(+), 22 deletions(-) -- 2.23.0 ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-08 14:40 [PATCH v2 0/3] A few cleanup and fixup patches for swap Miaohe Lin @ 2022-06-08 14:40 ` Miaohe Lin 2022-06-17 7:33 ` David Hildenbrand 2022-06-20 7:31 ` Huang, Ying 2022-06-08 14:40 ` [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages Miaohe Lin ` (2 subsequent siblings) 3 siblings, 2 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-08 14:40 UTC (permalink / raw) To: akpm; +Cc: david, linux-mm, linux-kernel, linmiaohe security_vm_enough_memory_mm() checks whether a process has enough memory to allocate a new virtual mapping. And total_swap_pages is considered as available memory while swapoff tries to make sure there's enough memory that can hold the swapped out memory. But total_swap_pages contains the swap space that is being swapoff. So security_vm_enough_memory_mm() will success even if there's no memory to hold the swapped out memory because total_swap_pages always greater than or equal to p->pages. In order to fix it, p->pages should be retracted from total_swap_pages first and then check whether there's enough memory for inuse swap pages. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- mm/swapfile.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index ec4c1b276691..d2bead7b8b70 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) struct filename *pathname; int err, found = 0; unsigned int old_block_size; + unsigned int inuse_pages; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) spin_unlock(&swap_lock); goto out_dput; } - if (!security_vm_enough_memory_mm(current->mm, p->pages)) - vm_unacct_memory(p->pages); + + total_swap_pages -= p->pages; + inuse_pages = READ_ONCE(p->inuse_pages); + if (!security_vm_enough_memory_mm(current->mm, inuse_pages)) + vm_unacct_memory(inuse_pages); else { + total_swap_pages += p->pages; err = -ENOMEM; spin_unlock(&swap_lock); goto out_dput; @@ -2453,7 +2458,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) } plist_del(&p->list, &swap_active_head); atomic_long_sub(p->pages, &nr_swap_pages); - total_swap_pages -= p->pages; p->flags &= ~SWP_WRITEOK; spin_unlock(&p->lock); spin_unlock(&swap_lock); -- 2.23.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-08 14:40 ` [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected Miaohe Lin @ 2022-06-17 7:33 ` David Hildenbrand 2022-06-18 2:43 ` Miaohe Lin 2022-06-20 7:31 ` Huang, Ying 1 sibling, 1 reply; 31+ messages in thread From: David Hildenbrand @ 2022-06-17 7:33 UTC (permalink / raw) To: Miaohe Lin, akpm; +Cc: linux-mm, linux-kernel On 08.06.22 16:40, Miaohe Lin wrote: > security_vm_enough_memory_mm() checks whether a process has enough memory > to allocate a new virtual mapping. And total_swap_pages is considered as > available memory while swapoff tries to make sure there's enough memory > that can hold the swapped out memory. But total_swap_pages contains the > swap space that is being swapoff. So security_vm_enough_memory_mm() will > success even if there's no memory to hold the swapped out memory because s/success/succeed/ > total_swap_pages always greater than or equal to p->pages. > > In order to fix it, p->pages should be retracted from total_swap_pages s/retracted/subtracted/ > first and then check whether there's enough memory for inuse swap pages. > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > mm/swapfile.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index ec4c1b276691..d2bead7b8b70 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) > struct filename *pathname; > int err, found = 0; > unsigned int old_block_size; > + unsigned int inuse_pages; > > if (!capable(CAP_SYS_ADMIN)) > return -EPERM; > @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) > spin_unlock(&swap_lock); > goto out_dput; > } > - if (!security_vm_enough_memory_mm(current->mm, p->pages)) > - vm_unacct_memory(p->pages); > + > + total_swap_pages -= p->pages; > + inuse_pages = READ_ONCE(p->inuse_pages); > + if (!security_vm_enough_memory_mm(current->mm, inuse_pages)) > + vm_unacct_memory(inuse_pages); > else { > + total_swap_pages += p->pages; That implies that whenever we fail in security_vm_enough_memory_mm(), that other concurrent users might see a wrong total_swap_pages. Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use. Temporarily, we'd have CommitLimit 4 GiB Committed_AS 10 GiB Not sure if relevant, but I wonder if it could be avoided somehow? Apart from that, LGTM. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-17 7:33 ` David Hildenbrand @ 2022-06-18 2:43 ` Miaohe Lin 2022-06-18 7:10 ` David Hildenbrand 0 siblings, 1 reply; 31+ messages in thread From: Miaohe Lin @ 2022-06-18 2:43 UTC (permalink / raw) To: David Hildenbrand, akpm; +Cc: linux-mm, linux-kernel On 2022/6/17 15:33, David Hildenbrand wrote: > On 08.06.22 16:40, Miaohe Lin wrote: >> security_vm_enough_memory_mm() checks whether a process has enough memory >> to allocate a new virtual mapping. And total_swap_pages is considered as >> available memory while swapoff tries to make sure there's enough memory >> that can hold the swapped out memory. But total_swap_pages contains the >> swap space that is being swapoff. So security_vm_enough_memory_mm() will >> success even if there's no memory to hold the swapped out memory because > > s/success/succeed/ OK. Thanks. > >> total_swap_pages always greater than or equal to p->pages. >> >> In order to fix it, p->pages should be retracted from total_swap_pages > > s/retracted/subtracted/ OK. Thanks. > >> first and then check whether there's enough memory for inuse swap pages. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> mm/swapfile.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index ec4c1b276691..d2bead7b8b70 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >> struct filename *pathname; >> int err, found = 0; >> unsigned int old_block_size; >> + unsigned int inuse_pages; >> >> if (!capable(CAP_SYS_ADMIN)) >> return -EPERM; >> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >> spin_unlock(&swap_lock); >> goto out_dput; >> } >> - if (!security_vm_enough_memory_mm(current->mm, p->pages)) >> - vm_unacct_memory(p->pages); >> + >> + total_swap_pages -= p->pages; >> + inuse_pages = READ_ONCE(p->inuse_pages); >> + if (!security_vm_enough_memory_mm(current->mm, inuse_pages)) >> + vm_unacct_memory(inuse_pages); >> else { >> + total_swap_pages += p->pages; > > That implies that whenever we fail in security_vm_enough_memory_mm(), > that other concurrent users might see a wrong total_swap_pages. > > Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use. > > Temporarily, we'd have > > CommitLimit 4 GiB > Committed_AS 10 GiB IIUC, even if without this change, the other concurrent users if come after vm_acct_memory() is done in __vm_enough_memory(), they might see CommitLimit 12 GiB (4 GiB memory + 8GiB total swap) Committed_AS 18 GiB (10 GiB in use + 8GiB swap space to swapoff) Or am I miss something? > > Not sure if relevant, but I wonder if it could be avoided somehow? It seems this race exists already and is benign. The worst case is concurrent users might fail to allocate the memory. But that window should be really small and swapoff is a rare ops. Or should I try to fix this race? > > > Apart from that, LGTM. Many thanks for comment! :) > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-18 2:43 ` Miaohe Lin @ 2022-06-18 7:10 ` David Hildenbrand 2022-06-18 7:31 ` Miaohe Lin 0 siblings, 1 reply; 31+ messages in thread From: David Hildenbrand @ 2022-06-18 7:10 UTC (permalink / raw) To: Miaohe Lin, akpm; +Cc: linux-mm, linux-kernel On 18.06.22 04:43, Miaohe Lin wrote: > On 2022/6/17 15:33, David Hildenbrand wrote: >> On 08.06.22 16:40, Miaohe Lin wrote: >>> security_vm_enough_memory_mm() checks whether a process has enough memory >>> to allocate a new virtual mapping. And total_swap_pages is considered as >>> available memory while swapoff tries to make sure there's enough memory >>> that can hold the swapped out memory. But total_swap_pages contains the >>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>> success even if there's no memory to hold the swapped out memory because >> >> s/success/succeed/ > > OK. Thanks. > >> >>> total_swap_pages always greater than or equal to p->pages. >>> >>> In order to fix it, p->pages should be retracted from total_swap_pages >> >> s/retracted/subtracted/ > > OK. Thanks. > >> >>> first and then check whether there's enough memory for inuse swap pages. >>> >>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>> --- >>> mm/swapfile.c | 10 +++++++--- >>> 1 file changed, 7 insertions(+), 3 deletions(-) >>> >>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>> index ec4c1b276691..d2bead7b8b70 100644 >>> --- a/mm/swapfile.c >>> +++ b/mm/swapfile.c >>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>> struct filename *pathname; >>> int err, found = 0; >>> unsigned int old_block_size; >>> + unsigned int inuse_pages; >>> >>> if (!capable(CAP_SYS_ADMIN)) >>> return -EPERM; >>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>> spin_unlock(&swap_lock); >>> goto out_dput; >>> } >>> - if (!security_vm_enough_memory_mm(current->mm, p->pages)) >>> - vm_unacct_memory(p->pages); >>> + >>> + total_swap_pages -= p->pages; >>> + inuse_pages = READ_ONCE(p->inuse_pages); >>> + if (!security_vm_enough_memory_mm(current->mm, inuse_pages)) >>> + vm_unacct_memory(inuse_pages); >>> else { >>> + total_swap_pages += p->pages; >> >> That implies that whenever we fail in security_vm_enough_memory_mm(), >> that other concurrent users might see a wrong total_swap_pages. >> >> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use. >> >> Temporarily, we'd have >> >> CommitLimit 4 GiB >> Committed_AS 10 GiB > > IIUC, even if without this change, the other concurrent users if come after vm_acct_memory() > is done in __vm_enough_memory(), they might see > > CommitLimit 12 GiB (4 GiB memory + 8GiB total swap) > Committed_AS 18 GiB (10 GiB in use + 8GiB swap space to swapoff) > > Or am I miss something? > I think you are right! Reviewed-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-18 7:10 ` David Hildenbrand @ 2022-06-18 7:31 ` Miaohe Lin 0 siblings, 0 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-18 7:31 UTC (permalink / raw) To: David Hildenbrand, akpm; +Cc: linux-mm, linux-kernel On 2022/6/18 15:10, David Hildenbrand wrote: > On 18.06.22 04:43, Miaohe Lin wrote: >> On 2022/6/17 15:33, David Hildenbrand wrote: >>> On 08.06.22 16:40, Miaohe Lin wrote: >>>> security_vm_enough_memory_mm() checks whether a process has enough memory >>>> to allocate a new virtual mapping. And total_swap_pages is considered as >>>> available memory while swapoff tries to make sure there's enough memory >>>> that can hold the swapped out memory. But total_swap_pages contains the >>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>>> success even if there's no memory to hold the swapped out memory because >>> >>> s/success/succeed/ >> >> OK. Thanks. >> >>> >>>> total_swap_pages always greater than or equal to p->pages. >>>> >>>> In order to fix it, p->pages should be retracted from total_swap_pages >>> >>> s/retracted/subtracted/ >> >> OK. Thanks. >> >>> >>>> first and then check whether there's enough memory for inuse swap pages. >>>> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> --- >>>> mm/swapfile.c | 10 +++++++--- >>>> 1 file changed, 7 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>>> index ec4c1b276691..d2bead7b8b70 100644 >>>> --- a/mm/swapfile.c >>>> +++ b/mm/swapfile.c >>>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>>> struct filename *pathname; >>>> int err, found = 0; >>>> unsigned int old_block_size; >>>> + unsigned int inuse_pages; >>>> >>>> if (!capable(CAP_SYS_ADMIN)) >>>> return -EPERM; >>>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>>> spin_unlock(&swap_lock); >>>> goto out_dput; >>>> } >>>> - if (!security_vm_enough_memory_mm(current->mm, p->pages)) >>>> - vm_unacct_memory(p->pages); >>>> + >>>> + total_swap_pages -= p->pages; >>>> + inuse_pages = READ_ONCE(p->inuse_pages); >>>> + if (!security_vm_enough_memory_mm(current->mm, inuse_pages)) >>>> + vm_unacct_memory(inuse_pages); >>>> else { >>>> + total_swap_pages += p->pages; >>> >>> That implies that whenever we fail in security_vm_enough_memory_mm(), >>> that other concurrent users might see a wrong total_swap_pages. >>> >>> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use. >>> >>> Temporarily, we'd have >>> >>> CommitLimit 4 GiB >>> Committed_AS 10 GiB >> >> IIUC, even if without this change, the other concurrent users if come after vm_acct_memory() >> is done in __vm_enough_memory(), they might see >> >> CommitLimit 12 GiB (4 GiB memory + 8GiB total swap) >> Committed_AS 18 GiB (10 GiB in use + 8GiB swap space to swapoff) >> >> Or am I miss something? >> > > I think you are right! > > Reviewed-by: David Hildenbrand <david@redhat.com> Thanks a lot! > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-08 14:40 ` [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected Miaohe Lin 2022-06-17 7:33 ` David Hildenbrand @ 2022-06-20 7:31 ` Huang, Ying 2022-06-20 12:12 ` Miaohe Lin 1 sibling, 1 reply; 31+ messages in thread From: Huang, Ying @ 2022-06-20 7:31 UTC (permalink / raw) To: Miaohe Lin; +Cc: akpm, david, linux-mm, linux-kernel Miaohe Lin <linmiaohe@huawei.com> writes: > security_vm_enough_memory_mm() checks whether a process has enough memory > to allocate a new virtual mapping. And total_swap_pages is considered as > available memory while swapoff tries to make sure there's enough memory > that can hold the swapped out memory. But total_swap_pages contains the > swap space that is being swapoff. So security_vm_enough_memory_mm() will > success even if there's no memory to hold the swapped out memory because > total_swap_pages always greater than or equal to p->pages. Per my understanding, swapoff will not allocate virtual mapping by itself. But after swapoff, the overcommit limit could be exceeded. security_vm_enough_memory_mm() is used to check that. For example, in a system with 4GB memory and 8GB swap, and 10GB is in use, CommitLimit: 4+8 = 12GB Committed_AS: 10GB security_vm_enough_memory_mm() in swapoff() will fail because 10+8 = 18 > 12. This is expected because after swapoff, the overcommit limit will be exceeded. If 3GB is in use, CommitLimit: 4+8 = 12GB Committed_AS: 3GB security_vm_enough_memory_mm() in swapoff() will succeed because 3+8 = 11 < 12. This is expected because after swapoff, the overcommit limit will not be exceeded. So, what's the real problem of the original implementation? Can you show it with an example as above? Best Regards, Huang, Ying > In order to fix it, p->pages should be retracted from total_swap_pages > first and then check whether there's enough memory for inuse swap pages. > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> [snip] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-20 7:31 ` Huang, Ying @ 2022-06-20 12:12 ` Miaohe Lin 2022-06-21 1:35 ` Huang, Ying 0 siblings, 1 reply; 31+ messages in thread From: Miaohe Lin @ 2022-06-20 12:12 UTC (permalink / raw) To: Huang, Ying; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/20 15:31, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> security_vm_enough_memory_mm() checks whether a process has enough memory >> to allocate a new virtual mapping. And total_swap_pages is considered as >> available memory while swapoff tries to make sure there's enough memory >> that can hold the swapped out memory. But total_swap_pages contains the >> swap space that is being swapoff. So security_vm_enough_memory_mm() will >> success even if there's no memory to hold the swapped out memory because >> total_swap_pages always greater than or equal to p->pages. > > Per my understanding, swapoff will not allocate virtual mapping by > itself. But after swapoff, the overcommit limit could be exceeded. > security_vm_enough_memory_mm() is used to check that. For example, in a > system with 4GB memory and 8GB swap, and 10GB is in use, > > CommitLimit: 4+8 = 12GB > Committed_AS: 10GB > > security_vm_enough_memory_mm() in swapoff() will fail because > 10+8 = 18 > 12. This is expected because after swapoff, the overcommit > limit will be exceeded. > > If 3GB is in use, > > CommitLimit: 4+8 = 12GB > Committed_AS: 3GB > > security_vm_enough_memory_mm() in swapoff() will succeed because > 3+8 = 11 < 12. This is expected because after swapoff, the overcommit > limit will not be exceeded. In OVERCOMMIT_NEVER scene, I think you're right. > > So, what's the real problem of the original implementation? Can you > show it with an example as above? In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use, pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the below case. if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { if (pages > totalram_pages() + total_swap_pages) goto error; return 0; } Or am I miss something? > > Best Regards, > Huang, Ying Thanks! > >> In order to fix it, p->pages should be retracted from total_swap_pages >> first and then check whether there's enough memory for inuse swap pages. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > > [snip] > > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-20 12:12 ` Miaohe Lin @ 2022-06-21 1:35 ` Huang, Ying 2022-06-21 7:37 ` Miaohe Lin 0 siblings, 1 reply; 31+ messages in thread From: Huang, Ying @ 2022-06-21 1:35 UTC (permalink / raw) To: Miaohe Lin; +Cc: akpm, david, linux-mm, linux-kernel Miaohe Lin <linmiaohe@huawei.com> writes: > On 2022/6/20 15:31, Huang, Ying wrote: >> Miaohe Lin <linmiaohe@huawei.com> writes: >> >>> security_vm_enough_memory_mm() checks whether a process has enough memory >>> to allocate a new virtual mapping. And total_swap_pages is considered as >>> available memory while swapoff tries to make sure there's enough memory >>> that can hold the swapped out memory. But total_swap_pages contains the >>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>> success even if there's no memory to hold the swapped out memory because >>> total_swap_pages always greater than or equal to p->pages. >> >> Per my understanding, swapoff will not allocate virtual mapping by >> itself. But after swapoff, the overcommit limit could be exceeded. >> security_vm_enough_memory_mm() is used to check that. For example, in a >> system with 4GB memory and 8GB swap, and 10GB is in use, >> >> CommitLimit: 4+8 = 12GB >> Committed_AS: 10GB >> >> security_vm_enough_memory_mm() in swapoff() will fail because >> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit >> limit will be exceeded. >> >> If 3GB is in use, >> >> CommitLimit: 4+8 = 12GB >> Committed_AS: 3GB >> >> security_vm_enough_memory_mm() in swapoff() will succeed because >> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit >> limit will not be exceeded. > > In OVERCOMMIT_NEVER scene, I think you're right. > >> >> So, what's the real problem of the original implementation? Can you >> show it with an example as above? > > In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use, > pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed > instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the > below case. > > if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { > if (pages > totalram_pages() + total_swap_pages) > goto error; > return 0; > } > > Or am I miss something? Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages isn't checked at all. The only restriction is that the size of the virtual mapping created should be less than total RAM + total swap pages. Because swapoff() will not create virtual mapping, so it's expected that security_vm_enough_memory_mm() in swapoff() always succeeds. Best Regards, Huang, Ying > > Thanks! > >> >>> In order to fix it, p->pages should be retracted from total_swap_pages >>> first and then check whether there's enough memory for inuse swap pages. >>> >>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> >> [snip] >> >> . >> ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-21 1:35 ` Huang, Ying @ 2022-06-21 7:37 ` Miaohe Lin 2022-06-21 7:42 ` Huang, Ying 0 siblings, 1 reply; 31+ messages in thread From: Miaohe Lin @ 2022-06-21 7:37 UTC (permalink / raw) To: Huang, Ying; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/21 9:35, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> On 2022/6/20 15:31, Huang, Ying wrote: >>> Miaohe Lin <linmiaohe@huawei.com> writes: >>> >>>> security_vm_enough_memory_mm() checks whether a process has enough memory >>>> to allocate a new virtual mapping. And total_swap_pages is considered as >>>> available memory while swapoff tries to make sure there's enough memory >>>> that can hold the swapped out memory. But total_swap_pages contains the >>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>>> success even if there's no memory to hold the swapped out memory because >>>> total_swap_pages always greater than or equal to p->pages. >>> >>> Per my understanding, swapoff will not allocate virtual mapping by >>> itself. But after swapoff, the overcommit limit could be exceeded. >>> security_vm_enough_memory_mm() is used to check that. For example, in a >>> system with 4GB memory and 8GB swap, and 10GB is in use, >>> >>> CommitLimit: 4+8 = 12GB >>> Committed_AS: 10GB >>> >>> security_vm_enough_memory_mm() in swapoff() will fail because >>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit >>> limit will be exceeded. >>> >>> If 3GB is in use, >>> >>> CommitLimit: 4+8 = 12GB >>> Committed_AS: 3GB >>> >>> security_vm_enough_memory_mm() in swapoff() will succeed because >>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit >>> limit will not be exceeded. >> >> In OVERCOMMIT_NEVER scene, I think you're right. >> >>> >>> So, what's the real problem of the original implementation? Can you >>> show it with an example as above? >> >> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use, >> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed >> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the >> below case. >> >> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { >> if (pages > totalram_pages() + total_swap_pages) >> goto error; >> return 0; >> } >> >> Or am I miss something? > > Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages > isn't checked at all. The only restriction is that the size of the > virtual mapping created should be less than total RAM + total swap Do you mean the only restriction is that the size of the virtual mapping *created every time* should be less than total RAM + total swap pages but *total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so, the current behavior should be sane and I will drop this patch. Thanks! > pages. Because swapoff() will not create virtual mapping, so it's > expected that security_vm_enough_memory_mm() in swapoff() always > succeeds. > > Best Regards, > Huang, Ying > >> >> Thanks! >> >>> >>>> In order to fix it, p->pages should be retracted from total_swap_pages >>>> first and then check whether there's enough memory for inuse swap pages. >>>> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>> >>> [snip] >>> >>> . >>> > > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-21 7:37 ` Miaohe Lin @ 2022-06-21 7:42 ` Huang, Ying 2022-06-21 8:20 ` Miaohe Lin 0 siblings, 1 reply; 31+ messages in thread From: Huang, Ying @ 2022-06-21 7:42 UTC (permalink / raw) To: Miaohe Lin; +Cc: akpm, david, linux-mm, linux-kernel Miaohe Lin <linmiaohe@huawei.com> writes: > On 2022/6/21 9:35, Huang, Ying wrote: >> Miaohe Lin <linmiaohe@huawei.com> writes: >> >>> On 2022/6/20 15:31, Huang, Ying wrote: >>>> Miaohe Lin <linmiaohe@huawei.com> writes: >>>> >>>>> security_vm_enough_memory_mm() checks whether a process has enough memory >>>>> to allocate a new virtual mapping. And total_swap_pages is considered as >>>>> available memory while swapoff tries to make sure there's enough memory >>>>> that can hold the swapped out memory. But total_swap_pages contains the >>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>>>> success even if there's no memory to hold the swapped out memory because >>>>> total_swap_pages always greater than or equal to p->pages. >>>> >>>> Per my understanding, swapoff will not allocate virtual mapping by >>>> itself. But after swapoff, the overcommit limit could be exceeded. >>>> security_vm_enough_memory_mm() is used to check that. For example, in a >>>> system with 4GB memory and 8GB swap, and 10GB is in use, >>>> >>>> CommitLimit: 4+8 = 12GB >>>> Committed_AS: 10GB >>>> >>>> security_vm_enough_memory_mm() in swapoff() will fail because >>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit >>>> limit will be exceeded. >>>> >>>> If 3GB is in use, >>>> >>>> CommitLimit: 4+8 = 12GB >>>> Committed_AS: 3GB >>>> >>>> security_vm_enough_memory_mm() in swapoff() will succeed because >>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit >>>> limit will not be exceeded. >>> >>> In OVERCOMMIT_NEVER scene, I think you're right. >>> >>>> >>>> So, what's the real problem of the original implementation? Can you >>>> show it with an example as above? >>> >>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use, >>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed >>> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the >>> below case. >>> >>> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { >>> if (pages > totalram_pages() + total_swap_pages) >>> goto error; >>> return 0; >>> } >>> >>> Or am I miss something? >> >> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages >> isn't checked at all. The only restriction is that the size of the >> virtual mapping created should be less than total RAM + total swap > > Do you mean the only restriction is that the size of the virtual mapping > *created every time* should be less than total RAM + total swap pages but > *total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so, > the current behavior should be sane and I will drop this patch. Yes. This is my understanding. Best Regards, Huang, Ying > Thanks! > >> pages. Because swapoff() will not create virtual mapping, so it's >> expected that security_vm_enough_memory_mm() in swapoff() always >> succeeds. >> >> Best Regards, >> Huang, Ying >> >>> >>> Thanks! >>> >>>> >>>>> In order to fix it, p->pages should be retracted from total_swap_pages >>>>> first and then check whether there's enough memory for inuse swap pages. >>>>> >>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> >>>> [snip] >>>> >>>> . >>>> >> >> . >> ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected 2022-06-21 7:42 ` Huang, Ying @ 2022-06-21 8:20 ` Miaohe Lin 0 siblings, 0 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-21 8:20 UTC (permalink / raw) To: Huang, Ying; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/21 15:42, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> On 2022/6/21 9:35, Huang, Ying wrote: >>> Miaohe Lin <linmiaohe@huawei.com> writes: >>> >>>> On 2022/6/20 15:31, Huang, Ying wrote: >>>>> Miaohe Lin <linmiaohe@huawei.com> writes: >>>>> >>>>>> security_vm_enough_memory_mm() checks whether a process has enough memory >>>>>> to allocate a new virtual mapping. And total_swap_pages is considered as >>>>>> available memory while swapoff tries to make sure there's enough memory >>>>>> that can hold the swapped out memory. But total_swap_pages contains the >>>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>>>>> success even if there's no memory to hold the swapped out memory because >>>>>> total_swap_pages always greater than or equal to p->pages. >>>>> >>>>> Per my understanding, swapoff will not allocate virtual mapping by >>>>> itself. But after swapoff, the overcommit limit could be exceeded. >>>>> security_vm_enough_memory_mm() is used to check that. For example, in a >>>>> system with 4GB memory and 8GB swap, and 10GB is in use, >>>>> >>>>> CommitLimit: 4+8 = 12GB >>>>> Committed_AS: 10GB >>>>> >>>>> security_vm_enough_memory_mm() in swapoff() will fail because >>>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit >>>>> limit will be exceeded. >>>>> >>>>> If 3GB is in use, >>>>> >>>>> CommitLimit: 4+8 = 12GB >>>>> Committed_AS: 3GB >>>>> >>>>> security_vm_enough_memory_mm() in swapoff() will succeed because >>>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit >>>>> limit will not be exceeded. >>>> >>>> In OVERCOMMIT_NEVER scene, I think you're right. >>>> >>>>> >>>>> So, what's the real problem of the original implementation? Can you >>>>> show it with an example as above? >>>> >>>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use, >>>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed >>>> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the >>>> below case. >>>> >>>> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { >>>> if (pages > totalram_pages() + total_swap_pages) >>>> goto error; >>>> return 0; >>>> } >>>> >>>> Or am I miss something? >>> >>> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages >>> isn't checked at all. The only restriction is that the size of the >>> virtual mapping created should be less than total RAM + total swap >> >> Do you mean the only restriction is that the size of the virtual mapping >> *created every time* should be less than total RAM + total swap pages but >> *total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so, >> the current behavior should be sane and I will drop this patch. > > Yes. This is my understanding. I see. Thank you. > > Best Regards, > Huang, Ying > >> Thanks! >> >>> pages. Because swapoff() will not create virtual mapping, so it's >>> expected that security_vm_enough_memory_mm() in swapoff() always >>> succeeds. >>> >>> Best Regards, >>> Huang, Ying >>> >>>> >>>> Thanks! >>>> >>>>> >>>>>> In order to fix it, p->pages should be retracted from total_swap_pages >>>>>> first and then check whether there's enough memory for inuse swap pages. >>>>>> >>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>>> >>>>> [snip] >>>>> >>>>> . >>>>> >>> >>> . >>> > > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-08 14:40 [PATCH v2 0/3] A few cleanup and fixup patches for swap Miaohe Lin 2022-06-08 14:40 ` [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected Miaohe Lin @ 2022-06-08 14:40 ` Miaohe Lin 2022-06-20 7:54 ` Huang, Ying 2022-06-08 14:40 ` [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics Miaohe Lin 2022-06-17 2:37 ` [PATCH v2 0/3] A few cleanup and fixup patches for swap Andrew Morton 3 siblings, 1 reply; 31+ messages in thread From: Miaohe Lin @ 2022-06-08 14:40 UTC (permalink / raw) To: akpm; +Cc: david, linux-mm, linux-kernel, linmiaohe si->inuse_pages could still be accessed concurrently now. The plain reads outside si->lock critical section, i.e. swap_show and si_swapinfo, which results in data races. But these should be ok because they're just used for showing swap info. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> --- mm/swapfile.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index d2bead7b8b70..3fa26f6971e9 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) } bytes = si->pages << (PAGE_SHIFT - 10); - inuse = si->inuse_pages << (PAGE_SHIFT - 10); + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); file = si->swap_file; len = seq_file_path(swap, file, " \t\n\\"); @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) struct swap_info_struct *si = swap_info[type]; if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) - nr_to_be_unused += si->inuse_pages; + nr_to_be_unused += READ_ONCE(si->inuse_pages); } val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; val->totalswap = total_swap_pages + nr_to_be_unused; -- 2.23.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-08 14:40 ` [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages Miaohe Lin @ 2022-06-20 7:54 ` Huang, Ying 2022-06-20 9:04 ` Miaohe Lin 0 siblings, 1 reply; 31+ messages in thread From: Huang, Ying @ 2022-06-20 7:54 UTC (permalink / raw) To: Miaohe Lin; +Cc: akpm, david, linux-mm, linux-kernel Miaohe Lin <linmiaohe@huawei.com> writes: > si->inuse_pages could still be accessed concurrently now. The plain reads > outside si->lock critical section, i.e. swap_show and si_swapinfo, which > results in data races. But these should be ok because they're just used > for showing swap info. > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > Reviewed-by: David Hildenbrand <david@redhat.com> > --- > mm/swapfile.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index d2bead7b8b70..3fa26f6971e9 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) > } > > bytes = si->pages << (PAGE_SHIFT - 10); > - inuse = si->inuse_pages << (PAGE_SHIFT - 10); > + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); > > file = si->swap_file; > len = seq_file_path(swap, file, " \t\n\\"); > @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) > struct swap_info_struct *si = swap_info[type]; > > if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) > - nr_to_be_unused += si->inuse_pages; > + nr_to_be_unused += READ_ONCE(si->inuse_pages); > } > val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; > val->totalswap = total_swap_pages + nr_to_be_unused; READ_ONCE() should be paired with WRITE_ONCE(). So, change the writer side too? Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 7:54 ` Huang, Ying @ 2022-06-20 9:04 ` Miaohe Lin 2022-06-20 9:23 ` Muchun Song 0 siblings, 1 reply; 31+ messages in thread From: Miaohe Lin @ 2022-06-20 9:04 UTC (permalink / raw) To: Huang, Ying; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/20 15:54, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> si->inuse_pages could still be accessed concurrently now. The plain reads >> outside si->lock critical section, i.e. swap_show and si_swapinfo, which >> results in data races. But these should be ok because they're just used >> for showing swap info. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> Reviewed-by: David Hildenbrand <david@redhat.com> >> --- >> mm/swapfile.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index d2bead7b8b70..3fa26f6971e9 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) >> } >> >> bytes = si->pages << (PAGE_SHIFT - 10); >> - inuse = si->inuse_pages << (PAGE_SHIFT - 10); >> + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); >> >> file = si->swap_file; >> len = seq_file_path(swap, file, " \t\n\\"); >> @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) >> struct swap_info_struct *si = swap_info[type]; >> >> if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) >> - nr_to_be_unused += si->inuse_pages; >> + nr_to_be_unused += READ_ONCE(si->inuse_pages); >> } >> val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; >> val->totalswap = total_swap_pages + nr_to_be_unused; > > READ_ONCE() should be paired with WRITE_ONCE(). So, change the writer > side too? READ_ONCE() is used to fix the complaint of concurrent accessing to si->inuse_pages from KCSAN here. The similar commit is 218209487c3d ("mm/swapfile: fix data races in try_to_unuse()"). IMHO, it's fine to see a not-uptodate value of si->inuse_pages because it's just used for showing swap info. So WRITE_ONCE() is not obligatory. Or am I miss something? > > Best Regards, > Huang, Ying Thanks! > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 9:04 ` Miaohe Lin @ 2022-06-20 9:23 ` Muchun Song 2022-06-20 12:23 ` Miaohe Lin 0 siblings, 1 reply; 31+ messages in thread From: Muchun Song @ 2022-06-20 9:23 UTC (permalink / raw) To: Miaohe Lin; +Cc: Huang, Ying, akpm, david, linux-mm, linux-kernel On Mon, Jun 20, 2022 at 05:04:50PM +0800, Miaohe Lin wrote: > On 2022/6/20 15:54, Huang, Ying wrote: > > Miaohe Lin <linmiaohe@huawei.com> writes: > > > >> si->inuse_pages could still be accessed concurrently now. The plain reads > >> outside si->lock critical section, i.e. swap_show and si_swapinfo, which > >> results in data races. But these should be ok because they're just used > >> for showing swap info. > >> > >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > >> Reviewed-by: David Hildenbrand <david@redhat.com> > >> --- > >> mm/swapfile.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/mm/swapfile.c b/mm/swapfile.c > >> index d2bead7b8b70..3fa26f6971e9 100644 > >> --- a/mm/swapfile.c > >> +++ b/mm/swapfile.c > >> @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) > >> } > >> > >> bytes = si->pages << (PAGE_SHIFT - 10); > >> - inuse = si->inuse_pages << (PAGE_SHIFT - 10); > >> + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); > >> > >> file = si->swap_file; > >> len = seq_file_path(swap, file, " \t\n\\"); > >> @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) > >> struct swap_info_struct *si = swap_info[type]; > >> > >> if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) > >> - nr_to_be_unused += si->inuse_pages; > >> + nr_to_be_unused += READ_ONCE(si->inuse_pages); > >> } > >> val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; > >> val->totalswap = total_swap_pages + nr_to_be_unused; > > > > READ_ONCE() should be paired with WRITE_ONCE(). So, change the writer > > side too? > > READ_ONCE() is used to fix the complaint of concurrent accessing to si->inuse_pages from KCSAN here. > The similar commit is 218209487c3d ("mm/swapfile: fix data races in try_to_unuse()"). IMHO, it's fine I think the fix 218209487c3d is incomplete. The write side in swap_range_free() should also be fixed. Otherwise, IIUC, it cannot stop KCSAN complaining. > to see a not-uptodate value of si->inuse_pages because it's just used for showing swap info. So > WRITE_ONCE() is not obligatory. Or am I miss something? > > > > > Best Regards, > > Huang, Ying > > Thanks! > > > . > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 9:23 ` Muchun Song @ 2022-06-20 12:23 ` Miaohe Lin 2022-06-20 12:32 ` Miaohe Lin 0 siblings, 1 reply; 31+ messages in thread From: Miaohe Lin @ 2022-06-20 12:23 UTC (permalink / raw) To: Muchun Song, Huang, Ying, cai; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/20 17:23, Muchun Song wrote: > On Mon, Jun 20, 2022 at 05:04:50PM +0800, Miaohe Lin wrote: >> On 2022/6/20 15:54, Huang, Ying wrote: >>> Miaohe Lin <linmiaohe@huawei.com> writes: >>> >>>> si->inuse_pages could still be accessed concurrently now. The plain reads >>>> outside si->lock critical section, i.e. swap_show and si_swapinfo, which >>>> results in data races. But these should be ok because they're just used >>>> for showing swap info. >>>> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> Reviewed-by: David Hildenbrand <david@redhat.com> >>>> --- >>>> mm/swapfile.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>>> index d2bead7b8b70..3fa26f6971e9 100644 >>>> --- a/mm/swapfile.c >>>> +++ b/mm/swapfile.c >>>> @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) >>>> } >>>> >>>> bytes = si->pages << (PAGE_SHIFT - 10); >>>> - inuse = si->inuse_pages << (PAGE_SHIFT - 10); >>>> + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); >>>> >>>> file = si->swap_file; >>>> len = seq_file_path(swap, file, " \t\n\\"); >>>> @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) >>>> struct swap_info_struct *si = swap_info[type]; >>>> >>>> if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) >>>> - nr_to_be_unused += si->inuse_pages; >>>> + nr_to_be_unused += READ_ONCE(si->inuse_pages); >>>> } >>>> val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; >>>> val->totalswap = total_swap_pages + nr_to_be_unused; >>> >>> READ_ONCE() should be paired with WRITE_ONCE(). So, change the writer >>> side too? >> >> READ_ONCE() is used to fix the complaint of concurrent accessing to si->inuse_pages from KCSAN here. >> The similar commit is 218209487c3d ("mm/swapfile: fix data races in try_to_unuse()"). IMHO, it's fine > > I think the fix 218209487c3d is incomplete. The write side in swap_range_free() should > also be fixed. Otherwise, IIUC, it cannot stop KCSAN complaining. I tend to agree with you. READ_ONCE() should be paired with WRITE_ONCE() theoretically. But WRITTE_ONCE() is ignored while the commit is introduced. Add Qian Cai for helping verify it. It's very kind of @Qian Cai if he could tell us whether WRITTE_ONCE() is ignored deliberately. Thanks all of you. :) > >> to see a not-uptodate value of si->inuse_pages because it's just used for showing swap info. So >> WRITE_ONCE() is not obligatory. Or am I miss something? >> >>> >>> Best Regards, >>> Huang, Ying >> >> Thanks! >> >>> . >>> >> >> > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 12:23 ` Miaohe Lin @ 2022-06-20 12:32 ` Miaohe Lin 2022-06-20 13:46 ` Qian Cai 0 siblings, 1 reply; 31+ messages in thread From: Miaohe Lin @ 2022-06-20 12:32 UTC (permalink / raw) To: Muchun Song, Huang, Ying, Qian Cai; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/20 20:23, Miaohe Lin wrote: > On 2022/6/20 17:23, Muchun Song wrote: >> On Mon, Jun 20, 2022 at 05:04:50PM +0800, Miaohe Lin wrote: >>> On 2022/6/20 15:54, Huang, Ying wrote: >>>> Miaohe Lin <linmiaohe@huawei.com> writes: >>>> >>>>> si->inuse_pages could still be accessed concurrently now. The plain reads >>>>> outside si->lock critical section, i.e. swap_show and si_swapinfo, which >>>>> results in data races. But these should be ok because they're just used >>>>> for showing swap info. >>>>> >>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>>> Reviewed-by: David Hildenbrand <david@redhat.com> >>>>> --- >>>>> mm/swapfile.c | 4 ++-- >>>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>>>> index d2bead7b8b70..3fa26f6971e9 100644 >>>>> --- a/mm/swapfile.c >>>>> +++ b/mm/swapfile.c >>>>> @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) >>>>> } >>>>> >>>>> bytes = si->pages << (PAGE_SHIFT - 10); >>>>> - inuse = si->inuse_pages << (PAGE_SHIFT - 10); >>>>> + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); >>>>> >>>>> file = si->swap_file; >>>>> len = seq_file_path(swap, file, " \t\n\\"); >>>>> @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) >>>>> struct swap_info_struct *si = swap_info[type]; >>>>> >>>>> if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) >>>>> - nr_to_be_unused += si->inuse_pages; >>>>> + nr_to_be_unused += READ_ONCE(si->inuse_pages); >>>>> } >>>>> val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; >>>>> val->totalswap = total_swap_pages + nr_to_be_unused; >>>> >>>> READ_ONCE() should be paired with WRITE_ONCE(). So, change the writer >>>> side too? >>> >>> READ_ONCE() is used to fix the complaint of concurrent accessing to si->inuse_pages from KCSAN here. >>> The similar commit is 218209487c3d ("mm/swapfile: fix data races in try_to_unuse()"). IMHO, it's fine >> >> I think the fix 218209487c3d is incomplete. The write side in swap_range_free() should >> also be fixed. Otherwise, IIUC, it cannot stop KCSAN complaining. > > I tend to agree with you. READ_ONCE() should be paired with WRITE_ONCE() theoretically. But WRITTE_ONCE() > is ignored while the commit is introduced. Add Qian Cai for helping verify it. It's very kind of @Qian Cai > if he could tell us whether WRITTE_ONCE() is ignored deliberately. Update the email address of Qian Cai. > > Thanks all of you. :) > >> >>> to see a not-uptodate value of si->inuse_pages because it's just used for showing swap info. So >>> WRITE_ONCE() is not obligatory. Or am I miss something? >>> >>>> >>>> Best Regards, >>>> Huang, Ying >>> >>> Thanks! >>> >>>> . >>>> >>> >>> >> . >> > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 12:32 ` Miaohe Lin @ 2022-06-20 13:46 ` Qian Cai 2022-06-20 14:20 ` Muchun Song 0 siblings, 1 reply; 31+ messages in thread From: Qian Cai @ 2022-06-20 13:46 UTC (permalink / raw) To: Miaohe Lin; +Cc: Muchun Song, Huang, Ying, akpm, david, linux-mm, linux-kernel On Mon, Jun 20, 2022 at 08:32:27PM +0800, Miaohe Lin wrote: > >>>>> --- a/mm/swapfile.c > >>>>> +++ b/mm/swapfile.c > >>>>> @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) > >>>>> } > >>>>> > >>>>> bytes = si->pages << (PAGE_SHIFT - 10); > >>>>> - inuse = si->inuse_pages << (PAGE_SHIFT - 10); > >>>>> + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); > >>>>> > >>>>> file = si->swap_file; > >>>>> len = seq_file_path(swap, file, " \t\n\\"); > >>>>> @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) > >>>>> struct swap_info_struct *si = swap_info[type]; > >>>>> > >>>>> if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) > >>>>> - nr_to_be_unused += si->inuse_pages; > >>>>> + nr_to_be_unused += READ_ONCE(si->inuse_pages); > >>>>> } > >>>>> val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; > >>>>> val->totalswap = total_swap_pages + nr_to_be_unused; > >>>> > >>>> READ_ONCE() should be paired with WRITE_ONCE(). So, change the writer > >>>> side too? > >>> > >>> READ_ONCE() is used to fix the complaint of concurrent accessing to si->inuse_pages from KCSAN here. > >>> The similar commit is 218209487c3d ("mm/swapfile: fix data races in try_to_unuse()"). IMHO, it's fine > >> > >> I think the fix 218209487c3d is incomplete. The write side in swap_range_free() should > >> also be fixed. Otherwise, IIUC, it cannot stop KCSAN complaining. > > > > I tend to agree with you. READ_ONCE() should be paired with WRITE_ONCE() theoretically. But WRITTE_ONCE() > > is ignored while the commit is introduced. Add Qian Cai for helping verify it. It's very kind of @Qian Cai > > if he could tell us whether WRITTE_ONCE() is ignored deliberately. The write side should be protected by the lock swap_info_struct::lock. Is that not the case here? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 13:46 ` Qian Cai @ 2022-06-20 14:20 ` Muchun Song 2022-06-20 21:36 ` Qian Cai 0 siblings, 1 reply; 31+ messages in thread From: Muchun Song @ 2022-06-20 14:20 UTC (permalink / raw) To: Qian Cai; +Cc: Miaohe Lin, Huang, Ying, akpm, david, linux-mm, linux-kernel On Mon, Jun 20, 2022 at 09:46:47AM -0400, Qian Cai wrote: > On Mon, Jun 20, 2022 at 08:32:27PM +0800, Miaohe Lin wrote: > > >>>>> --- a/mm/swapfile.c > > >>>>> +++ b/mm/swapfile.c > > >>>>> @@ -2646,7 +2646,7 @@ static int swap_show(struct seq_file *swap, void *v) > > >>>>> } > > >>>>> > > >>>>> bytes = si->pages << (PAGE_SHIFT - 10); > > >>>>> - inuse = si->inuse_pages << (PAGE_SHIFT - 10); > > >>>>> + inuse = READ_ONCE(si->inuse_pages) << (PAGE_SHIFT - 10); > > >>>>> > > >>>>> file = si->swap_file; > > >>>>> len = seq_file_path(swap, file, " \t\n\\"); > > >>>>> @@ -3265,7 +3265,7 @@ void si_swapinfo(struct sysinfo *val) > > >>>>> struct swap_info_struct *si = swap_info[type]; > > >>>>> > > >>>>> if ((si->flags & SWP_USED) && !(si->flags & SWP_WRITEOK)) > > >>>>> - nr_to_be_unused += si->inuse_pages; > > >>>>> + nr_to_be_unused += READ_ONCE(si->inuse_pages); > > >>>>> } > > >>>>> val->freeswap = atomic_long_read(&nr_swap_pages) + nr_to_be_unused; > > >>>>> val->totalswap = total_swap_pages + nr_to_be_unused; > > >>>> > > >>>> READ_ONCE() should be paired with WRITE_ONCE(). So, change the writer > > >>>> side too? > > >>> > > >>> READ_ONCE() is used to fix the complaint of concurrent accessing to si->inuse_pages from KCSAN here. > > >>> The similar commit is 218209487c3d ("mm/swapfile: fix data races in try_to_unuse()"). IMHO, it's fine > > >> > > >> I think the fix 218209487c3d is incomplete. The write side in swap_range_free() should > > >> also be fixed. Otherwise, IIUC, it cannot stop KCSAN complaining. > > > > > > I tend to agree with you. READ_ONCE() should be paired with WRITE_ONCE() theoretically. But WRITTE_ONCE() > > > is ignored while the commit is introduced. Add Qian Cai for helping verify it. It's very kind of @Qian Cai > > > if he could tell us whether WRITTE_ONCE() is ignored deliberately. > > The write side should be protected by the lock swap_info_struct::lock. Is > that not the case here? > The lock does not protect the read sides. So the write side should be fixed by WRITTE_ONCE(). Thanks. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 14:20 ` Muchun Song @ 2022-06-20 21:36 ` Qian Cai 2022-06-21 1:14 ` Huang, Ying 0 siblings, 1 reply; 31+ messages in thread From: Qian Cai @ 2022-06-20 21:36 UTC (permalink / raw) To: Muchun Song; +Cc: Miaohe Lin, Huang, Ying, akpm, david, linux-mm, linux-kernel On Mon, Jun 20, 2022 at 10:20:07PM +0800, Muchun Song wrote: > The lock does not protect the read sides. So the write side should be > fixed by WRITTE_ONCE(). https://lwn.net/Articles/816854/ "Unmarked writes (aligned and up to word size) can be treated as if they had used WRITE_ONCE() by building with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y (also selected by default). Experience has shown that compilers are much less likely to destructively optimize in-kernel writes than reads. Some developers might therefore choose to use READ_ONCE() but omit the corresponding WRITE_ONCE(). Other developers might prefer the documentation benefits and long-term peace of mind accruing from explicit use of WRITE_ONCE()..." ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-20 21:36 ` Qian Cai @ 2022-06-21 1:14 ` Huang, Ying 2022-06-21 3:39 ` Muchun Song 0 siblings, 1 reply; 31+ messages in thread From: Huang, Ying @ 2022-06-21 1:14 UTC (permalink / raw) To: Qian Cai, Miaohe Lin; +Cc: Muchun Song, akpm, david, linux-mm, linux-kernel Qian Cai <quic_qiancai@quicinc.com> writes: > On Mon, Jun 20, 2022 at 10:20:07PM +0800, Muchun Song wrote: >> The lock does not protect the read sides. So the write side should be >> fixed by WRITTE_ONCE(). > > https://lwn.net/Articles/816854/ > > "Unmarked writes (aligned and up to word size) can be treated as if they had > used WRITE_ONCE() by building with > CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y (also selected by default). > Experience has shown that compilers are much less likely to destructively > optimize in-kernel writes than reads. Some developers might therefore > choose to use READ_ONCE() but omit the corresponding WRITE_ONCE(). Other > developers might prefer the documentation benefits and long-term peace of > mind accruing from explicit use of WRITE_ONCE()..." Thanks for pointing me to this great article. So although not required by KCSAN strictly, WRITE_ONCE() is still good for documentation, etc. Just like we have done for swap_info_struct->highest_bit, etc. Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-21 1:14 ` Huang, Ying @ 2022-06-21 3:39 ` Muchun Song 2022-06-21 6:40 ` Miaohe Lin 0 siblings, 1 reply; 31+ messages in thread From: Muchun Song @ 2022-06-21 3:39 UTC (permalink / raw) To: Huang, Ying, Qian Cai; +Cc: Miaohe Lin, akpm, david, linux-mm, linux-kernel On Tue, Jun 21, 2022 at 09:14:00AM +0800, Huang, Ying wrote: > Qian Cai <quic_qiancai@quicinc.com> writes: > > > On Mon, Jun 20, 2022 at 10:20:07PM +0800, Muchun Song wrote: > >> The lock does not protect the read sides. So the write side should be > >> fixed by WRITTE_ONCE(). > > > > https://lwn.net/Articles/816854/ > > > > "Unmarked writes (aligned and up to word size) can be treated as if they had > > used WRITE_ONCE() by building with > > CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y (also selected by default). All right, CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC help us avoid KCSAN complaining. > > Experience has shown that compilers are much less likely to destructively > > optimize in-kernel writes than reads. Some developers might therefore > > choose to use READ_ONCE() but omit the corresponding WRITE_ONCE(). Other > > developers might prefer the documentation benefits and long-term peace of > > mind accruing from explicit use of WRITE_ONCE()..." > > Thanks for pointing me to this great article. So although not required > by KCSAN strictly, WRITE_ONCE() is still good for documentation, etc. > Just like we have done for swap_info_struct->highest_bit, etc. > +1 > Best Regards, > Huang, Ying > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages 2022-06-21 3:39 ` Muchun Song @ 2022-06-21 6:40 ` Miaohe Lin 0 siblings, 0 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-21 6:40 UTC (permalink / raw) To: Muchun Song, Huang, Ying, Qian Cai; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/21 11:39, Muchun Song wrote: > On Tue, Jun 21, 2022 at 09:14:00AM +0800, Huang, Ying wrote: >> Qian Cai <quic_qiancai@quicinc.com> writes: >> >>> On Mon, Jun 20, 2022 at 10:20:07PM +0800, Muchun Song wrote: >>>> The lock does not protect the read sides. So the write side should be >>>> fixed by WRITTE_ONCE(). >>> >>> https://lwn.net/Articles/816854/ >>> >>> "Unmarked writes (aligned and up to word size) can be treated as if they had >>> used WRITE_ONCE() by building with >>> CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y (also selected by default). > > All right, CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC help us avoid KCSAN > complaining. > >>> Experience has shown that compilers are much less likely to destructively >>> optimize in-kernel writes than reads. Some developers might therefore >>> choose to use READ_ONCE() but omit the corresponding WRITE_ONCE(). Other >>> developers might prefer the documentation benefits and long-term peace of >>> mind accruing from explicit use of WRITE_ONCE()..." >> >> Thanks for pointing me to this great article. So although not required >> by KCSAN strictly, WRITE_ONCE() is still good for documentation, etc. >> Just like we have done for swap_info_struct->highest_bit, etc. >> > > +1 I tend to agree with Muchun & Huang, Ying. Thanks all of you. > >> Best Regards, >> Huang, Ying >> > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics 2022-06-08 14:40 [PATCH v2 0/3] A few cleanup and fixup patches for swap Miaohe Lin 2022-06-08 14:40 ` [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected Miaohe Lin 2022-06-08 14:40 ` [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages Miaohe Lin @ 2022-06-08 14:40 ` Miaohe Lin 2022-06-08 15:16 ` David Hildenbrand ` (2 more replies) 2022-06-17 2:37 ` [PATCH v2 0/3] A few cleanup and fixup patches for swap Andrew Morton 3 siblings, 3 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-08 14:40 UTC (permalink / raw) To: akpm; +Cc: david, linux-mm, linux-kernel, linmiaohe swap_cache_info are not statistics that could be easily used to tune system performance because they are not easily accessile. Also they can't provide really useful info when OOM occurs. Remove these statistics can also help mitigate unneeded global swap_cache_info cacheline contention. Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- mm/swap_state.c | 17 ----------------- 1 file changed, 17 deletions(-) diff --git a/mm/swap_state.c b/mm/swap_state.c index 0a2021fc55ad..41c6a6053d5c 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -59,24 +59,11 @@ static bool enable_vma_readahead __read_mostly = true; #define GET_SWAP_RA_VAL(vma) \ (atomic_long_read(&(vma)->swap_readahead_info) ? : 4) -#define INC_CACHE_INFO(x) data_race(swap_cache_info.x++) -#define ADD_CACHE_INFO(x, nr) data_race(swap_cache_info.x += (nr)) - -static struct { - unsigned long add_total; - unsigned long del_total; - unsigned long find_success; - unsigned long find_total; -} swap_cache_info; - static atomic_t swapin_readahead_hits = ATOMIC_INIT(4); void show_swap_cache_info(void) { printk("%lu pages in swap cache\n", total_swapcache_pages()); - printk("Swap cache stats: add %lu, delete %lu, find %lu/%lu\n", - swap_cache_info.add_total, swap_cache_info.del_total, - swap_cache_info.find_success, swap_cache_info.find_total); printk("Free swap = %ldkB\n", get_nr_swap_pages() << (PAGE_SHIFT - 10)); printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10)); @@ -133,7 +120,6 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, address_space->nrpages += nr; __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr); __mod_lruvec_page_state(page, NR_SWAPCACHE, nr); - ADD_CACHE_INFO(add_total, nr); unlock: xas_unlock_irq(&xas); } while (xas_nomem(&xas, gfp)); @@ -172,7 +158,6 @@ void __delete_from_swap_cache(struct page *page, address_space->nrpages -= nr; __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr); __mod_lruvec_page_state(page, NR_SWAPCACHE, -nr); - ADD_CACHE_INFO(del_total, nr); } /** @@ -348,12 +333,10 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma, page = find_get_page(swap_address_space(entry), swp_offset(entry)); put_swap_device(si); - INC_CACHE_INFO(find_total); if (page) { bool vma_ra = swap_use_vma_readahead(); bool readahead; - INC_CACHE_INFO(find_success); /* * At the moment, we don't support PG_readahead for anon THP * so let's bail out rather than confusing the readahead stat. -- 2.23.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics 2022-06-08 14:40 ` [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics Miaohe Lin @ 2022-06-08 15:16 ` David Hildenbrand 2022-06-20 8:08 ` Huang, Ying 2022-06-20 9:30 ` Muchun Song 2 siblings, 0 replies; 31+ messages in thread From: David Hildenbrand @ 2022-06-08 15:16 UTC (permalink / raw) To: Miaohe Lin, akpm; +Cc: linux-mm, linux-kernel On 08.06.22 16:40, Miaohe Lin wrote: > swap_cache_info are not statistics that could be easily used to tune system > performance because they are not easily accessile. Also they can't provide > really useful info when OOM occurs. Remove these statistics can also help > mitigate unneeded global swap_cache_info cacheline contention. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- Reviewed-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics 2022-06-08 14:40 ` [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics Miaohe Lin 2022-06-08 15:16 ` David Hildenbrand @ 2022-06-20 8:08 ` Huang, Ying 2022-06-20 9:05 ` Miaohe Lin 2022-06-20 9:30 ` Muchun Song 2 siblings, 1 reply; 31+ messages in thread From: Huang, Ying @ 2022-06-20 8:08 UTC (permalink / raw) To: Miaohe Lin; +Cc: akpm, david, linux-mm, linux-kernel Miaohe Lin <linmiaohe@huawei.com> writes: > swap_cache_info are not statistics that could be easily used to tune system > performance because they are not easily accessile. Also they can't provide > really useful info when OOM occurs. Remove these statistics can also help > mitigate unneeded global swap_cache_info cacheline contention. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > mm/swap_state.c | 17 ----------------- > 1 file changed, 17 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 0a2021fc55ad..41c6a6053d5c 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -59,24 +59,11 @@ static bool enable_vma_readahead __read_mostly = true; > #define GET_SWAP_RA_VAL(vma) \ > (atomic_long_read(&(vma)->swap_readahead_info) ? : 4) > > -#define INC_CACHE_INFO(x) data_race(swap_cache_info.x++) > -#define ADD_CACHE_INFO(x, nr) data_race(swap_cache_info.x += (nr)) > - > -static struct { > - unsigned long add_total; > - unsigned long del_total; > - unsigned long find_success; > - unsigned long find_total; > -} swap_cache_info; > - > static atomic_t swapin_readahead_hits = ATOMIC_INIT(4); > > void show_swap_cache_info(void) > { > printk("%lu pages in swap cache\n", total_swapcache_pages()); > - printk("Swap cache stats: add %lu, delete %lu, find %lu/%lu\n", > - swap_cache_info.add_total, swap_cache_info.del_total, > - swap_cache_info.find_success, swap_cache_info.find_total); > printk("Free swap = %ldkB\n", > get_nr_swap_pages() << (PAGE_SHIFT - 10)); > printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10)); > @@ -133,7 +120,6 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, > address_space->nrpages += nr; > __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr); > __mod_lruvec_page_state(page, NR_SWAPCACHE, nr); > - ADD_CACHE_INFO(add_total, nr); > unlock: > xas_unlock_irq(&xas); > } while (xas_nomem(&xas, gfp)); > @@ -172,7 +158,6 @@ void __delete_from_swap_cache(struct page *page, > address_space->nrpages -= nr; > __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr); > __mod_lruvec_page_state(page, NR_SWAPCACHE, -nr); > - ADD_CACHE_INFO(del_total, nr); > } > > /** > @@ -348,12 +333,10 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma, > page = find_get_page(swap_address_space(entry), swp_offset(entry)); > put_swap_device(si); > > - INC_CACHE_INFO(find_total); > if (page) { > bool vma_ra = swap_use_vma_readahead(); > bool readahead; > > - INC_CACHE_INFO(find_success); > /* > * At the moment, we don't support PG_readahead for anon THP > * so let's bail out rather than confusing the readahead stat. This looks reasonable. And if we want to do some statistics for swap cache in the future, we can use BPF, that is even more convenient. Acked-by: "Huang, Ying" <ying.huang@intel.com> Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics 2022-06-20 8:08 ` Huang, Ying @ 2022-06-20 9:05 ` Miaohe Lin 0 siblings, 0 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-20 9:05 UTC (permalink / raw) To: Huang, Ying; +Cc: akpm, david, linux-mm, linux-kernel On 2022/6/20 16:08, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> swap_cache_info are not statistics that could be easily used to tune system >> performance because they are not easily accessile. Also they can't provide >> really useful info when OOM occurs. Remove these statistics can also help >> mitigate unneeded global swap_cache_info cacheline contention. >> >> Suggested-by: David Hildenbrand <david@redhat.com> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> mm/swap_state.c | 17 ----------------- >> 1 file changed, 17 deletions(-) >> >> diff --git a/mm/swap_state.c b/mm/swap_state.c >> index 0a2021fc55ad..41c6a6053d5c 100644 >> --- a/mm/swap_state.c >> +++ b/mm/swap_state.c >> @@ -59,24 +59,11 @@ static bool enable_vma_readahead __read_mostly = true; >> #define GET_SWAP_RA_VAL(vma) \ >> (atomic_long_read(&(vma)->swap_readahead_info) ? : 4) >> >> -#define INC_CACHE_INFO(x) data_race(swap_cache_info.x++) >> -#define ADD_CACHE_INFO(x, nr) data_race(swap_cache_info.x += (nr)) >> - >> -static struct { >> - unsigned long add_total; >> - unsigned long del_total; >> - unsigned long find_success; >> - unsigned long find_total; >> -} swap_cache_info; >> - >> static atomic_t swapin_readahead_hits = ATOMIC_INIT(4); >> >> void show_swap_cache_info(void) >> { >> printk("%lu pages in swap cache\n", total_swapcache_pages()); >> - printk("Swap cache stats: add %lu, delete %lu, find %lu/%lu\n", >> - swap_cache_info.add_total, swap_cache_info.del_total, >> - swap_cache_info.find_success, swap_cache_info.find_total); >> printk("Free swap = %ldkB\n", >> get_nr_swap_pages() << (PAGE_SHIFT - 10)); >> printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10)); >> @@ -133,7 +120,6 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, >> address_space->nrpages += nr; >> __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr); >> __mod_lruvec_page_state(page, NR_SWAPCACHE, nr); >> - ADD_CACHE_INFO(add_total, nr); >> unlock: >> xas_unlock_irq(&xas); >> } while (xas_nomem(&xas, gfp)); >> @@ -172,7 +158,6 @@ void __delete_from_swap_cache(struct page *page, >> address_space->nrpages -= nr; >> __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr); >> __mod_lruvec_page_state(page, NR_SWAPCACHE, -nr); >> - ADD_CACHE_INFO(del_total, nr); >> } >> >> /** >> @@ -348,12 +333,10 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma, >> page = find_get_page(swap_address_space(entry), swp_offset(entry)); >> put_swap_device(si); >> >> - INC_CACHE_INFO(find_total); >> if (page) { >> bool vma_ra = swap_use_vma_readahead(); >> bool readahead; >> >> - INC_CACHE_INFO(find_success); >> /* >> * At the moment, we don't support PG_readahead for anon THP >> * so let's bail out rather than confusing the readahead stat. > > This looks reasonable. And if we want to do some statistics for swap > cache in the future, we can use BPF, that is even more convenient. BPF should be very convenient. Many thanks for reviewing! > > Acked-by: "Huang, Ying" <ying.huang@intel.com> > > Best Regards, > Huang, Ying > > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics 2022-06-08 14:40 ` [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics Miaohe Lin 2022-06-08 15:16 ` David Hildenbrand 2022-06-20 8:08 ` Huang, Ying @ 2022-06-20 9:30 ` Muchun Song 2 siblings, 0 replies; 31+ messages in thread From: Muchun Song @ 2022-06-20 9:30 UTC (permalink / raw) To: Miaohe Lin; +Cc: akpm, david, linux-mm, linux-kernel On Wed, Jun 08, 2022 at 10:40:31PM +0800, Miaohe Lin wrote: > swap_cache_info are not statistics that could be easily used to tune system > performance because they are not easily accessile. Also they can't provide > really useful info when OOM occurs. Remove these statistics can also help > mitigate unneeded global swap_cache_info cacheline contention. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Thanks. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 0/3] A few cleanup and fixup patches for swap 2022-06-08 14:40 [PATCH v2 0/3] A few cleanup and fixup patches for swap Miaohe Lin ` (2 preceding siblings ...) 2022-06-08 14:40 ` [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics Miaohe Lin @ 2022-06-17 2:37 ` Andrew Morton 2022-06-17 3:00 ` Miaohe Lin 3 siblings, 1 reply; 31+ messages in thread From: Andrew Morton @ 2022-06-17 2:37 UTC (permalink / raw) To: Miaohe Lin; +Cc: david, linux-mm, linux-kernel, Hugh Dickins On Wed, 8 Jun 2022 22:40:28 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > This series contains a cleaup patch to remove unneeded swap_cache_info > statistics, and two bugfix patches to avoid possible data races of > inuse_pages and so on. More details can be found in the respective > changelogs. It would be nice to get [1/3] reviewed please. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 0/3] A few cleanup and fixup patches for swap 2022-06-17 2:37 ` [PATCH v2 0/3] A few cleanup and fixup patches for swap Andrew Morton @ 2022-06-17 3:00 ` Miaohe Lin 0 siblings, 0 replies; 31+ messages in thread From: Miaohe Lin @ 2022-06-17 3:00 UTC (permalink / raw) To: Andrew Morton, David Hildenbrand, Hugh Dickins, Huang Ying Cc: linux-mm, linux-kernel On 2022/6/17 10:37, Andrew Morton wrote: > On Wed, 8 Jun 2022 22:40:28 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > >> This series contains a cleaup patch to remove unneeded swap_cache_info >> statistics, and two bugfix patches to avoid possible data races of >> inuse_pages and so on. More details can be found in the respective >> changelogs. > > It would be nice to get [1/3] reviewed please. I'd like too. Hi David & Hugh & Huang, Ying, It's very kind of you if you can help review this patch! Thanks! BTW: It should be convenient if there are mm/swap reviewers. ;) > > > . > ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2022-06-21 8:21 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-08 14:40 [PATCH v2 0/3] A few cleanup and fixup patches for swap Miaohe Lin 2022-06-08 14:40 ` [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected Miaohe Lin 2022-06-17 7:33 ` David Hildenbrand 2022-06-18 2:43 ` Miaohe Lin 2022-06-18 7:10 ` David Hildenbrand 2022-06-18 7:31 ` Miaohe Lin 2022-06-20 7:31 ` Huang, Ying 2022-06-20 12:12 ` Miaohe Lin 2022-06-21 1:35 ` Huang, Ying 2022-06-21 7:37 ` Miaohe Lin 2022-06-21 7:42 ` Huang, Ying 2022-06-21 8:20 ` Miaohe Lin 2022-06-08 14:40 ` [PATCH v2 2/3] mm/swapfile: fix possible data races of inuse_pages Miaohe Lin 2022-06-20 7:54 ` Huang, Ying 2022-06-20 9:04 ` Miaohe Lin 2022-06-20 9:23 ` Muchun Song 2022-06-20 12:23 ` Miaohe Lin 2022-06-20 12:32 ` Miaohe Lin 2022-06-20 13:46 ` Qian Cai 2022-06-20 14:20 ` Muchun Song 2022-06-20 21:36 ` Qian Cai 2022-06-21 1:14 ` Huang, Ying 2022-06-21 3:39 ` Muchun Song 2022-06-21 6:40 ` Miaohe Lin 2022-06-08 14:40 ` [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics Miaohe Lin 2022-06-08 15:16 ` David Hildenbrand 2022-06-20 8:08 ` Huang, Ying 2022-06-20 9:05 ` Miaohe Lin 2022-06-20 9:30 ` Muchun Song 2022-06-17 2:37 ` [PATCH v2 0/3] A few cleanup and fixup patches for swap Andrew Morton 2022-06-17 3:00 ` Miaohe Lin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.