* [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() @ 2021-05-13 6:48 Huang Ying 2021-05-13 8:49 ` Miaohe Lin 2021-05-13 12:46 ` Peter Zijlstra 0 siblings, 2 replies; 15+ messages in thread From: Huang Ying @ 2021-05-13 6:48 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, Huang Ying, Daniel Jordan, Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array accesses to avoid NULL derefs"), the typical code to reference the swap_info[] is as follows, type = swp_type(swp_entry); if (type >= nr_swapfiles) /* handle invalid swp_entry */; p = swap_info[type]; /* access fields of *p. OOPS! p may be NULL! */ Because the ordering isn't guaranteed, it's possible that "p" is read before checking "type". And that may result in NULL pointer dereference. So in commit c10d38cc8d3e, the code becomes, struct swap_info_struct *swap_type_to_swap_info(int type) { if (type >= READ_ONCE(nr_swapfiles)) return NULL; smp_rmb(); return READ_ONCE(swap_info[type]); } /* users */ type = swp_type(swp_entry); p = swap_type_to_swap_info(type); if (!p) /* handle invalid swp_entry */; /* access fields of *p */ Because "p" is checked to be non-zero before dereference, smp_rmb() isn't needed anymore. We still need to guarantee swap_info[type] is read before dereference. That can be satisfied via the data dependency ordering of READ_ONCE(swap_info[type]). The corresponding smp_wmb() is adjusted in alloc_swap_info() too. And, we don't need to read "nr_swapfiles" too. Because if "type >= nr_swapfiles", swap_info[type] will be NULL. We just need to make sure we will not access out of the boundary of the array. With that change, nr_swapfiles will only be accessed with swap_lock held, except in swapcache_free_entries(). Where the absolute correctness of the value isn't needed, as described in the comments. Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Paul McKenney <paulmck@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Miaohe Lin <linmiaohe@huawei.com> --- mm/swapfile.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 2aad85751991..4c1fb28bbe0e 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); static struct swap_info_struct *swap_type_to_swap_info(int type) { - if (type >= READ_ONCE(nr_swapfiles)) + if (type >= MAX_SWAPFILES) return NULL; - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ + /* + * The data dependency ordering from the READ_ONCE() pairs + * with smp_wmb() in alloc_swap_info() to guarantee the + * swap_info_struct fields are read after swap_info[type]. + */ return READ_ONCE(swap_info[type]); } @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) } if (type >= nr_swapfiles) { p->type = type; - WRITE_ONCE(swap_info[type], p); - /* - * Write swap_info[type] before nr_swapfiles, in case a - * racing procfs swap_start() or swap_next() is reading them. - * (We never shrink nr_swapfiles, we never free this entry.) - */ + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ smp_wmb(); - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); + WRITE_ONCE(swap_info[type], p); + nr_swapfiles++; } else { defer = p; p = swap_info[type]; -- 2.30.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-13 6:48 [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() Huang Ying @ 2021-05-13 8:49 ` Miaohe Lin 2021-05-13 9:54 ` Muchun Song 2021-05-13 12:46 ` Peter Zijlstra 1 sibling, 1 reply; 15+ messages in thread From: Miaohe Lin @ 2021-05-13 8:49 UTC (permalink / raw) To: Huang Ying, Andrew Morton Cc: linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon On 2021/5/13 14:48, Huang Ying wrote: > Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array > accesses to avoid NULL derefs"), the typical code to reference the > swap_info[] is as follows, > > type = swp_type(swp_entry); > if (type >= nr_swapfiles) > /* handle invalid swp_entry */; > p = swap_info[type]; > /* access fields of *p. OOPS! p may be NULL! */ > > Because the ordering isn't guaranteed, it's possible that "p" is read > before checking "type". And that may result in NULL pointer > dereference. > > So in commit c10d38cc8d3e, the code becomes, > > struct swap_info_struct *swap_type_to_swap_info(int type) > { > if (type >= READ_ONCE(nr_swapfiles)) > return NULL; > smp_rmb(); > return READ_ONCE(swap_info[type]); > } > > /* users */ > type = swp_type(swp_entry); > p = swap_type_to_swap_info(type); > if (!p) > /* handle invalid swp_entry */; > /* access fields of *p */ > > Because "p" is checked to be non-zero before dereference, smp_rmb() > isn't needed anymore. > > We still need to guarantee swap_info[type] is read before dereference. > That can be satisfied via the data dependency ordering of > READ_ONCE(swap_info[type]). The corresponding smp_wmb() is adjusted > in alloc_swap_info() too. > > And, we don't need to read "nr_swapfiles" too. Because if > "type >= nr_swapfiles", swap_info[type] will be NULL. We just need > to make sure we will not access out of the boundary of the array. > With that change, nr_swapfiles will only be accessed with swap_lock > held, except in swapcache_free_entries(). Where the absolute > correctness of the value isn't needed, as described in the comments. > > Signed-off-by: "Huang, Ying" <ying.huang@intel.com> > Cc: Daniel Jordan <daniel.m.jordan@oracle.com> > Cc: Dan Carpenter <dan.carpenter@oracle.com> > Cc: Andrea Parri <andrea.parri@amarulasolutions.com> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: Andi Kleen <ak@linux.intel.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Omar Sandoval <osandov@fb.com> > Cc: Paul McKenney <paulmck@kernel.org> > Cc: Tejun Heo <tj@kernel.org> > Cc: Will Deacon <will.deacon@arm.com> > Cc: Miaohe Lin <linmiaohe@huawei.com> > --- > mm/swapfile.c | 18 +++++++++--------- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 2aad85751991..4c1fb28bbe0e 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > static struct swap_info_struct *swap_type_to_swap_info(int type) > { > - if (type >= READ_ONCE(nr_swapfiles)) > + if (type >= MAX_SWAPFILES) > return NULL; > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > + /* > + * The data dependency ordering from the READ_ONCE() pairs > + * with smp_wmb() in alloc_swap_info() to guarantee the > + * swap_info_struct fields are read after swap_info[type]. > + */ > return READ_ONCE(swap_info[type]); > } > > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) > } > if (type >= nr_swapfiles) { > p->type = type; > - WRITE_ONCE(swap_info[type], p); > - /* > - * Write swap_info[type] before nr_swapfiles, in case a > - * racing procfs swap_start() or swap_next() is reading them. > - * (We never shrink nr_swapfiles, we never free this entry.) > - */ > + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ > smp_wmb(); Many thank for your patch. The patch looks fine to me. There is one question: There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ? Could you please have a explanation ? Thanks again! > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > + WRITE_ONCE(swap_info[type], p); > + nr_swapfiles++; > } else { > defer = p; > p = swap_info[type]; > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-13 8:49 ` Miaohe Lin @ 2021-05-13 9:54 ` Muchun Song 0 siblings, 0 replies; 15+ messages in thread From: Muchun Song @ 2021-05-13 9:54 UTC (permalink / raw) To: Miaohe Lin Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote: > > On 2021/5/13 14:48, Huang Ying wrote: > > Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array > > accesses to avoid NULL derefs"), the typical code to reference the > > swap_info[] is as follows, > > > > type = swp_type(swp_entry); > > if (type >= nr_swapfiles) > > /* handle invalid swp_entry */; > > p = swap_info[type]; > > /* access fields of *p. OOPS! p may be NULL! */ > > > > Because the ordering isn't guaranteed, it's possible that "p" is read > > before checking "type". And that may result in NULL pointer > > dereference. > > > > So in commit c10d38cc8d3e, the code becomes, > > > > struct swap_info_struct *swap_type_to_swap_info(int type) > > { > > if (type >= READ_ONCE(nr_swapfiles)) > > return NULL; > > smp_rmb(); > > return READ_ONCE(swap_info[type]); > > } > > > > /* users */ > > type = swp_type(swp_entry); > > p = swap_type_to_swap_info(type); > > if (!p) > > /* handle invalid swp_entry */; > > /* access fields of *p */ > > > > Because "p" is checked to be non-zero before dereference, smp_rmb() > > isn't needed anymore. > > > > We still need to guarantee swap_info[type] is read before dereference. > > That can be satisfied via the data dependency ordering of > > READ_ONCE(swap_info[type]). The corresponding smp_wmb() is adjusted > > in alloc_swap_info() too. > > > > And, we don't need to read "nr_swapfiles" too. Because if > > "type >= nr_swapfiles", swap_info[type] will be NULL. We just need > > to make sure we will not access out of the boundary of the array. > > With that change, nr_swapfiles will only be accessed with swap_lock > > held, except in swapcache_free_entries(). Where the absolute > > correctness of the value isn't needed, as described in the comments. > > > > Signed-off-by: "Huang, Ying" <ying.huang@intel.com> > > Cc: Daniel Jordan <daniel.m.jordan@oracle.com> > > Cc: Dan Carpenter <dan.carpenter@oracle.com> > > Cc: Andrea Parri <andrea.parri@amarulasolutions.com> > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > > Cc: Andi Kleen <ak@linux.intel.com> > > Cc: Dave Hansen <dave.hansen@linux.intel.com> > > Cc: Omar Sandoval <osandov@fb.com> > > Cc: Paul McKenney <paulmck@kernel.org> > > Cc: Tejun Heo <tj@kernel.org> > > Cc: Will Deacon <will.deacon@arm.com> > > Cc: Miaohe Lin <linmiaohe@huawei.com> > > --- > > mm/swapfile.c | 18 +++++++++--------- > > 1 file changed, 9 insertions(+), 9 deletions(-) > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index 2aad85751991..4c1fb28bbe0e 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) > > { > > - if (type >= READ_ONCE(nr_swapfiles)) > > + if (type >= MAX_SWAPFILES) > > return NULL; > > > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > > + /* > > + * The data dependency ordering from the READ_ONCE() pairs > > + * with smp_wmb() in alloc_swap_info() to guarantee the > > + * swap_info_struct fields are read after swap_info[type]. > > + */ > > return READ_ONCE(swap_info[type]); > > } > > > > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) > > } > > if (type >= nr_swapfiles) { > > p->type = type; > > - WRITE_ONCE(swap_info[type], p); > > - /* > > - * Write swap_info[type] before nr_swapfiles, in case a > > - * racing procfs swap_start() or swap_next() is reading them. > > - * (We never shrink nr_swapfiles, we never free this entry.) > > - */ > > + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ > > smp_wmb(); > > Many thank for your patch. The patch looks fine to me. There is one question: > > There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ? > Could you please have a explanation ? The comment is very clear, it matches READ_ONCE() which implies a data dependence barrier on some archs. Thanks. > > Thanks again! > > > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > > + WRITE_ONCE(swap_info[type], p); > > + nr_swapfiles++; > > } else { > > defer = p; > > p = swap_info[type]; > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() @ 2021-05-13 9:54 ` Muchun Song 0 siblings, 0 replies; 15+ messages in thread From: Muchun Song @ 2021-05-13 9:54 UTC (permalink / raw) To: Miaohe Lin Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote: > > On 2021/5/13 14:48, Huang Ying wrote: > > Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array > > accesses to avoid NULL derefs"), the typical code to reference the > > swap_info[] is as follows, > > > > type = swp_type(swp_entry); > > if (type >= nr_swapfiles) > > /* handle invalid swp_entry */; > > p = swap_info[type]; > > /* access fields of *p. OOPS! p may be NULL! */ > > > > Because the ordering isn't guaranteed, it's possible that "p" is read > > before checking "type". And that may result in NULL pointer > > dereference. > > > > So in commit c10d38cc8d3e, the code becomes, > > > > struct swap_info_struct *swap_type_to_swap_info(int type) > > { > > if (type >= READ_ONCE(nr_swapfiles)) > > return NULL; > > smp_rmb(); > > return READ_ONCE(swap_info[type]); > > } > > > > /* users */ > > type = swp_type(swp_entry); > > p = swap_type_to_swap_info(type); > > if (!p) > > /* handle invalid swp_entry */; > > /* access fields of *p */ > > > > Because "p" is checked to be non-zero before dereference, smp_rmb() > > isn't needed anymore. > > > > We still need to guarantee swap_info[type] is read before dereference. > > That can be satisfied via the data dependency ordering of > > READ_ONCE(swap_info[type]). The corresponding smp_wmb() is adjusted > > in alloc_swap_info() too. > > > > And, we don't need to read "nr_swapfiles" too. Because if > > "type >= nr_swapfiles", swap_info[type] will be NULL. We just need > > to make sure we will not access out of the boundary of the array. > > With that change, nr_swapfiles will only be accessed with swap_lock > > held, except in swapcache_free_entries(). Where the absolute > > correctness of the value isn't needed, as described in the comments. > > > > Signed-off-by: "Huang, Ying" <ying.huang@intel.com> > > Cc: Daniel Jordan <daniel.m.jordan@oracle.com> > > Cc: Dan Carpenter <dan.carpenter@oracle.com> > > Cc: Andrea Parri <andrea.parri@amarulasolutions.com> > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > > Cc: Andi Kleen <ak@linux.intel.com> > > Cc: Dave Hansen <dave.hansen@linux.intel.com> > > Cc: Omar Sandoval <osandov@fb.com> > > Cc: Paul McKenney <paulmck@kernel.org> > > Cc: Tejun Heo <tj@kernel.org> > > Cc: Will Deacon <will.deacon@arm.com> > > Cc: Miaohe Lin <linmiaohe@huawei.com> > > --- > > mm/swapfile.c | 18 +++++++++--------- > > 1 file changed, 9 insertions(+), 9 deletions(-) > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index 2aad85751991..4c1fb28bbe0e 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) > > { > > - if (type >= READ_ONCE(nr_swapfiles)) > > + if (type >= MAX_SWAPFILES) > > return NULL; > > > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > > + /* > > + * The data dependency ordering from the READ_ONCE() pairs > > + * with smp_wmb() in alloc_swap_info() to guarantee the > > + * swap_info_struct fields are read after swap_info[type]. > > + */ > > return READ_ONCE(swap_info[type]); > > } > > > > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) > > } > > if (type >= nr_swapfiles) { > > p->type = type; > > - WRITE_ONCE(swap_info[type], p); > > - /* > > - * Write swap_info[type] before nr_swapfiles, in case a > > - * racing procfs swap_start() or swap_next() is reading them. > > - * (We never shrink nr_swapfiles, we never free this entry.) > > - */ > > + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ > > smp_wmb(); > > Many thank for your patch. The patch looks fine to me. There is one question: > > There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ? > Could you please have a explanation ? The comment is very clear, it matches READ_ONCE() which implies a data dependence barrier on some archs. Thanks. > > Thanks again! > > > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > > + WRITE_ONCE(swap_info[type], p); > > + nr_swapfiles++; > > } else { > > defer = p; > > p = swap_info[type]; > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-13 9:54 ` Muchun Song (?) @ 2021-05-13 11:27 ` Miaohe Lin -1 siblings, 0 replies; 15+ messages in thread From: Miaohe Lin @ 2021-05-13 11:27 UTC (permalink / raw) To: Muchun Song Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon On 2021/5/13 17:54, Muchun Song wrote: > On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote: >> >> On 2021/5/13 14:48, Huang Ying wrote: >>> Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array >>> accesses to avoid NULL derefs"), the typical code to reference the >>> swap_info[] is as follows, >>> >>> type = swp_type(swp_entry); >>> if (type >= nr_swapfiles) >>> /* handle invalid swp_entry */; >>> p = swap_info[type]; >>> /* access fields of *p. OOPS! p may be NULL! */ >>> >>> Because the ordering isn't guaranteed, it's possible that "p" is read >>> before checking "type". And that may result in NULL pointer >>> dereference. >>> >>> So in commit c10d38cc8d3e, the code becomes, >>> >>> struct swap_info_struct *swap_type_to_swap_info(int type) >>> { >>> if (type >= READ_ONCE(nr_swapfiles)) >>> return NULL; >>> smp_rmb(); >>> return READ_ONCE(swap_info[type]); >>> } >>> >>> /* users */ >>> type = swp_type(swp_entry); >>> p = swap_type_to_swap_info(type); >>> if (!p) >>> /* handle invalid swp_entry */; >>> /* access fields of *p */ >>> >>> Because "p" is checked to be non-zero before dereference, smp_rmb() >>> isn't needed anymore. >>> >>> We still need to guarantee swap_info[type] is read before dereference. >>> That can be satisfied via the data dependency ordering of >>> READ_ONCE(swap_info[type]). The corresponding smp_wmb() is adjusted >>> in alloc_swap_info() too. >>> >>> And, we don't need to read "nr_swapfiles" too. Because if >>> "type >= nr_swapfiles", swap_info[type] will be NULL. We just need >>> to make sure we will not access out of the boundary of the array. >>> With that change, nr_swapfiles will only be accessed with swap_lock >>> held, except in swapcache_free_entries(). Where the absolute >>> correctness of the value isn't needed, as described in the comments. >>> >>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> >>> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> >>> Cc: Dan Carpenter <dan.carpenter@oracle.com> >>> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> >>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> >>> Cc: Andi Kleen <ak@linux.intel.com> >>> Cc: Dave Hansen <dave.hansen@linux.intel.com> >>> Cc: Omar Sandoval <osandov@fb.com> >>> Cc: Paul McKenney <paulmck@kernel.org> >>> Cc: Tejun Heo <tj@kernel.org> >>> Cc: Will Deacon <will.deacon@arm.com> >>> Cc: Miaohe Lin <linmiaohe@huawei.com> >>> --- >>> mm/swapfile.c | 18 +++++++++--------- >>> 1 file changed, 9 insertions(+), 9 deletions(-) >>> >>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>> index 2aad85751991..4c1fb28bbe0e 100644 >>> --- a/mm/swapfile.c >>> +++ b/mm/swapfile.c >>> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); >>> >>> static struct swap_info_struct *swap_type_to_swap_info(int type) >>> { >>> - if (type >= READ_ONCE(nr_swapfiles)) >>> + if (type >= MAX_SWAPFILES) >>> return NULL; >>> >>> - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ >>> + /* >>> + * The data dependency ordering from the READ_ONCE() pairs >>> + * with smp_wmb() in alloc_swap_info() to guarantee the >>> + * swap_info_struct fields are read after swap_info[type]. >>> + */ >>> return READ_ONCE(swap_info[type]); >>> } >>> >>> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) >>> } >>> if (type >= nr_swapfiles) { >>> p->type = type; >>> - WRITE_ONCE(swap_info[type], p); >>> - /* >>> - * Write swap_info[type] before nr_swapfiles, in case a >>> - * racing procfs swap_start() or swap_next() is reading them. >>> - * (We never shrink nr_swapfiles, we never free this entry.) >>> - */ >>> + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ >>> smp_wmb(); >> >> Many thank for your patch. The patch looks fine to me. There is one question: >> >> There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ? >> Could you please have a explanation ? > > The comment is very clear, it matches READ_ONCE() which implies a > data dependence barrier on some archs. > > Thanks. Got it. I misunderstood it... Thanks! > >> >> Thanks again! >> >>> - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); >>> + WRITE_ONCE(swap_info[type], p); >>> + nr_swapfiles++; >>> } else { >>> defer = p; >>> p = swap_info[type]; >>> >> > . > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-13 9:54 ` Muchun Song (?) (?) @ 2021-05-13 12:34 ` Peter Zijlstra -1 siblings, 0 replies; 15+ messages in thread From: Peter Zijlstra @ 2021-05-13 12:34 UTC (permalink / raw) To: Muchun Song Cc: Miaohe Lin, Huang Ying, Andrew Morton, linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon On Thu, May 13, 2021 at 05:54:42PM +0800, Muchun Song wrote: > On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote: > > On 2021/5/13 14:48, Huang Ying wrote: > > > mm/swapfile.c | 18 +++++++++--------- > > > 1 file changed, 9 insertions(+), 9 deletions(-) > > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > > index 2aad85751991..4c1fb28bbe0e 100644 > > > --- a/mm/swapfile.c > > > +++ b/mm/swapfile.c > > > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) > > > { > > > - if (type >= READ_ONCE(nr_swapfiles)) > > > + if (type >= MAX_SWAPFILES) > > > return NULL; > > > > > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > > > + /* > > > + * The data dependency ordering from the READ_ONCE() pairs > > > + * with smp_wmb() in alloc_swap_info() to guarantee the > > > + * swap_info_struct fields are read after swap_info[type]. > > > + */ > > > return READ_ONCE(swap_info[type]); > > > } > > > > > > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) > > > } > > > if (type >= nr_swapfiles) { > > > p->type = type; > > > - WRITE_ONCE(swap_info[type], p); > > > - /* > > > - * Write swap_info[type] before nr_swapfiles, in case a > > > - * racing procfs swap_start() or swap_next() is reading them. > > > - * (We never shrink nr_swapfiles, we never free this entry.) > > > - */ > > > + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ > > > smp_wmb(); > > > > Many thank for your patch. The patch looks fine to me. There is one question: > > > > There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ? > > Could you please have a explanation ? > > The comment is very clear, it matches READ_ONCE() which implies a > data dependence barrier on some archs. This statement doesn't make sense; this isn't code that needs to be correct on 'some' archs, it needs to be unconditionally correct. Also, you cannot pair with a single memop, there is no order in a set of one element. And if you depend on a data dependency, you need a store order; but you just removed the store order. in which case the data dependency is also moot. All of this is utter confusion. Possibly correct, but a complete trainwreck non-the-less. Either you say ordering is irrelevant, because we only ever increase the number of swapfiles and therefore any load is either NULL or the correct pointer, as guaranteed by WRITE_ONCE()/READ_ONCE() avoiding load/store tearing. Or you need the data dependency, but then you also need the store order like: CPU0 CPU1 if (type >= READ_ONCE(nr_swapfiles)) WRITE_ONCE(swap_info[type], p); return NULL; /* data-dependency on type */ smp_wmb(); return READ_ONCE(swap_info[type]); WRITE_ONCE(nr_swapfiles, nr_swapfiles+1); But you cannot have half of both and expect any of it to make sense. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-13 6:48 [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() Huang Ying 2021-05-13 8:49 ` Miaohe Lin @ 2021-05-13 12:46 ` Peter Zijlstra 2021-05-14 1:59 ` Daniel Jordan 2021-05-14 3:27 ` Huang, Ying 1 sibling, 2 replies; 15+ messages in thread From: Peter Zijlstra @ 2021-05-13 12:46 UTC (permalink / raw) To: Huang Ying Cc: Andrew Morton, linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin On Thu, May 13, 2021 at 02:48:37PM +0800, Huang Ying wrote: > mm/swapfile.c | 18 +++++++++--------- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 2aad85751991..4c1fb28bbe0e 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > static struct swap_info_struct *swap_type_to_swap_info(int type) > { > - if (type >= READ_ONCE(nr_swapfiles)) > + if (type >= MAX_SWAPFILES) > return NULL; > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > + /* > + * The data dependency ordering from the READ_ONCE() pairs > + * with smp_wmb() in alloc_swap_info() to guarantee the > + * swap_info_struct fields are read after swap_info[type]. > + */ > return READ_ONCE(swap_info[type]); > } > > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) > } > if (type >= nr_swapfiles) { > p->type = type; > - WRITE_ONCE(swap_info[type], p); > - /* > - * Write swap_info[type] before nr_swapfiles, in case a > - * racing procfs swap_start() or swap_next() is reading them. > - * (We never shrink nr_swapfiles, we never free this entry.) > - */ > + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ > smp_wmb(); > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > + WRITE_ONCE(swap_info[type], p); > + nr_swapfiles++; Ah, I think I see what you meant to say, it would perhaps help if you write it like so: diff --git a/mm/swapfile.c b/mm/swapfile.c index 149e77454e3c..94735248dcd2 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); static struct swap_info_struct *swap_type_to_swap_info(int type) { - if (type >= READ_ONCE(nr_swapfiles)) + if (type >= MAX_SWAPFILES) return NULL; - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ - return READ_ONCE(swap_info[type]); + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ } static inline unsigned char swap_count(unsigned char ent) @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void) } if (type >= nr_swapfiles) { p->type = type; - WRITE_ONCE(swap_info[type], p); /* - * Write swap_info[type] before nr_swapfiles, in case a - * racing procfs swap_start() or swap_next() is reading them. - * (We never shrink nr_swapfiles, we never free this entry.) + * Publish the swap_info_struct. */ - smp_wmb(); - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ + nr_swapfiles++; } else { defer = p; p = swap_info[type]; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-13 12:46 ` Peter Zijlstra @ 2021-05-14 1:59 ` Daniel Jordan 2021-05-14 4:02 ` Huang, Ying 2021-05-14 12:04 ` Peter Zijlstra 2021-05-14 3:27 ` Huang, Ying 1 sibling, 2 replies; 15+ messages in thread From: Daniel Jordan @ 2021-05-14 1:59 UTC (permalink / raw) To: Peter Zijlstra, Huang Ying Cc: Andrew Morton, linux-mm, linux-kernel, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote: > Ah, I think I see what you meant to say, it would perhaps help if you > write it like so: > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 149e77454e3c..94735248dcd2 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > static struct swap_info_struct *swap_type_to_swap_info(int type) > { > - if (type >= READ_ONCE(nr_swapfiles)) > + if (type >= MAX_SWAPFILES) > return NULL; > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > - return READ_ONCE(swap_info[type]); > + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ > } > > static inline unsigned char swap_count(unsigned char ent) > @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void) > } > if (type >= nr_swapfiles) { > p->type = type; > - WRITE_ONCE(swap_info[type], p); > /* > - * Write swap_info[type] before nr_swapfiles, in case a > - * racing procfs swap_start() or swap_next() is reading them. > - * (We never shrink nr_swapfiles, we never free this entry.) > + * Publish the swap_info_struct. > */ > - smp_wmb(); > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ > + nr_swapfiles++; Yes, this does help, I didn't understand why smp_wmb stayed around in the original post. I think the only access smp_store_release() orders is p->type. Wouldn't it be kinda inconsistent to only initialize that one field before publishing when many others would be done at the end of alloc_swap_info() after the fact? p->type doesn't seem special. For instance, get_swap_page_of_type() touches si->lock soon after it calls swap_type_to_swap_info(), so there could be a small window where there's a non-NULL si with an uninitialized lock. It's not as if this is likely to be a problem in practice, it would just make it harder to understand why smp_store_release is there. Maybe all we need is a WRITE_ONCE, or if it's really necessary for certain fields to be set before publication then move them up and explain? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-14 1:59 ` Daniel Jordan @ 2021-05-14 4:02 ` Huang, Ying 2021-05-14 12:04 ` Peter Zijlstra 1 sibling, 0 replies; 15+ messages in thread From: Huang, Ying @ 2021-05-14 4:02 UTC (permalink / raw) To: Daniel Jordan Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin Daniel Jordan <daniel.m.jordan@oracle.com> writes: > On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote: >> Ah, I think I see what you meant to say, it would perhaps help if you >> write it like so: >> >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 149e77454e3c..94735248dcd2 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); >> >> static struct swap_info_struct *swap_type_to_swap_info(int type) >> { >> - if (type >= READ_ONCE(nr_swapfiles)) >> + if (type >= MAX_SWAPFILES) >> return NULL; >> >> - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ >> - return READ_ONCE(swap_info[type]); >> + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ >> } >> >> static inline unsigned char swap_count(unsigned char ent) >> @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void) >> } >> if (type >= nr_swapfiles) { >> p->type = type; >> - WRITE_ONCE(swap_info[type], p); >> /* >> - * Write swap_info[type] before nr_swapfiles, in case a >> - * racing procfs swap_start() or swap_next() is reading them. >> - * (We never shrink nr_swapfiles, we never free this entry.) >> + * Publish the swap_info_struct. >> */ >> - smp_wmb(); >> - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); >> + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ >> + nr_swapfiles++; > > Yes, this does help, I didn't understand why smp_wmb stayed around in > the original post. > > I think the only access smp_store_release() orders is p->type. Wouldn't > it be kinda inconsistent to only initialize that one field before > publishing when many others would be done at the end of > alloc_swap_info() after the fact? In addition to p->type, *p is zeroed via kvzalloc(). > p->type doesn't seem special. For > instance, get_swap_page_of_type() touches si->lock soon after it calls > swap_type_to_swap_info(), so there could be a small window where there's > a non-NULL si with an uninitialized lock. We usually check the state of swap_info_struct before other operations. For example, we check si->swap_map in swap_start(). > It's not as if this is likely to be a problem in practice, it would just > make it harder to understand why smp_store_release is there. Maybe all > we need is a WRITE_ONCE, or if it's really necessary for certain fields > to be set before publication then move them up and explain? I think we have initialized all fields before publication :-). Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() @ 2021-05-14 4:02 ` Huang, Ying 0 siblings, 0 replies; 15+ messages in thread From: Huang, Ying @ 2021-05-14 4:02 UTC (permalink / raw) To: Daniel Jordan Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin Daniel Jordan <daniel.m.jordan@oracle.com> writes: > On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote: >> Ah, I think I see what you meant to say, it would perhaps help if you >> write it like so: >> >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 149e77454e3c..94735248dcd2 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); >> >> static struct swap_info_struct *swap_type_to_swap_info(int type) >> { >> - if (type >= READ_ONCE(nr_swapfiles)) >> + if (type >= MAX_SWAPFILES) >> return NULL; >> >> - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ >> - return READ_ONCE(swap_info[type]); >> + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ >> } >> >> static inline unsigned char swap_count(unsigned char ent) >> @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void) >> } >> if (type >= nr_swapfiles) { >> p->type = type; >> - WRITE_ONCE(swap_info[type], p); >> /* >> - * Write swap_info[type] before nr_swapfiles, in case a >> - * racing procfs swap_start() or swap_next() is reading them. >> - * (We never shrink nr_swapfiles, we never free this entry.) >> + * Publish the swap_info_struct. >> */ >> - smp_wmb(); >> - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); >> + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ >> + nr_swapfiles++; > > Yes, this does help, I didn't understand why smp_wmb stayed around in > the original post. > > I think the only access smp_store_release() orders is p->type. Wouldn't > it be kinda inconsistent to only initialize that one field before > publishing when many others would be done at the end of > alloc_swap_info() after the fact? In addition to p->type, *p is zeroed via kvzalloc(). > p->type doesn't seem special. For > instance, get_swap_page_of_type() touches si->lock soon after it calls > swap_type_to_swap_info(), so there could be a small window where there's > a non-NULL si with an uninitialized lock. We usually check the state of swap_info_struct before other operations. For example, we check si->swap_map in swap_start(). > It's not as if this is likely to be a problem in practice, it would just > make it harder to understand why smp_store_release is there. Maybe all > we need is a WRITE_ONCE, or if it's really necessary for certain fields > to be set before publication then move them up and explain? I think we have initialized all fields before publication :-). Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-14 4:02 ` Huang, Ying (?) @ 2021-05-14 20:49 ` Daniel Jordan -1 siblings, 0 replies; 15+ messages in thread From: Daniel Jordan @ 2021-05-14 20:49 UTC (permalink / raw) To: Huang, Ying Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin On Fri, May 14, 2021 at 12:02:05PM +0800, Huang, Ying wrote: > Daniel Jordan <daniel.m.jordan@oracle.com> writes: > > Yes, this does help, I didn't understand why smp_wmb stayed around in > > the original post. > > > > I think the only access smp_store_release() orders is p->type. Wouldn't > > it be kinda inconsistent to only initialize that one field before > > publishing when many others would be done at the end of > > alloc_swap_info() after the fact? > > In addition to p->type, *p is zeroed via kvzalloc(). So it is, good point. > > p->type doesn't seem special. For > > instance, get_swap_page_of_type() touches si->lock soon after it calls > > swap_type_to_swap_info(), so there could be a small window where there's > > a non-NULL si with an uninitialized lock. > > We usually check the state of swap_info_struct before other operations. > For example, we check si->swap_map in swap_start(). Yes, we usually do. > > It's not as if this is likely to be a problem in practice, it would just > > make it harder to understand why smp_store_release is there. Maybe all > > we need is a WRITE_ONCE, or if it's really necessary for certain fields > > to be set before publication then move them up and explain? > > I think we have initialized all fields before publication :-). Probably all the ones that matter in practice, yes :-) Still feeling slightly uneasy about the theoretical p->lock, but that was possible before this change too so it's out of scope. A comment explaining the pairing and that we care mostly about the zero init would be nice. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-14 1:59 ` Daniel Jordan 2021-05-14 4:02 ` Huang, Ying @ 2021-05-14 12:04 ` Peter Zijlstra 2021-05-14 20:51 ` Daniel Jordan 1 sibling, 1 reply; 15+ messages in thread From: Peter Zijlstra @ 2021-05-14 12:04 UTC (permalink / raw) To: Daniel Jordan Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin On Thu, May 13, 2021 at 09:59:46PM -0400, Daniel Jordan wrote: > On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote: > > Ah, I think I see what you meant to say, it would perhaps help if you > > write it like so: > > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index 149e77454e3c..94735248dcd2 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) > > { > > - if (type >= READ_ONCE(nr_swapfiles)) > > + if (type >= MAX_SWAPFILES) > > return NULL; > > > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > > - return READ_ONCE(swap_info[type]); > > + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ > > } > > > > static inline unsigned char swap_count(unsigned char ent) > > @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void) > > } > > if (type >= nr_swapfiles) { > > p->type = type; > > - WRITE_ONCE(swap_info[type], p); > > /* > > - * Write swap_info[type] before nr_swapfiles, in case a > > - * racing procfs swap_start() or swap_next() is reading them. > > - * (We never shrink nr_swapfiles, we never free this entry.) > > + * Publish the swap_info_struct. > > */ > > - smp_wmb(); > > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > > + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ > > + nr_swapfiles++; > > Yes, this does help, I didn't understand why smp_wmb stayed around in > the original post. > > I think the only access smp_store_release() orders is p->type. Wouldn't > it be kinda inconsistent to only initialize that one field before > publishing when many others would be done at the end of > alloc_swap_info() after the fact? p->type doesn't seem special. For > instance, get_swap_page_of_type() touches si->lock soon after it calls > swap_type_to_swap_info(), so there could be a small window where there's > a non-NULL si with an uninitialized lock. > > It's not as if this is likely to be a problem in practice, it would just > make it harder to understand why smp_store_release is there. Maybe all > we need is a WRITE_ONCE, or if it's really necessary for certain fields > to be set before publication then move them up and explain? You also care about the zero fill from kvzalloc(). Without the smp_store_release() the zero-fill from the memset() might only be visible 'late'. Unless that also isn't a problem? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-14 12:04 ` Peter Zijlstra @ 2021-05-14 20:51 ` Daniel Jordan 0 siblings, 0 replies; 15+ messages in thread From: Daniel Jordan @ 2021-05-14 20:51 UTC (permalink / raw) To: Peter Zijlstra Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Dan Carpenter, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin On Fri, May 14, 2021 at 02:04:14PM +0200, Peter Zijlstra wrote: > On Thu, May 13, 2021 at 09:59:46PM -0400, Daniel Jordan wrote: > > Yes, this does help, I didn't understand why smp_wmb stayed around in > > the original post. > > > > I think the only access smp_store_release() orders is p->type. Wouldn't > > it be kinda inconsistent to only initialize that one field before > > publishing when many others would be done at the end of > > alloc_swap_info() after the fact? p->type doesn't seem special. For > > instance, get_swap_page_of_type() touches si->lock soon after it calls > > swap_type_to_swap_info(), so there could be a small window where there's > > a non-NULL si with an uninitialized lock. > > > > It's not as if this is likely to be a problem in practice, it would just > > make it harder to understand why smp_store_release is there. Maybe all > > we need is a WRITE_ONCE, or if it's really necessary for certain fields > > to be set before publication then move them up and explain? > > You also care about the zero fill from kvzalloc(). Without the > smp_store_release() the zero-fill from the memset() might only be > visible 'late'. Aha, yes, didn't consider that! > Unless that also isn't a problem? No, you're right, we need that for p->flags at least. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-05-13 12:46 ` Peter Zijlstra @ 2021-05-14 3:27 ` Huang, Ying 2021-05-14 3:27 ` Huang, Ying 1 sibling, 0 replies; 15+ messages in thread From: Huang, Ying @ 2021-05-14 3:27 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin Peter Zijlstra <peterz@infradead.org> writes: > On Thu, May 13, 2021 at 02:48:37PM +0800, Huang Ying wrote: >> mm/swapfile.c | 18 +++++++++--------- >> 1 file changed, 9 insertions(+), 9 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 2aad85751991..4c1fb28bbe0e 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); >> >> static struct swap_info_struct *swap_type_to_swap_info(int type) >> { >> - if (type >= READ_ONCE(nr_swapfiles)) >> + if (type >= MAX_SWAPFILES) >> return NULL; >> >> - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ >> + /* >> + * The data dependency ordering from the READ_ONCE() pairs >> + * with smp_wmb() in alloc_swap_info() to guarantee the >> + * swap_info_struct fields are read after swap_info[type]. >> + */ >> return READ_ONCE(swap_info[type]); >> } >> >> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) >> } >> if (type >= nr_swapfiles) { >> p->type = type; >> - WRITE_ONCE(swap_info[type], p); >> - /* >> - * Write swap_info[type] before nr_swapfiles, in case a >> - * racing procfs swap_start() or swap_next() is reading them. >> - * (We never shrink nr_swapfiles, we never free this entry.) >> - */ >> + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ >> smp_wmb(); >> - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); >> + WRITE_ONCE(swap_info[type], p); >> + nr_swapfiles++; > > Ah, I think I see what you meant to say, it would perhaps help if you > write it like so: > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 149e77454e3c..94735248dcd2 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > static struct swap_info_struct *swap_type_to_swap_info(int type) > { > - if (type >= READ_ONCE(nr_swapfiles)) > + if (type >= MAX_SWAPFILES) > return NULL; > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > - return READ_ONCE(swap_info[type]); > + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ > } > > static inline unsigned char swap_count(unsigned char ent) > @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void) > } > if (type >= nr_swapfiles) { > p->type = type; > - WRITE_ONCE(swap_info[type], p); > /* > - * Write swap_info[type] before nr_swapfiles, in case a > - * racing procfs swap_start() or swap_next() is reading them. > - * (We never shrink nr_swapfiles, we never free this entry.) > + * Publish the swap_info_struct. > */ > - smp_wmb(); > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ > + nr_swapfiles++; > } else { > defer = p; > p = swap_info[type]; OK. It seems that this helps people to understand. I will use this in the next version. Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() @ 2021-05-14 3:27 ` Huang, Ying 0 siblings, 0 replies; 15+ messages in thread From: Huang, Ying @ 2021-05-14 3:27 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin Peter Zijlstra <peterz@infradead.org> writes: > On Thu, May 13, 2021 at 02:48:37PM +0800, Huang Ying wrote: >> mm/swapfile.c | 18 +++++++++--------- >> 1 file changed, 9 insertions(+), 9 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 2aad85751991..4c1fb28bbe0e 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); >> >> static struct swap_info_struct *swap_type_to_swap_info(int type) >> { >> - if (type >= READ_ONCE(nr_swapfiles)) >> + if (type >= MAX_SWAPFILES) >> return NULL; >> >> - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ >> + /* >> + * The data dependency ordering from the READ_ONCE() pairs >> + * with smp_wmb() in alloc_swap_info() to guarantee the >> + * swap_info_struct fields are read after swap_info[type]. >> + */ >> return READ_ONCE(swap_info[type]); >> } >> >> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) >> } >> if (type >= nr_swapfiles) { >> p->type = type; >> - WRITE_ONCE(swap_info[type], p); >> - /* >> - * Write swap_info[type] before nr_swapfiles, in case a >> - * racing procfs swap_start() or swap_next() is reading them. >> - * (We never shrink nr_swapfiles, we never free this entry.) >> - */ >> + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ >> smp_wmb(); >> - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); >> + WRITE_ONCE(swap_info[type], p); >> + nr_swapfiles++; > > Ah, I think I see what you meant to say, it would perhaps help if you > write it like so: > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 149e77454e3c..94735248dcd2 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); > > static struct swap_info_struct *swap_type_to_swap_info(int type) > { > - if (type >= READ_ONCE(nr_swapfiles)) > + if (type >= MAX_SWAPFILES) > return NULL; > > - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ > - return READ_ONCE(swap_info[type]); > + return READ_ONCE(swap_info[type]); /* rcu_dereference() */ > } > > static inline unsigned char swap_count(unsigned char ent) > @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void) > } > if (type >= nr_swapfiles) { > p->type = type; > - WRITE_ONCE(swap_info[type], p); > /* > - * Write swap_info[type] before nr_swapfiles, in case a > - * racing procfs swap_start() or swap_next() is reading them. > - * (We never shrink nr_swapfiles, we never free this entry.) > + * Publish the swap_info_struct. > */ > - smp_wmb(); > - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); > + smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */ > + nr_swapfiles++; > } else { > defer = p; > p = swap_info[type]; OK. It seems that this helps people to understand. I will use this in the next version. Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-05-14 20:52 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-13 6:48 [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() Huang Ying 2021-05-13 8:49 ` Miaohe Lin 2021-05-13 9:54 ` Muchun Song 2021-05-13 9:54 ` Muchun Song 2021-05-13 11:27 ` Miaohe Lin 2021-05-13 12:34 ` Peter Zijlstra 2021-05-13 12:46 ` Peter Zijlstra 2021-05-14 1:59 ` Daniel Jordan 2021-05-14 4:02 ` Huang, Ying 2021-05-14 4:02 ` Huang, Ying 2021-05-14 20:49 ` Daniel Jordan 2021-05-14 12:04 ` Peter Zijlstra 2021-05-14 20:51 ` Daniel Jordan 2021-05-14 3:27 ` Huang, Ying 2021-05-14 3:27 ` Huang, Ying
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.