* Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng
@ 2023-08-02 12:54 Bagas Sanjaya
  2023-08-03  6:06 ` Aaron Lu
  0 siblings, 1 reply; 6+ messages in thread
From: Bagas Sanjaya @ 2023-08-02 12:54 UTC (permalink / raw)
  To: Aaron Lu, Andrew Morton, Linus Torvalds, Colin Ian King
  Cc: Linux Kernel Mailing List, Linux Memory Management List

Hi,

I noticed a bug report on Bugzilla [1]. Quoting from it:

> How to reproduce:
> 
> Had a 24-CPU Alder Lake 16GB Debian 12 system running the default kernel (from make config) on 6.5-rc4, exercised with no swap to start with.
> 
> using stress-ng tip commit 0f2ef02e9bc5abb3419c44be056d5fa3c97e0137
> (see https://github.com/ColinIanKing/stress-ng )
> 
> Build stress-ng and run it for, say, 60 minutes:
> 
> ./stress-ng --cpu-online 50 --brk 50 --swap 50 --vmstat 1 -t 60m
> 
> It will hang in mm/swapfile.c:718 add_to_avail_list+0x93/0xa0
> 
> See the attached file for an image of the console at the time of the hang (I'm trying to get the full stack dump).

See Bugzilla for the full thread and attached console image.

FWIW, I'm forwarding this bug report to the mailing lists because
Thorsten noted that many developers don't look at Bugzilla
(see the BZ thread).

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217738

-- 
An old man doll... just what I always wanted! - Clara


* Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng
  2023-08-02 12:54 Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng Bagas Sanjaya
@ 2023-08-03  6:06 ` Aaron Lu
  2023-08-03 13:41   ` Aaron Lu
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Lu @ 2023-08-03  6:06 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Andrew Morton, Linus Torvalds, Colin Ian King,
	Linux Kernel Mailing List, Linux Memory Management List

Thanks.

I can reproduce this issue using the command line below:
$ sudo ./stress-ng --brk 50 --swap 5 --vmstat 1 -t 60m

I'll investigate what is happening.



* Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng
  2023-08-03  6:06 ` Aaron Lu
@ 2023-08-03 13:41   ` Aaron Lu
  2023-08-03 14:36     ` Colin King (gmail)
  2023-08-03 15:04     ` Kefeng Wang
  0 siblings, 2 replies; 6+ messages in thread
From: Aaron Lu @ 2023-08-03 13:41 UTC (permalink / raw)
  To: Bagas Sanjaya, Colin Ian King
  Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List,
	Linux Memory Management List

Hi Colin,

Can you try the diff below on top of v6.5-rc4? It works for me here,
although I got the warning in a different place, in get_swap_pages():

                        WARN(!si->highest_bit,
                             "swap_info %d in list but !highest_bit\n",
                             si->type);

I think the warning you got in add_to_avail_list(), caused by the swap
device already being on the list, is similar; see the explanation below.

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8e6dde68b389..cb7e93ec1933 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2330,7 +2330,8 @@ static void _enable_swap_info(struct swap_info_struct *p)
 	 * swap_info_struct.
 	 */
 	plist_add(&p->list, &swap_active_head);
-	add_to_avail_list(p);
+	if (p->highest_bit)
+		add_to_avail_list(p);
 }
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,

What I found is that if swapoff fails on a swap device, the device goes
through reinsert_swap_info() -> _enable_swap_info() -> add_to_avail_list().
The problem is that the device may have run out of space, with its
highest_bit being 0, in which case it shouldn't be added to the avail
list. In your case, once its highest_bit became non-zero again, the device
went through add_to_avail_list() a second time and, since it was already
on the list, triggered the warning.
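For readers following the reasoning, the sequence above can be modelled in a few lines of userspace C. This is a hypothetical, heavily simplified sketch, not kernel code: struct swap_dev, the model_*() helpers and run_scenario() are illustrative names invented here, a bool stands in for plist membership, and locking, priorities and per-node avail lists are ignored. It counts the two warnings discussed in this thread (a full device sitting on the avail list, and a second add_to_avail_list() on a device that is already listed), with and without the proposed highest_bit guard.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct swap_info_struct. */
struct swap_dev {
	unsigned long highest_bit; /* 0 models "no free slot left" */
	bool on_avail_list;        /* models membership of the avail plist */
	int warnings;              /* number of WARN-style events seen */
};

/* Models add_to_avail_list(): adding a device that is already listed warns. */
static void model_add_to_avail_list(struct swap_dev *si)
{
	if (si->on_avail_list) {
		si->warnings++;
		fprintf(stderr, "WARN: device already on avail list\n");
		return;
	}
	si->on_avail_list = true;
}

static void model_del_from_avail_list(struct swap_dev *si)
{
	si->on_avail_list = false;
}

/* Models the check in get_swap_pages(): a listed device must have free slots. */
static void model_check_avail_list(struct swap_dev *si)
{
	if (si->on_avail_list && !si->highest_bit) {
		si->warnings++;
		fprintf(stderr, "WARN: device in list but !highest_bit\n");
	}
}

/* Allocating the last free slot: the device is full and leaves the list. */
static void model_alloc_last_slot(struct swap_dev *si)
{
	si->highest_bit = 0;
	model_del_from_avail_list(si);
}

/* Freeing a slot; a device that was full is re-added to the avail list. */
static void model_free_slot(struct swap_dev *si, unsigned long offset)
{
	bool was_full = !si->highest_bit;

	if (offset > si->highest_bit)
		si->highest_bit = offset;
	if (was_full)
		model_add_to_avail_list(si);
}

/* Failed swapoff: taken off the list, then reinserted via _enable_swap_info(). */
static void model_failed_swapoff(struct swap_dev *si, bool apply_fix)
{
	model_del_from_avail_list(si);
	if (!apply_fix || si->highest_bit)  /* the guard from the proposed diff */
		model_add_to_avail_list(si);
}

/* Runs swapon -> fill up -> failed swapoff -> free one slot; returns warnings. */
static int run_scenario(bool apply_fix)
{
	struct swap_dev si = { .highest_bit = 100, .on_avail_list = false };

	model_add_to_avail_list(&si);        /* swapon */
	model_alloc_last_slot(&si);          /* device runs out of space */
	model_failed_swapoff(&si, apply_fix);
	model_check_avail_list(&si);         /* next get_swap_pages() call */
	model_free_slot(&si, 42);            /* a swap slot is freed */
	return si.warnings;
}
```

In this model, run_scenario(false) counts two warnings and run_scenario(true) counts none, which mirrors the intent of the "if (p->highest_bit)" check in the diff.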

If it works for you, I'll prepare a patch. Thanks.


* Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng
  2023-08-03 13:41   ` Aaron Lu
@ 2023-08-03 14:36     ` Colin King (gmail)
  2023-08-03 15:04     ` Kefeng Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Colin King (gmail) @ 2023-08-03 14:36 UTC (permalink / raw)
  To: Aaron Lu, Bagas Sanjaya
  Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List,
	Linux Memory Management List

Hi Aaron,

Thanks for the speedy fix. I've run a couple of 10-minute soak tests
and can't reproduce the issue with the fix applied, so it looks good
to me. Please add:

Tested-by: Colin Ian King <colin.i.king@gmail.com>

Colin



* Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng
  2023-08-03 13:41   ` Aaron Lu
  2023-08-03 14:36     ` Colin King (gmail)
@ 2023-08-03 15:04     ` Kefeng Wang
  2023-08-04  1:58       ` Lu, Aaron
  1 sibling, 1 reply; 6+ messages in thread
From: Kefeng Wang @ 2023-08-03 15:04 UTC (permalink / raw)
  To: Aaron Lu, Bagas Sanjaya, Colin Ian King
  Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List,
	Linux Memory Management List



On 2023/8/3 21:41, Aaron Lu wrote:
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 8e6dde68b389..cb7e93ec1933 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2330,7 +2330,8 @@ static void _enable_swap_info(struct swap_info_struct *p)
>   	 * swap_info_struct.
>   	 */
>   	plist_add(&p->list, &swap_active_head);
> -	add_to_avail_list(p);
> +	if (p->highest_bit)
> +		add_to_avail_list(p);
>   }

There is already a patch in linux-next:

commit bdfc7028681ddbce5ab08f4888d157a981060544
Author: Ma Wupeng <mawupeng1@huawei.com>
Date:   Tue Jun 27 20:08:33 2023 +0800

     swap: stop add to avail list if swap is full



* Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng
  2023-08-03 15:04     ` Kefeng Wang
@ 2023-08-04  1:58       ` Lu, Aaron
  0 siblings, 0 replies; 6+ messages in thread
From: Lu, Aaron @ 2023-08-04  1:58 UTC (permalink / raw)
  To: wangkefeng.wang, colin.i.king, bagasdotme
  Cc: Torvalds, Linus, linux-mm, linux-kernel, akpm

On Thu, 2023-08-03 at 23:04 +0800, Kefeng Wang wrote:
> There is a patch in next,
> 
> commit bdfc7028681ddbce5ab08f4888d157a981060544
> Author: Ma Wupeng <mawupeng1@huawei.com>
> Date:   Tue Jun 27 20:08:33 2023 +0800
> 
>      swap: stop add to avail list if swap is full
> 

Ah, I should have tried mm-unstable first.

I took a look at that commit: it's exactly the same issue and the same
fix, so with that commit we are good now.

