* circular locking splat in fs/proc/vmcore.c
@ 2022-02-14 15:22 Sven Schnelle
  2022-02-14 15:58 ` David Hildenbrand
  0 siblings, 1 reply; 3+ messages in thread
From: Sven Schnelle @ 2022-02-14 15:22 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-kernel

Hi David,

I've seen the following lockdep splat in CI on one of our systems:

[   25.964518] kdump[727]: saving vmcore-dmesg.txt complete
[   26.049877]
[   26.049879] ======================================================
[   26.049881] WARNING: possible circular locking dependency detected
[   26.049883] 5.17.0-20220211.rc3.git2.2636bbc7cadf.300.fc35.s390x+debug #1 Tainted: G        W
[   26.049885] ------------------------------------------------------
[   26.049886] makedumpfile/730 is trying to acquire lock:
[   26.049887] 0000000001a25720 (vmcore_cb_rwsem){.+.+}-{3:3}, at: mmap_vmcore+0x148/0x458
[   26.049896]
[   26.049896] but task is already holding lock:
[   26.049897] 0000000013539d28 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x8e/0x170
[   26.049904]
[   26.049904] which lock already depends on the new lock.
[   26.049904]
[   26.049906]
[   26.049906] the existing dependency chain (in reverse order) is:
[   26.049907]
[   26.049907] -> #1 (&mm->mmap_lock){++++}-{3:3}:
[   26.049910]        __lock_acquire+0x604/0xbd8
[   26.049914]        lock_acquire.part.0+0xe2/0x250
[   26.049916]        lock_acquire+0xb0/0x200
[   26.049918]        __might_fault+0x70/0xa0
[   26.049921]        copy_to_user_real+0x8e/0xf8
[   26.049925]        copy_oldmem_page+0xc0/0x158
[   26.049930]        read_from_oldmem.part.0+0x14c/0x1b8
[   26.049932]        __read_vmcore+0x116/0x1f8
[   26.049933]        proc_reg_read+0x9a/0xf0
[   26.049938]        vfs_read+0x94/0x1a8
[   25.973256] kdump[729]: saving vmcore
[   26.049941]        __s390x_sys_pread64+0x90/0xc8
[   26.049958]        __do_syscall+0x1da/0x208
[   26.049963]        system_call+0x82/0xb0
[   26.049967]
[   26.049967] -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
[   26.049971]        check_prev_add+0xe0/0xed8
[   26.049972]        validate_chain+0x736/0xb20
[   26.049974]        __lock_acquire+0x604/0xbd8
[   26.049976]        lock_acquire.part.0+0xe2/0x250
[   26.049978]        lock_acquire+0xb0/0x200
[   26.049980]        down_read+0x5e/0x180
[   26.049982]        mmap_vmcore+0x148/0x458
[   26.049983]        proc_reg_mmap+0x8e/0xe0
[   26.049985]        mmap_region+0x412/0x668
[   26.049988]        do_mmap+0x3ec/0x4d0
[   26.049989]        vm_mmap_pgoff+0xd4/0x170
[   26.049992]        ksys_mmap_pgoff+0x1d8/0x228
[   26.049994]        __s390x_sys_old_mmap+0xa4/0xb8
[   26.049995]        __do_syscall+0x1da/0x208
[   26.049997]        system_call+0x82/0xb0
[   26.049999]
[   26.049999] other info that might help us debug this:
[   26.049999]
[   26.050001]  Possible unsafe locking scenario:
[   26.050001]
[   26.050002]        CPU0                    CPU1
[   26.050003]        ----                    ----
[   26.050004]   lock(&mm->mmap_lock);
[   26.050006]                                lock(vmcore_cb_rwsem);
[   26.050008]                                lock(&mm->mmap_lock);
[   26.050010]   lock(vmcore_cb_rwsem);
[   26.050012]
[   26.050012]  *** DEADLOCK ***
[   26.050012]
[   26.050013] 1 lock held by makedumpfile/730:
[   26.050015]  #0: 0000000013539d28 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x8e/0x170

I think this was introduced with cc5f2704c934 ("proc/vmcore: convert
oldmem_pfn_is_ram callback to more generic vmcore callbacks").

read_from_oldmem() holds vmcore_cb_rwsem across copy_oldmem_page() (and thus
copy_to_user(), which may fault and take mmap_lock), while mmap_vmcore() takes
vmcore_cb_rwsem with mmap_lock already held. One fix might be to move taking
vmcore_cb_rwsem into the loop, just around the pfn_is_ram() call. But this
would likely slow things down. The diff would look like this (UNTESTED):

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 702754dd1daf..4acd91507d21 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -133,6 +133,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 	unsigned long pfn, offset;
 	size_t nr_bytes;
 	ssize_t read = 0, tmp;
+	int is_ram;
 
 	if (!count)
 		return 0;
@@ -140,7 +141,6 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 	offset = (unsigned long)(*ppos % PAGE_SIZE);
 	pfn = (unsigned long)(*ppos / PAGE_SIZE);
 
-	down_read(&vmcore_cb_rwsem);
 	do {
 		if (count > (PAGE_SIZE - offset))
 			nr_bytes = PAGE_SIZE - offset;
@@ -148,7 +148,10 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 			nr_bytes = count;
 
 		/* If pfn is not ram, return zeros for sparse dump files */
-		if (!pfn_is_ram(pfn)) {
+		down_read(&vmcore_cb_rwsem);
+		is_ram = pfn_is_ram(pfn);
+		up_read(&vmcore_cb_rwsem);
+		if (!is_ram) {
 			tmp = 0;
 			if (!userbuf)
 				memset(buf, 0, nr_bytes);
@@ -164,10 +167,8 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 				tmp = copy_oldmem_page(pfn, buf, nr_bytes,
 						       offset, userbuf);
 		}
-		if (tmp < 0) {
-			up_read(&vmcore_cb_rwsem);
+		if (tmp < 0)
 			return tmp;
-		}
 
 		*ppos += nr_bytes;
 		count -= nr_bytes;
@@ -177,7 +178,6 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 		offset = 0;
 	} while (count);
 
-	up_read(&vmcore_cb_rwsem);
 	return read;
 }
 
I think we could also switch the list to an RCU-protected list, but I
don't know the code well. Any opinions on how to fix this?
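
Roughly, the RCU idea could look like this (untested sketch, just to
illustrate; vmcore_cb_mutex is a made-up name for whatever would serialize
the writer side, and the register side/error handling are omitted):

/* Reader: only the callback-list walk runs under rcu_read_lock(). */
static bool pfn_is_ram(unsigned long pfn)
{
	struct vmcore_cb *cb;
	bool ret = true;

	rcu_read_lock();
	list_for_each_entry_rcu(cb, &vmcore_cb_list, next) {
		if (unlikely(!cb->pfn_is_ram))
			continue;
		ret = cb->pfn_is_ram(cb, pfn);
		if (!ret)
			break;
	}
	rcu_read_unlock();

	return ret;
}

/* Writer: unpublish the callback, then wait for in-flight readers. */
void unregister_vmcore_cb(struct vmcore_cb *cb)
{
	mutex_lock(&vmcore_cb_mutex);	/* made-up writer-side lock */
	list_del_rcu(&cb->next);
	mutex_unlock(&vmcore_cb_mutex);
	synchronize_rcu();
}

One catch is that under rcu_read_lock() the registered callbacks themselves
must not sleep, which may not hold for every user.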

Thanks
Sven


* Re: circular locking splat in fs/proc/vmcore.c
  2022-02-14 15:22 circular locking splat in fs/proc/vmcore.c Sven Schnelle
@ 2022-02-14 15:58 ` David Hildenbrand
  2022-02-14 16:03   ` Sven Schnelle
  0 siblings, 1 reply; 3+ messages in thread
From: David Hildenbrand @ 2022-02-14 15:58 UTC (permalink / raw)
  To: Sven Schnelle; +Cc: linux-kernel

On 14.02.22 16:22, Sven Schnelle wrote:
> Hi David,
> 
> I've seen the following lockdep splat in CI on one of our systems:
> 
> [   25.964518] kdump[727]: saving vmcore-dmesg.txt complete
> [   26.049877]
> [   26.049879] ======================================================
> [   26.049881] WARNING: possible circular locking dependency detected
> [   26.049883] 5.17.0-20220211.rc3.git2.2636bbc7cadf.300.fc35.s390x+debug #1 Tainted: G        W
> [   26.049885] ------------------------------------------------------
> [   26.049886] makedumpfile/730 is trying to acquire lock:
> [   26.049887] 0000000001a25720 (vmcore_cb_rwsem){.+.+}-{3:3}, at: mmap_vmcore+0x148/0x458
> [   26.049896]
> [   26.049896] but task is already holding lock:
> [   26.049897] 0000000013539d28 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x8e/0x170
> [   26.049904]
> [   26.049904] which lock already depends on the new lock.
> [   26.049904]
> [   26.049906]
> [   26.049906] the existing dependency chain (in reverse order) is:
> [   26.049907]
> [   26.049907] -> #1 (&mm->mmap_lock){++++}-{3:3}:
> [   26.049910]        __lock_acquire+0x604/0xbd8
> [   26.049914]        lock_acquire.part.0+0xe2/0x250
> [   26.049916]        lock_acquire+0xb0/0x200
> [   26.049918]        __might_fault+0x70/0xa0
> [   26.049921]        copy_to_user_real+0x8e/0xf8
> [   26.049925]        copy_oldmem_page+0xc0/0x158
> [   26.049930]        read_from_oldmem.part.0+0x14c/0x1b8
> [   26.049932]        __read_vmcore+0x116/0x1f8
> [   26.049933]        proc_reg_read+0x9a/0xf0
> [   26.049938]        vfs_read+0x94/0x1a8
> [   25.973256] kdump[729]: saving vmcore
> [   26.049941]        __s390x_sys_pread64+0x90/0xc8
> [   26.049958]        __do_syscall+0x1da/0x208
> [   26.049963]        system_call+0x82/0xb0
> [   26.049967]
> [   26.049967] -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
> [   26.049971]        check_prev_add+0xe0/0xed8
> [   26.049972]        validate_chain+0x736/0xb20
> [   26.049974]        __lock_acquire+0x604/0xbd8
> [   26.049976]        lock_acquire.part.0+0xe2/0x250
> [   26.049978]        lock_acquire+0xb0/0x200
> [   26.049980]        down_read+0x5e/0x180
> [   26.049982]        mmap_vmcore+0x148/0x458
> [   26.049983]        proc_reg_mmap+0x8e/0xe0
> [   26.049985]        mmap_region+0x412/0x668
> [   26.049988]        do_mmap+0x3ec/0x4d0
> [   26.049989]        vm_mmap_pgoff+0xd4/0x170
> [   26.049992]        ksys_mmap_pgoff+0x1d8/0x228
> [   26.049994]        __s390x_sys_old_mmap+0xa4/0xb8
> [   26.049995]        __do_syscall+0x1da/0x208
> [   26.049997]        system_call+0x82/0xb0
> [   26.049999]
> [   26.049999] other info that might help us debug this:
> [   26.049999]
> [   26.050001]  Possible unsafe locking scenario:
> [   26.050001]
> [   26.050002]        CPU0                    CPU1
> [   26.050003]        ----                    ----
> [   26.050004]   lock(&mm->mmap_lock);
> [   26.050006]                                lock(vmcore_cb_rwsem);
> [   26.050008]                                lock(&mm->mmap_lock);
> [   26.050010]   lock(vmcore_cb_rwsem);
> [   26.050012]
> [   26.050012]  *** DEADLOCK ***
> [   26.050012]
> [   26.050013] 1 lock held by makedumpfile/730:
> [   26.050015]  #0: 0000000013539d28 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x8e/0x170
> 
> I think this was introduced with cc5f2704c934 ("proc/vmcore: convert
> oldmem_pfn_is_ram callback to more generic vmcore callbacks").
> 
> read_from_oldmem() holds vmcore_cb_rwsem across copy_oldmem_page() (and thus
> copy_to_user(), which may fault and take mmap_lock), while mmap_vmcore() takes
> vmcore_cb_rwsem with mmap_lock already held. One fix might be to move taking
> vmcore_cb_rwsem into the loop, just around the pfn_is_ram() call. But this
> would likely slow things down. The diff would look like this (UNTESTED):
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 702754dd1daf..4acd91507d21 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -133,6 +133,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  	unsigned long pfn, offset;
>  	size_t nr_bytes;
>  	ssize_t read = 0, tmp;
> +	int is_ram;
>  
>  	if (!count)
>  		return 0;
> @@ -140,7 +141,6 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  	offset = (unsigned long)(*ppos % PAGE_SIZE);
>  	pfn = (unsigned long)(*ppos / PAGE_SIZE);
>  
> -	down_read(&vmcore_cb_rwsem);
>  	do {
>  		if (count > (PAGE_SIZE - offset))
>  			nr_bytes = PAGE_SIZE - offset;
> @@ -148,7 +148,10 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  			nr_bytes = count;
>  
>  		/* If pfn is not ram, return zeros for sparse dump files */
> -		if (!pfn_is_ram(pfn)) {
> +		down_read(&vmcore_cb_rwsem);
> +		is_ram = pfn_is_ram(pfn);
> +		up_read(&vmcore_cb_rwsem);
> +		if (!is_ram) {
>  			tmp = 0;
>  			if (!userbuf)
>  				memset(buf, 0, nr_bytes);
> @@ -164,10 +167,8 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  				tmp = copy_oldmem_page(pfn, buf, nr_bytes,
>  						       offset, userbuf);
>  		}
> -		if (tmp < 0) {
> -			up_read(&vmcore_cb_rwsem);
> +		if (tmp < 0)
>  			return tmp;
> -		}
>  
>  		*ppos += nr_bytes;
>  		count -= nr_bytes;
> @@ -177,7 +178,6 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  		offset = 0;
>  	} while (count);
>  
> -	up_read(&vmcore_cb_rwsem);
>  	return read;
>  }
>  
> I think we could also switch the list to an RCU-protected list, but I
> don't know the code well. Any opinions on how to fix this?
> 

Hi Sven,

did you stumble over

https://lkml.kernel.org/r/20220119193417.100385-1-david@redhat.com

yet?

It should fix the (mostly impossible to trigger) splat you've seen -- via
sleepable RCU :)

The fix is scheduled for v5.18.
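
For reference, the read side there becomes a sleepable-RCU (SRCU) critical
section instead of the rwsem. A simplified sketch (the identifiers here are
illustrative and may not match the queued patch exactly):

DEFINE_STATIC_SRCU(vmcore_cb_srcu);

static bool pfn_is_ram(unsigned long pfn)
{
	struct vmcore_cb *cb;
	bool ret = true;
	int idx;

	/* Sleepable RCU read section protecting the callback-list walk. */
	idx = srcu_read_lock(&vmcore_cb_srcu);
	list_for_each_entry_srcu(cb, &vmcore_cb_list, next,
				 srcu_read_lock_held(&vmcore_cb_srcu)) {
		if (unlikely(!cb->pfn_is_ram))
			continue;
		ret = cb->pfn_is_ram(cb, pfn);
		if (!ret)
			break;
	}
	srcu_read_unlock(&vmcore_cb_srcu, idx);

	return ret;
}

Since SRCU readers may sleep, such a read-side section can also be held
across copy_oldmem_page()/copy_to_user() without recreating the mmap_lock
ordering problem the rwsem had; unregistration waits for readers with
synchronize_srcu(&vmcore_cb_srcu).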

-- 
Thanks,

David / dhildenb



* Re: circular locking splat in fs/proc/vmcore.c
  2022-02-14 15:58 ` David Hildenbrand
@ 2022-02-14 16:03   ` Sven Schnelle
  0 siblings, 0 replies; 3+ messages in thread
From: Sven Schnelle @ 2022-02-14 16:03 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-kernel

Hi David,

David Hildenbrand <david@redhat.com> writes:
> On 14.02.22 16:22, Sven Schnelle wrote:
>> I think we could also switch the list to an RCU-protected list, but I
>> don't know the code well. Any opinions on how to fix this?
>> 
> did you stumble over
>
> https://lkml.kernel.org/r/20220119193417.100385-1-david@redhat.com
>
> yet?
>
> It should be fixing the (mostly impossible to trigger) splat you've seen
> --  via sleepable rcu :)
>
> The fix is scheduled for v5.18.

No, I missed that. Thank you very much!

/Sven

