Re: [RFC 1/1] pmem: Add cond_resched() in bio_for_each_segment loop in pmem_make_request

From: Ritesh Harjani <riteshh@linux.ibm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-nvdimm@lists.01.org, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Subject: Re: [RFC 1/1] pmem: Add cond_resched() in bio_for_each_segment loop in pmem_make_request
Date: Mon, 3 Aug 2020 12:44:04 +0530
Message-ID: <20200803071405.64C0711C058@d06av25.portsmouth.uk.ibm.com>
In-Reply-To: <20200802230148.GA2114@dread.disaster.area>

On 8/3/20 4:31 AM, Dave Chinner wrote:
> On Wed, Jul 29, 2020 at 02:15:18PM +0530, Ritesh Harjani wrote:
>> For systems which do not have CONFIG_PREEMPT set and
>> if there is a heavy multi-threaded load/store operation happening
>> on pmem + sometimes along with device latencies, softlockup warnings like
>> this could trigger. This was seen on Power where pagesize is 64K.
>>
>> To avoid softlockup, this patch adds a cond_resched() in this path.
>>
>> <...>
>> watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
>> <...>
>> CPU: 31 PID: 15627 <..> 5.3.18-20
>> <...>
>> NIP memcpy_power7+0x43c/0x7e0
>> LR memcpy_flushcache+0x28/0xa0
>>
>> Call Trace:
>> memcpy_power7+0x274/0x7e0 (unreliable)
>> memcpy_flushcache+0x28/0xa0
>> write_pmem+0xa0/0x100 [nd_pmem]
>> pmem_do_bvec+0x1f0/0x420 [nd_pmem]
>> pmem_make_request+0x14c/0x370 [nd_pmem]
>> generic_make_request+0x164/0x400
>> submit_bio+0x134/0x2e0
>> submit_bio_wait+0x70/0xc0
>> blkdev_issue_zeroout+0xf4/0x2a0
>> xfs_zero_extent+0x90/0xc0 [xfs]
>> xfs_bmapi_convert_unwritten+0x198/0x230 [xfs]
>> xfs_bmapi_write+0x284/0x630 [xfs]
>> xfs_iomap_write_direct+0x1f0/0x3e0 [xfs]
>> xfs_file_iomap_begin+0x344/0x690 [xfs]
>> dax_iomap_pmd_fault+0x488/0xc10
>> __xfs_filemap_fault+0x26c/0x2b0 [xfs]
>> __handle_mm_fault+0x794/0x1af0
>> handle_mm_fault+0x12c/0x220
>> __do_page_fault+0x290/0xe40
>> do_page_fault+0x38/0xc0
>> handle_page_fault+0x10/0x30
>>
>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>> ---
>>   drivers/nvdimm/pmem.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> index 2df6994acf83..fcf7af13897e 100644
>> --- a/drivers/nvdimm/pmem.c
>> +++ b/drivers/nvdimm/pmem.c
>> @@ -214,6 +214,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
>>   			bio->bi_status = rc;
>>   			break;
>>   		}
>> +		cond_resched();
> 
> There are already cond_resched() calls between submitted bios in
> blkdev_issue_zeroout() via both __blkdev_issue_zero_pages() and
> __blkdev_issue_write_zeroes(), so I'm kinda wondering where the
> problem is coming from here.

The problem comes from within a single bio submission, i.e. the submit_bio() call itself.

> 
> Just how big is the bio being issued here that it spins for 22s
> trying to copy it?

It's 256 (due to BIO_MAX_PAGES) * 64KB (pagesize) = 16MB.
So this is definitely not easy to trigger; according to the tester, it was
mainly seen on a VM.
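
To make the arithmetic above concrete, here is a minimal sketch. It assumes
BIO_MAX_PAGES is 256, as stated above; max_bio_bytes() is a hypothetical
helper for illustration only, not a kernel function.

```c
#include <stdint.h>

/* Assumption: BIO_MAX_PAGES is 256, as stated in this thread. */
#define BIO_MAX_PAGES 256u

/*
 * Hypothetical helper: largest payload, in bytes, that one full bio
 * can carry when every segment is a whole page of the given size.
 */
uint64_t max_bio_bytes(uint64_t page_size)
{
	return (uint64_t)BIO_MAX_PAGES * page_size;
}
```

With a 64K page size this gives 16MB per bio, versus 1MB with 4K pages, so
a single pass through the segment loop copies 16 times more data on this
configuration.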

Looking at the cond_resched() inside dax_writeback_mapping_range(),
in the xas_for_each_marked() loop, I thought it would be good to have a
cond_resched() in the above path as well.

Hence an RFC for discussion.
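
For readers less familiar with the pattern, a rough userspace analogue of
"yield between segments" is sketched below. This is only an illustration
under stated assumptions: sched_yield() stands in for cond_resched(), and
copy_segments() is a hypothetical helper, not code from
drivers/nvdimm/pmem.c. Note that cond_resched() only reschedules when a
reschedule is actually pending, whereas sched_yield() always offers the CPU.

```c
#include <sched.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace analogue only: sched_yield() stands in for the kernel's
 * cond_resched(), and copy_segments() is a hypothetical helper.
 * Returns the number of yield points that were passed.
 */
size_t copy_segments(char *dst, const char *src, size_t nsegs, size_t seg_size)
{
	size_t yields = 0;

	for (size_t i = 0; i < nsegs; i++) {
		/* One "segment" of work, like one bvec in the bio loop. */
		memcpy(dst + i * seg_size, src + i * seg_size, seg_size);

		/* Offer the CPU to other runnable tasks between segments. */
		sched_yield();
		yields++;
	}
	return yields;
}
```

The point is only that the scheduling opportunity sits between segments, so
the longest uninterrupted stretch is one segment copy rather than the whole
bio.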

> 
> And, really, if the system is that bound on cacheline bouncing that
> it prevents memcpy() from making progress, I think we probably
> should be issuing a soft lockup warning like this...
>
> Cheers,
> 
> Dave.
>