From: Aaro Koskinen <aaro.koskinen-X3B1VOXEql0@public.gmane.org>
To: Stefan Wahren <stefan.wahren-eS4NqCHxEME@public.gmane.org>
Cc: linux-mmc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Subject: Re: Linux v5.0-rc7: bcm2835 MMC issues
Date: Fri, 29 Mar 2019 21:19:26 +0200	[thread overview]
Message-ID: <20190329191926.GG16484@darkstar.musicnaut.iki.fi> (raw)
In-Reply-To: <20190328225818.GE16484-4/PLUo9XfK9owWHViPbQSXlKr5M7+etX9Sl0XMgJPXI@public.gmane.org>

Hi,

On Fri, Mar 29, 2019 at 12:58:18AM +0200, Aaro Koskinen wrote:
> On Sun, Mar 10, 2019 at 02:51:31AM +0200, Aaro Koskinen wrote:
> > On Sat, Mar 09, 2019 at 11:57:57AM +0100, Stefan Wahren wrote:
> > > > Aaro Koskinen <aaro.koskinen-X3B1VOXEql0@public.gmane.org> wrote on 27 February 2019 at 19:51:
> > > > On Tue, Feb 26, 2019 at 09:31:14AM +0100, Stefan Wahren wrote:
> > > > > it will take some time for me to set up this test scenario. Could you please
> > > > > do me a favor and test 4.20.12, which has some backports of recent mmc /
> > > > > DMA fixes.
> > > > 
> > > > Both 4.20 and 4.20.12 work fine. Only 5.0-rc fails reliably.
> > > 
> > > I tried to reproduce the issue by compiling gcc and using stress on
> > > Raspberry Pi 3 (arm64/defconfig) with Arch Linux ARM without any luck.
> > > 
> > > Were you able to reproduce the issue using stress?
> > 
> > No, not yet. I'll let you know if I'm able to come up with a more reliable
> > reproducer.
> 
> I tried GCC bootstrap again with 5.1-rc2 and LOCKDEP enabled, and got
> the warning below. It might be a different, unrelated issue, however.

So with 5.1-rc2, the GCC bootstrap & testsuite went fine (some 20 hours)
without any MMC timeout errors or lockups. I also think the warning below
may be the cause of the earlier problems I had:

> [ 1164.390902] 
> [ 1164.398302] ======================================================
> [ 1164.416501] WARNING: possible circular locking dependency detected
> [ 1164.434710] 5.1.0-rc2-rpi3-los_6ba38c+-00247-g9936328b41ce-dirty #1 Not tainted
> [ 1164.454495] ------------------------------------------------------
> [ 1164.472908] cc1plus/30873 is trying to acquire lock:
> [ 1164.489750] 0000000040a8ff57 (&mq->complete_lock){+.+.}, at: mmc_blk_mq_complete_prev_req.part.12+0x3c/0x220
> [ 1164.518548] 
> [ 1164.518548] but task is already holding lock:
> [ 1164.541662] 0000000059d7e9bb (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.19+0x0/0x40
> [ 1164.567105] 
> [ 1164.567105] which lock already depends on the new lock.
> [ 1164.567105] 
> [ 1164.595691] 
> [ 1164.595691] the existing dependency chain (in reverse order) is:
> [ 1164.616711] 
> [ 1164.616711] -> #2 (fs_reclaim){+.+.}:
> [ 1164.630507]        lock_acquire+0xe8/0x250
> [ 1164.638922]        fs_reclaim_acquire.part.19+0x34/0x40
> [ 1164.652170]        fs_reclaim_acquire+0x20/0x28
> [ 1164.665139]        __kmalloc+0x50/0x390
> [ 1164.673717]        bcm2835_dma_create_cb_chain+0x70/0x270
                                 
I think the bug is that bcm2835_dma_create_cb_chain() is using GFP_KERNEL
here, so the allocation can enter direct reclaim while we are already in
the middle of issuing an MMC request (see the sketch after the trace).

> [ 1164.688863]        bcm2835_dma_prep_slave_sg+0xf4/0x280
> [ 1164.704061]        bcm2835_request+0x300/0x3f8
> [ 1164.718923]        __mmc_start_request+0xb8/0x2b0
> [ 1164.734415]        mmc_start_request+0x74/0x98
> [ 1164.749591]        mmc_blk_mq_issue_rq+0x328/0x774
> [ 1164.765346]        mmc_mq_queue_rq+0x108/0x278
> [ 1164.780772]        blk_mq_try_issue_directly+0x13c/0x240
> [ 1164.797298]        blk_mq_make_request+0x494/0x820
> [ 1164.813224]        generic_make_request+0x25c/0x4a8
> [ 1164.829126]        submit_bio+0x38/0x1c0
> [ 1164.838778]        submit_bh_wbc.isra.11+0x168/0x1d0
> [ 1164.854312]        block_read_full_page+0x298/0x3a8
> [ 1164.869741]        blkdev_readpage+0x18/0x20
> [ 1164.884479]        do_read_cache_page+0x1b8/0x378
> [ 1164.899756]        read_cache_page+0x10/0x18
> [ 1164.914504]        read_dev_sector+0x34/0xb8
> [ 1164.929506]        msdos_partition+0x68/0x628
> [ 1164.944814]        check_partition+0x108/0x210
> [ 1164.960213]        rescan_partitions+0xb8/0x3a8
> [ 1164.975682]        __blkdev_get+0x2f0/0x3f8
> [ 1164.990768]        blkdev_get+0x148/0x340
> [ 1165.000495]        __device_add_disk+0x3ac/0x458
> [ 1165.015811]        device_add_disk+0x10/0x18
> [ 1165.030725]        mmc_add_disk+0x34/0x11c
> [ 1165.040460]        mmc_blk_probe+0x25c/0x640
> [ 1165.055042]        mmc_bus_probe+0x1c/0x28
> [ 1165.064621]        really_probe+0xd8/0x298
> [ 1165.074075]        driver_probe_device+0x54/0xe8
> [ 1165.088485]        __device_attach_driver+0x9c/0xd8
> [ 1165.103403]        bus_for_each_drv+0x78/0xc8
> [ 1165.117702]        __device_attach+0xd4/0x130
> [ 1165.132180]        device_initial_probe+0x10/0x18
> [ 1165.147239]        bus_probe_device+0x98/0xa0
> [ 1165.162333]        device_add+0x3ac/0x5d8
> [ 1165.172090]        mmc_add_card+0x1f8/0x2c8
> [ 1165.186913]        mmc_attach_sd+0xec/0x168
> [ 1165.201705]        mmc_rescan+0x298/0x370
> [ 1165.211301]        process_one_work+0x2ac/0x6f8
> [ 1165.226087]        worker_thread+0x40/0x448
> [ 1165.240461]        kthread+0x128/0x130
> [ 1165.249507]        ret_from_fork+0x10/0x1c
> [ 1165.258809] 
> [ 1165.258809] -> #1 (&host->mutex){+.+.}:
> [ 1165.273924]        lock_acquire+0xe8/0x250
> [ 1165.283052]        __mutex_lock+0x8c/0x840
> [ 1165.291983]        mutex_lock_nested+0x1c/0x28
> [ 1165.305449]        bcm2835_request+0xc0/0x3f8
> [ 1165.318987]        __mmc_start_request+0xb8/0x2b0
> [ 1165.333011]        mmc_start_request+0x74/0x98
> [ 1165.347223]        mmc_wait_for_req+0x68/0xe8
> [ 1165.361806]        mmc_wait_for_cmd+0x7c/0xa8
> [ 1165.376982]        __mmc_send_status+0x70/0xa0
> [ 1165.392393]        card_busy_detect.isra.7+0x68/0x120
> [ 1165.408500]        mmc_blk_mq_complete_prev_req.part.12+0x134/0x220
> [ 1165.426099]        mmc_blk_mq_complete_work+0x2c/0x38
> [ 1165.442580]        process_one_work+0x2ac/0x6f8
> [ 1165.458455]        worker_thread+0x40/0x448
> [ 1165.473919]        kthread+0x128/0x130
> [ 1165.483508]        ret_from_fork+0x10/0x1c
> [ 1165.493342] 
> [ 1165.493342] -> #0 (&mq->complete_lock){+.+.}:
> [ 1165.514773]        __lock_acquire+0xcbc/0x12a8
> [ 1165.529403]        lock_acquire+0xe8/0x250
> [ 1165.538896]        __mutex_lock+0x8c/0x840
> [ 1165.548198]        mutex_lock_nested+0x1c/0x28
> [ 1165.562147]        mmc_blk_mq_complete_prev_req.part.12+0x3c/0x220
> [ 1165.578490]        mmc_blk_rw_wait+0x88/0x150
> [ 1165.593514]        mmc_blk_mq_issue_rq+0x30c/0x774
> [ 1165.609020]        mmc_mq_queue_rq+0x108/0x278
> [ 1165.624125]        blk_mq_try_issue_directly+0x13c/0x240
> [ 1165.640763]        blk_mq_try_issue_list_directly+0x7c/0xe0
> [ 1165.657851]        blk_mq_sched_insert_requests+0x84/0xa0
> [ 1165.674561]        blk_mq_flush_plug_list+0x22c/0x4f0
> [ 1165.690782]        blk_flush_plug_list+0xd4/0x100
> [ 1165.706603]        blk_finish_plug+0x30/0x40
> [ 1165.721924]        shrink_node_memcg.constprop.19+0x4f0/0x6d0
> [ 1165.739046]        shrink_node+0x7c/0x300
> [ 1165.748985]        try_to_free_pages+0x1d8/0x6b0
> [ 1165.764594]        __alloc_pages_nodemask+0x4c8/0x10c0
> [ 1165.780768]        __handle_mm_fault+0x378/0xb10
> [ 1165.796317]        handle_mm_fault+0x98/0xe0
> [ 1165.811461]        do_page_fault+0x120/0x460
> [ 1165.826546]        do_translation_fault+0x50/0x58
> [ 1165.842096]        do_mem_abort+0x3c/0x98
> [ 1165.851830]        el0_da+0x20/0x24
> [ 1165.860835] 
> [ 1165.860835] other info that might help us debug this:
> [ 1165.860835] 
> [ 1165.888360] Chain exists of:
> [ 1165.888360]   &mq->complete_lock --> &host->mutex --> fs_reclaim
> [ 1165.888360] 
> [ 1165.916434]  Possible unsafe locking scenario:
> [ 1165.916434] 
> [ 1165.934797]        CPU0                    CPU1
> [ 1165.947234]        ----                    ----
> [ 1165.959140]   lock(fs_reclaim);
> [ 1165.966281]                                lock(&host->mutex);
> [ 1165.979605]                                lock(fs_reclaim);
> [ 1165.992221]   lock(&mq->complete_lock);
> [ 1165.999628] 
> [ 1165.999628]  *** DEADLOCK ***
> [ 1165.999628] 
> [ 1166.015913] 3 locks held by cc1plus/30873:
> [ 1166.024001]  #0: 00000000163613c2 (&mm->mmap_sem){++++}, at: do_page_fault+0xc8/0x460
> [ 1166.040348]  #1: 0000000059d7e9bb (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.19+0x0/0x40
> [ 1166.063019]  #2: 0000000081e56ff0 (hctx->srcu){....}, at: hctx_lock+0x70/0xe8
> [ 1166.080956] 
> [ 1166.080956] stack backtrace:
> [ 1166.095758] CPU: 3 PID: 30873 Comm: cc1plus Not tainted 5.1.0-rc2-rpi3-los_6ba38c+-00247-g9936328b41ce-dirty #1
> [ 1166.123148] Hardware name: Raspberry Pi 3 Model B (DT)
> [ 1166.139641] Call trace:
> [ 1166.147494]  dump_backtrace+0x0/0x120
> [ 1166.156570]  show_stack+0x14/0x20
> [ 1166.165072]  dump_stack+0xd4/0x11c
> [ 1166.173320]  print_circular_bug.isra.20+0x26c/0x2d8
> [ 1166.187945]  check_prev_add.constprop.28+0x610/0xc80
> [ 1166.203228]  __lock_acquire+0xcbc/0x12a8
> [ 1166.212469]  lock_acquire+0xe8/0x250
> [ 1166.221217]  __mutex_lock+0x8c/0x840
> [ 1166.229745]  mutex_lock_nested+0x1c/0x28
> [ 1166.238735]  mmc_blk_mq_complete_prev_req.part.12+0x3c/0x220
> [ 1166.254447]  mmc_blk_rw_wait+0x88/0x150
> [ 1166.263940]  mmc_blk_mq_issue_rq+0x30c/0x774
> [ 1166.278741]  mmc_mq_queue_rq+0x108/0x278
> [ 1166.288203]  blk_mq_try_issue_directly+0x13c/0x240
> [ 1166.303322]  blk_mq_try_issue_list_directly+0x7c/0xe0
> [ 1166.319311]  blk_mq_sched_insert_requests+0x84/0xa0
> [ 1166.335859]  blk_mq_flush_plug_list+0x22c/0x4f0
> [ 1166.352129]  blk_flush_plug_list+0xd4/0x100
> [ 1166.368455]  blk_finish_plug+0x30/0x40
> [ 1166.378500]  shrink_node_memcg.constprop.19+0x4f0/0x6d0
> [ 1166.396035]  shrink_node+0x7c/0x300
> [ 1166.405729]  try_to_free_pages+0x1d8/0x6b0
> [ 1166.416012]  __alloc_pages_nodemask+0x4c8/0x10c0
> [ 1166.432173]  __handle_mm_fault+0x378/0xb10
> [ 1166.442314]  handle_mm_fault+0x98/0xe0
> [ 1166.451920]  do_page_fault+0x120/0x460
> [ 1166.461355]  do_translation_fault+0x50/0x58
> [ 1166.476086]  do_mem_abort+0x3c/0x98
> [ 1166.485124]  el0_da+0x20/0x24
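
For what it's worth, here is a minimal, illustrative sketch of the
allocation problem pointed out above. This is not the actual
drivers/dma/bcm2835-dma.c code; the names example_desc and
example_alloc_desc are made up for the example, and only the gfp flag
choice is the point:

#include <linux/slab.h>

/*
 * Stand-in for the per-transfer descriptor that
 * bcm2835_dma_create_cb_chain() kmallocs in the trace above; the name
 * and layout here are invented for illustration.
 */
struct example_desc {
	size_t frames;
	/* ... control block list ... */
};

static struct example_desc *example_alloc_desc(size_t frames)
{
	struct example_desc *d;

	/*
	 * What the lockdep chain shows: a GFP_KERNEL allocation here can
	 * enter direct reclaim (fs_reclaim), and reclaim can flush the
	 * block plug and re-enter the MMC request path, so fs_reclaim
	 * ends up ordered both before and after the MMC/host locks:
	 *
	 *	d = kmalloc(sizeof(*d), GFP_KERNEL);
	 *
	 * A device_prep_slave_sg()-style path should instead use a
	 * non-reclaiming allocation and let the caller cope with failure:
	 */
	d = kmalloc(sizeof(*d), GFP_NOWAIT);
	if (d)
		d->frames = frames;
	return d;
}

The real fix would of course be in the driver itself; the sketch only
illustrates why GFP_KERNEL is problematic in that call chain.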

A.

