* Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Jon Hunter @ 2019-01-14 15:35 UTC (permalink / raw)
To: kernel, Mark Brown
Cc: linux-tegra, Linux Kernel Mailing List, linux-spi

Hi Martin, Mark,

I have noticed that system suspend has started failing consistently on a
couple of Tegra boards over the last few days with the linux-next branch.
The following error is seen on entering suspend ...

[   58.222033] spi_master spi1: could not stop message queue
[   58.222038] spi_master spi1: queue stop failed
[   58.222048] dpm_run_callback(): platform_pm_suspend+0x0/0x54 returns -16
[   58.222052] PM: Device 7000da00.spi failed to suspend: error -16
[   58.222057] PM: Some devices failed to suspend, or early wake event detected

Bisecting today's -next points to commit 412e60373245 ("spi: core: avoid
waking pump thread from spi_sync instead run teardown delayed") and
reverting this on top of -next fixes the problem.

I have not had a chance to dig any further, but wanted to report this
issue to see if you have any thoughts.

Cheers
Jon

--
nvpublic

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Jon Hunter @ 2019-01-15 14:26 UTC (permalink / raw)
To: Martin Sperl
Cc: Mark Brown, linux-tegra, Linux Kernel Mailing List, linux-spi

Hi Martin,

On 14/01/2019 22:01, Martin Sperl wrote:
> Hi Jon,
>
> On 14.01.2019, at 16:35, Jon Hunter <jonathanh@nvidia.com> wrote:
>
>> Hi Martin, Mark,
>>
>> [   58.222033] spi_master spi1: could not stop message queue
>> [   58.222038] spi_master spi1: queue stop failed
>> [   58.222048] dpm_run_callback(): platform_pm_suspend+0x0/0x54 returns -16
>> [   58.222052] PM: Device 7000da00.spi failed to suspend: error -16
>> [   58.222057] PM: Some devices failed to suspend, or early wake event detected
>
> Unfortunately I have not been able to reproduce this in
> my test cases with the hw available to me.

Looking at the two boards that fail, tegra30-cardhu-a04 and
tegra124-jetson-tk1, both have a SPI flash. The compatible strings for
the SPI flashes are "winbond,w25q32" and "winbond,w25q32dw",
respectively, which interestingly are not documented/used anywhere in
the kernel. It appears that there was a patch to fix this a few years
back but it never got applied [0]. However, applying this patch does not
fix the issue. Furthermore, without this patch applied I see that the
SPI flash is detected fine ...

[    2.540395] m25p80 spi1.0: w25q32dw (4096 Kbytes)

So this is not related, but the main point is that the failure occurs
with a SPI flash device.

> Looks as if there is something missing in spi_stop_queue that
> would wake the worker thread one last time without any delays
> and finish the hw shutdown immediately - it runs as a delayed
> task...
>
> One question: do you run any spi transfers in
> your test case before suspend?
No, and before suspending I dumped some of the SPI stats and I see no
transfers/messages at all ...

Stats for spi1 ...
  Bytes:     0
  Errors:    0
  Messages:  0
  Transfers: 0

> /sys/class/spi_master/spi1/statistics/messages gives some
> counters on the number of spi messages processed which
> would give you an indication if that is happening.
>
> It could be as easy as adding right after the first lock
> in spi_stop_queue:
>     kthread_mod_delayed_work(&ctlr->kworker,
>                              &ctlr->pump_idle_teardown, 0);
> (plus maybe a yield or similar to allow the worker to
> quickly/reliably run on a single core machine)
>
> I hope that this initial guess helps.

Unfortunately, the above did not help and the issue persists.

Digging a bit deeper, I see that the 'ctlr->queue' is now empty but the
'ctlr->busy' flag is set, and this is causing the 'could not stop
message queue' error.

It seems that __spi_pump_messages() is getting called several times
during boot when registering the spi-flash, then after the spi-flash has
been registered, about 1 sec later spi_pump_idle_teardown() is called
(as expected), but exits because 'ctlr->running' is true. However,
spi_pump_idle_teardown() is never called again and when we suspend we
are stuck in the busy/running state. In this case should something be
scheduling spi_pump_idle_teardown() again? Although even if it does, I
don't see where the busy flag would be cleared in this path.

Cheers
Jon

[0] https://patchwork.kernel.org/patch/7021961/

--
nvpublic
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Mark Brown @ 2019-01-15 15:10 UTC (permalink / raw)
To: Jon Hunter
Cc: Martin Sperl, linux-tegra, Linux Kernel Mailing List, linux-spi

On Tue, Jan 15, 2019 at 02:26:02PM +0000, Jon Hunter wrote:

> It seems that __spi_pump_messages() is getting called several times
> during boot when registering the spi-flash, then after the spi-flash has
> been registered, about a 1 sec later spi_pump_idle_teardown() is called
> (as expected), but exits because 'ctlr->running' is true. However,
> spi_pump_idle_teardown() is never called again and when we suspend we
> are stuck in the busy/running state. In this case should something be
> scheduling spi_pump_idle_teardown() again? Although even if it does I
> don't see where the busy flag would be cleared in this path?

Right, I think with the current code we just shouldn't be checking for
busy in teardown. Since there's now a fairly big delay between going
idle and actually turning the hardware off, the name is just super
misleading and the logic confused.
I don't have time to test right now, but does something like the below
work? It changes 'busy' to a flag for the hardware being powered up:

diff --git a/drivers/spi/spi-stm32-qspi.c b/drivers/spi/spi-stm32-qspi.c
index 3b2a9a6b990d..0170b0ef5d37 100644
--- a/drivers/spi/spi-stm32-qspi.c
+++ b/drivers/spi/spi-stm32-qspi.c
@@ -358,9 +358,6 @@ static int stm32_qspi_setup(struct spi_device *spi)
 	struct stm32_qspi_flash *flash;
 	u32 cr, presc;
 
-	if (ctrl->busy)
-		return -EBUSY;
-
 	if (!spi->max_speed_hz)
 		return -EINVAL;
diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
index 5f19016bbf10..3d746a0782eb 100644
--- a/drivers/spi/spi-ti-qspi.c
+++ b/drivers/spi/spi-ti-qspi.c
@@ -152,11 +152,6 @@ static int ti_qspi_setup(struct spi_device *spi)
 	int clk_div = 0, ret;
 	u32 clk_ctrl_reg, clk_rate, clk_mask;
 
-	if (spi->master->busy) {
-		dev_dbg(qspi->dev, "master busy doing other transfers\n");
-		return -EBUSY;
-	}
-
 	if (!qspi->spi_max_frequency) {
 		dev_err(qspi->dev, "spi max frequency not defined\n");
 		return -EINVAL;
diff --git a/drivers/spi/spi-zynqmp-gqspi.c b/drivers/spi/spi-zynqmp-gqspi.c
index 9f83e1b17aa1..a173a3114813 100644
--- a/drivers/spi/spi-zynqmp-gqspi.c
+++ b/drivers/spi/spi-zynqmp-gqspi.c
@@ -487,22 +487,6 @@ static int zynqmp_qspi_setup_transfer(struct spi_device *qspi,
 	return 0;
 }
 
-/**
- * zynqmp_qspi_setup:	Configure the QSPI controller
- * @qspi:	Pointer to the spi_device structure
- *
- * Sets the operational mode of QSPI controller for the next QSPI transfer,
- * baud rate and divisor value to setup the requested qspi clock.
- *
- * Return:	0 on success; error value otherwise.
- */
-static int zynqmp_qspi_setup(struct spi_device *qspi)
-{
-	if (qspi->master->busy)
-		return -EBUSY;
-	return 0;
-}
-
 /**
  * zynqmp_qspi_filltxfifo:	Fills the TX FIFO as long as there is room in
  *				the FIFO or the bytes required to be
@@ -1088,7 +1072,6 @@ static int zynqmp_qspi_probe(struct platform_device *pdev)
 
 	master->num_chipselect = GQSPI_DEFAULT_NUM_CS;
 
-	master->setup = zynqmp_qspi_setup;
 	master->set_cs = zynqmp_qspi_chipselect;
 	master->transfer_one = zynqmp_qspi_start_transfer;
 	master->prepare_transfer_hardware = zynqmp_prepare_transfer_hardware;
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 06b9139664a3..6d10ad9d768f 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1234,7 +1234,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 
 	/* Check if the queue is idle */
 	if (list_empty(&ctlr->queue) || !ctlr->running) {
-		if (!ctlr->busy) {
+		if (!ctlr->hw_active) {
 			spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 			return;
 		}
@@ -1252,10 +1252,10 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 		list_first_entry(&ctlr->queue, struct spi_message, queue);
 	list_del_init(&ctlr->cur_msg->queue);
 
-	if (ctlr->busy)
+	if (ctlr->hw_active)
 		was_busy = true;
 	else
-		ctlr->busy = true;
+		ctlr->hw_active = true;
 	spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 
 	mutex_lock(&ctlr->io_mutex);
@@ -1362,10 +1362,6 @@ static void spi_pump_idle_teardown(struct kthread_work *work)
 	if (ctlr->running)
 		goto out;
 
-	/* if the controller is busy then exit */
-	if (ctlr->busy)
-		goto out;
-
 	/* if the controller is idling then exit
 	 * this is actually a bit strange and would indicate that
 	 * this function is scheduled twice, which should not happen
@@ -1374,7 +1370,6 @@ static void spi_pump_idle_teardown(struct kthread_work *work)
 		goto out;
 
 	/* set up the initial states */
-	ctlr->busy = false;
 	ctlr->idling = true;
 	spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 
@@ -1384,15 +1379,17 @@ static void spi_pump_idle_teardown(struct kthread_work *work)
 	kfree(ctlr->dummy_tx);
 	ctlr->dummy_tx = NULL;
 
-	/* unprepare hardware */
-	if (ctlr->unprepare_transfer_hardware &&
-	    ctlr->unprepare_transfer_hardware(ctlr))
-		dev_err(&ctlr->dev,
-			"failed to unprepare transfer hardware\n");
-	/* handle pm */
-	if (ctlr->auto_runtime_pm) {
-		pm_runtime_mark_last_busy(ctlr->dev.parent);
-		pm_runtime_put_autosuspend(ctlr->dev.parent);
+	if (ctlr->hw_active) {
+		/* unprepare hardware */
+		if (ctlr->unprepare_transfer_hardware &&
+		    ctlr->unprepare_transfer_hardware(ctlr))
+			dev_err(&ctlr->dev,
+				"failed to unprepare transfer hardware\n");
+		/* handle pm */
+		if (ctlr->auto_runtime_pm) {
+			pm_runtime_mark_last_busy(ctlr->dev.parent);
+			pm_runtime_put_autosuspend(ctlr->dev.parent);
+		}
 	}
 
 	/* mark controller as idle */
@@ -1401,6 +1398,7 @@ static void spi_pump_idle_teardown(struct kthread_work *work)
 	/* finally put us from idling into stopped */
 	spin_lock_irqsave(&ctlr->queue_lock, flags);
 	ctlr->idling = false;
+	ctlr->hw_active = false;
 
 out:
 	spin_unlock_irqrestore(&ctlr->queue_lock, flags);
@@ -1411,7 +1409,7 @@ static int spi_init_queue(struct spi_controller *ctlr)
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
 
 	ctlr->running = false;
-	ctlr->busy = false;
+	ctlr->hw_active = false;
 
 	kthread_init_worker(&ctlr->kworker);
 	ctlr->kworker_task = kthread_run(kthread_worker_fn, &ctlr->kworker,
@@ -1520,7 +1518,7 @@ static int spi_start_queue(struct spi_controller *ctlr)
 
 	spin_lock_irqsave(&ctlr->queue_lock, flags);
 
-	if (ctlr->running || ctlr->busy) {
+	if (ctlr->running || ctlr->hw_active) {
 		spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 		return -EBUSY;
 	}
@@ -1543,18 +1541,19 @@ static int spi_stop_queue(struct spi_controller *ctlr)
 	spin_lock_irqsave(&ctlr->queue_lock, flags);
 
 	/*
-	 * This is a bit lame, but is optimized for the common execution path.
-	 * A wait_queue on the ctlr->busy could be used, but then the common
-	 * execution path (pump_messages) would be required to call wake_up or
-	 * friends on every SPI message. Do this instead.
+	 * This is a bit sad, but is optimized for the common execution path.
+	 * A wait_queue on the ctlr->hw_active could be used, but then
+	 * the common execution path (pump_messages) would be required
+	 * to call wake_up or friends on every SPI message. Do this
+	 * instead.
 	 */
-	while ((!list_empty(&ctlr->queue) || ctlr->busy) && limit--) {
+	while ((!list_empty(&ctlr->queue) || ctlr->hw_active) && limit--) {
 		spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 		usleep_range(10000, 11000);
 		spin_lock_irqsave(&ctlr->queue_lock, flags);
 	}
 
-	if (!list_empty(&ctlr->queue) || ctlr->busy)
+	if (!list_empty(&ctlr->queue) || ctlr->hw_active)
 		ret = -EBUSY;
 	else
 		ctlr->running = false;
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 79ad62e2487c..d9b7be89e50b 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -343,7 +343,7 @@ static inline void spi_unregister_driver(struct spi_driver *sdrv)
  *	in-flight message
  * @cur_msg_mapped: message has been mapped for DMA
  * @xfer_completion: used by core transfer_one_message()
- * @busy: message pump is busy
+ * @hw_active: message pump has hardware powered up
  * @running: message pump is running
  * @rt: whether this queue is set to run as a realtime task
  * @auto_runtime_pm: the core should ensure a runtime PM reference is held
@@ -538,7 +538,7 @@ struct spi_controller {
 	struct list_head	queue;
 	struct spi_message	*cur_msg;
 	bool			idling;
-	bool			busy;
+	bool			hw_active;
 	bool			running;
 	bool			rt;
 	bool			auto_runtime_pm;
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Jon Hunter @ 2019-01-15 16:09 UTC (permalink / raw)
To: Mark Brown
Cc: Martin Sperl, linux-tegra, Linux Kernel Mailing List, linux-spi

On 15/01/2019 15:10, Mark Brown wrote:
> On Tue, Jan 15, 2019 at 02:26:02PM +0000, Jon Hunter wrote:
>
>> It seems that __spi_pump_messages() is getting called several times
>> during boot when registering the spi-flash, then after the spi-flash has
>> been registered, about a 1 sec later spi_pump_idle_teardown() is called
>> (as expected), but exits because 'ctlr->running' is true. However,
>> spi_pump_idle_teardown() is never called again and when we suspend we
>> are stuck in the busy/running state. In this case should something be
>> scheduling spi_pump_idle_teardown() again? Although even if it does I
>> don't see where the busy flag would be cleared in this path?
>
> Right, I think with the current code we just shouldn't be checking for
> busy in teardown, since there's now a fairly big delay between idle and
> actually turning the hardware off the name is just super misleading and
> the logic confused. I don't have time to test right now but does
> something like the below which changes it to a flag for the hardware
> being powered up work:

I tried your change but the same problem still persists.

Cheers
Jon

--
nvpublic
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Mark Brown @ 2019-01-15 19:27 UTC (permalink / raw)
To: Jon Hunter
Cc: Martin Sperl, linux-tegra, Linux Kernel Mailing List, linux-spi

On Tue, Jan 15, 2019 at 04:09:14PM +0000, Jon Hunter wrote:
> On 15/01/2019 15:10, Mark Brown wrote:
>> Right, I think with the current code we just shouldn't be checking for
>> busy in teardown, since there's now a fairly big delay between idle and
>> actually turning the hardware off the name is just super misleading and
>> the logic confused. I don't have time to test right now but does
>> something like the below which changes it to a flag for the hardware
>> being powered up work:

> I tried your change but the same problem still persists.

OK, I feared as much given that the stats show no activity. I do think
it needs a cleanup still, but that probably won't help here.
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: kernel@martin.sperl.org @ 2019-01-15 17:39 UTC (permalink / raw)
To: Jon Hunter
Cc: Mark Brown, linux-tegra, Linux Kernel Mailing List, linux-spi

Hi Jon!

> On 15.01.2019, at 15:26, Jon Hunter <jonathanh@nvidia.com> wrote:
>> Looks as if there is something missing in spi_stop_queue that
>> would wake the worker thread one last time without any delays
>> and finish the hw shutdown immediately - it runs as a delayed
>> task...
>>
>> One question: do you run any spi transfers in
>> your test case before suspend?
>
> No, and before suspending I dumped some of the SPI stats and I see no
> transfers/messages at all ...
>
> Stats for spi1 ...
>   Bytes:     0
>   Errors:    0
>   Messages:  0
>   Transfers: 0

This...

>> /sys/class/spi_master/spi1/statistics/messages gives some
>> counters on the number of spi messages processed which
>> would give you an indication if that is happening.
>>
>> It could be as easy as adding right after the first lock
>> in spi_stop_queue:
>>     kthread_mod_delayed_work(&ctlr->kworker,
>>                              &ctlr->pump_idle_teardown, 0);
>> (plus maybe a yield or similar to allow the worker to
>> quickly/reliably run on a single core machine)
>>
>> I hope that this initial guess helps.
>
> Unfortunately, the above did not help and the issue persists.
>
> Digging a bit deeper I see that now the 'ctlr->queue' is empty but
> 'ctlr->busy' flag is set and this is causing the 'could not stop message
> queue' error.
>
> It seems that __spi_pump_messages() is getting called several times
> during boot when registering the spi-flash, then after the spi-flash has
> been registered, about a 1 sec later spi_pump_idle_teardown() is called
> (as expected), but exits because 'ctlr->running' is true.

... and this contradict each other! If there is a call to the message
pump, then we should process a message and the counters should increase.
If those counters do not increase, then something strange is going on.
If we are called without anything to do, then the pump should trigger a
teardown and stop.

> spi_pump_idle_teardown() is never called again and when we suspend we
> are stuck in the busy/running state. In this case should something be
> scheduling spi_pump_idle_teardown() again? Although even if it does I
> don't see where the busy flag would be cleared in this path?

I am wondering where busy/running would be set in the first place if
there are no counters...

Is it possible that the specific flash is not using the "normal"
spi_pump_message, but the spi_controller_mem_ops operations? Maybe we
are missing the teardown in that driver, or something is changing flags
there.

Grepping a bit: I see spi_mem_access_start calling spi_flush_queue,
which is calling pump_message, which - if there is no transfer - will
trigger a delayed tear-down, while it maybe should not be doing that.
Maybe spi_mem_access_end needs a call to spi_flush_queue as well?

Unfortunately this is theoretical work and quite a lot of guesswork
without an actual device available for testing...

Thanks,
Martin
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Mark Brown @ 2019-01-15 19:26 UTC (permalink / raw)
To: kernel@martin.sperl.org
Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

On Tue, Jan 15, 2019 at 06:39:27PM +0100, kernel@martin.sperl.org wrote:

> Is it possible that the specific flash is not using the "normal"
> spi_pump_message, but spi_controller_mem_ops operations?

Right, that's my best guess at the minute as well.

> Maybe we are missing the teardown in that driver or something is
> changing flags there.
>
> I see spi_mem_access_start calling spi_flush_queue, which is calling
> pump_message, which - if there is no transfer - will trigger a delayed
> tear-down, while it maybe should not be doing that.

If nothing else it's inefficient.

> Maybe spi_mem_access_end needs a call to spi_flush_queue as well?

Hrm, or needs to schedule the queue at any rate (though this will only
have an impact in the fairly unusual case where there's something
sharing the bus with a flash).

> Unfortunately this is theoretical work and quite a lot of guesswork
> without an actual device available for testing...

Indeed.
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Martin Sperl @ 2019-01-15 20:58 UTC (permalink / raw)
To: Mark Brown
Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

> On 15.01.2019, at 20:26, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, Jan 15, 2019 at 06:39:27PM +0100, kernel@martin.sperl.org wrote:
>
>> Maybe spi_mem_access_end needs a call to spi_flush_queue as well?
>
> Hrm, or needs to schedule the queue at any rate (though this will only
> have an impact in the fairly unusual case where there's something
> sharing the bus with a flash).
>
>> Unfortunately this is theoretical work and quite a lot of guesswork
>> without an actual device available for testing...
>
> Indeed.

Maybe a bigger change to reduce the complexity of the state machine
would solve that problem and also reduce code complexity...

I may find some time over the weekend if no solution has been found
until then.

The way I would envision it, it would have a "state" as a level
(0=shutdown, 1=hw enabled, 2=in pump, 3=in transfer, 4=in hw-mode, ...)
and a completion to allow waking the shutdown thread (and by this
avoiding the busy-wait loop we have now). This would replace the idling,
busy, and running flags.

Drawback: it is invasive, but let us see what it really looks like...

Martin
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Mark Brown @ 2019-01-15 21:25 UTC (permalink / raw)
To: Martin Sperl
Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

On Tue, Jan 15, 2019 at 09:58:55PM +0100, Martin Sperl wrote:

> Maybe a bigger change to reduce the complexity of
> the state machine would solve that problem and also
> reduce code complexity...

Yeah, that's where I was getting to with that test patch I posted.

> I may find some time over the weekend if no solution
> has been found until then.

Thanks for volunteering :)

> The way I would envision it it would have a "state"
> as a level (0=shutdown, 1=hw enabled, 2=in pump,
> 3=in transfer, 4=in hw-mode, ...) and a completion
> to allow waking the shutdown thread (and by this
> avoiding the busy wait loop we have now).
> This would replace those idling, busy, and running flags.

That's a good idea, yes - a single enum reflects much more closely what
we can actually do in terms of transitions.

> Drawback: it is invasive, but let us see what it
> really looks like...

I think we need to either drop your change (which would be bad since it
is a big performance improvement; I'd punted it for later when I did the
original refactoring to push the work into the caller threads and then
never got around to it) or make invasive changes so that the new
situation is clearer. Right now things are just far too complex to
reason about, which isn't helping anyone.
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Jon Hunter @ 2019-01-16 11:01 UTC (permalink / raw)
To: Mark Brown, Martin Sperl
Cc: linux-tegra, Linux Kernel Mailing List, linux-spi

On 15/01/2019 21:25, Mark Brown wrote:
> On Tue, Jan 15, 2019 at 09:58:55PM +0100, Martin Sperl wrote:
>
>> Maybe a bigger change to the reduce the complexity of
>> the state machine would solve that problem and also
>> reduce code complexity...
>
> Yeah, that's where I was getting to with that test patch I posted.
>
>> The way I would envision it it would have a "state"
>> as a level (0=shutdown, 1=hw enabled, 2=in pump,
>> 3=in transfer, 4=in hw-mode, ...) and a completion
>> to allow waking the shutdown thread (and by this
>> avoiding the busy wait loop we have now).
>> This would replace those idling, busy, and running flags.
>
> That's a good idea, yes - a single enum much more reflects what we can
> actually do in terms of transitions.
>
> I think we need to either drop your change (which would be bad since it
> is a big performance improvement) or have invasive changes to make the
> new situation clearer. Right now things are just far too complex to
> reason about which isn't helping anyone.

What I am seeing on Tegra is that the SPI controller is no longer
runtime-PM suspended, and so while this change may improve performance,
I think that it is going to have a negative impact on idle power.

I would be surprised if this problem is specific to Tegra, because I
don't see how the controller's busy state can ever be cleared :-(

Cheers
Jon

--
nvpublic
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: kernel@martin.sperl.org @ 2019-01-18 17:11 UTC (permalink / raw)
To: Mark Brown
Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

Hi Mark!

Just got access to my computer back for the weekend ;)

> On 15.01.2019, at 22:25, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, Jan 15, 2019 at 09:58:55PM +0100, Martin Sperl wrote:
>
>> I may find some time over the weekend if no solution
>> has been found until then.
>
> Thanks for volunteering :)

I actually brought this about, so I see it as part of my responsibility
to try to solve the issue as well...

>> The way I would envision it it would have a "state"
>> as a level (0=shutdown, 1=hw enabled, 2=in pump,
>> 3=in transfer, 4=in hw-mode, ...) and a completion
>> to allow waking the shutdown thread (and by this
>> avoiding the busy wait loop we have now).
>> This would replace those idling, busy, and running flags.
>
> That's a good idea, yes - a single enum much more reflects what we can
> actually do in terms of transitions.

Does something like this look acceptable?
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index ec210286567c..677fc5025033 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -288,6 +288,21 @@ static inline void spi_unregister_driver(struct spi_driver *sdrv)
 	module_driver(__spi_driver, spi_register_driver, \
 			spi_unregister_driver)
 
+/* define SPI Controller states in the state machine */
+enum spi_controller_state {
+	SPI_CONTROLLER_STATE_SHUTDOWN = 0,
+	SPI_CONTROLLER_STATE_IDLE = 1,
+	SPI_CONTROLLER_STATE_IN_PROCESS = 2,
+	SPI_CONTROLLER_STATE_IN_TRANSFER = 3,
+};
+
+/* define SPI Controller mode */
+enum spi_controller_mode {
+	SPI_CONTROLLER_MODE_NORMAL = 0,    /* normal spi mode */
+	SPI_CONTROLLER_MODE_MEM = 1,       /* memory mode using mem_ops */
+	SPI_CONTROLLER_MODE_EXCLUSIVE = 2, /* could replace bus_lock_flag */
+};
+
 /**
  * struct spi_controller - interface to SPI master or slave controller
  * @dev: device interface to this driver
@@ -332,14 +347,16 @@ static inline void spi_unregister_driver(struct spi_driver *sdrv)
  * @pump_idle_teardown: work structure for scheduling a teardown delayed
  * @queue_lock: spinlock to syncronise access to message queue
  * @queue: message queue
- * @idling: the device is entering idle state
  * @cur_msg: the currently in-flight message
  * @cur_msg_prepared: spi_prepare_message was called for the currently
  *	in-flight message
  * @cur_msg_mapped: message has been mapped for DMA
+ * @controller_state: defines the current state of the controller
+ *	state machine
+ * @controller_state_changed: wakeup of threads interested in getting
+ *	notified about a mode change (primarily pm)
+ * @controller_mode: defines the current mode of the controller
  * @xfer_completion: used by core transfer_one_message()
- * @busy: message pump is busy
- * @running: message pump is running
  * @rt: whether this queue is set to run as a realtime task
  * @auto_runtime_pm: the core should ensure a runtime PM reference is held
  *	while the hardware is prepared, using the parent
@@ -524,9 +541,9 @@ struct spi_controller {
 	spinlock_t		queue_lock;
 	struct list_head	queue;
 	struct spi_message	*cur_msg;
-	bool			idling;
-	bool			busy;
-	bool			running;
+	enum spi_controller_state controller_state;
+	struct completion	controller_state_changed;
+	enum spi_controller_mode controller_mode;
 	bool			rt;
 	bool			auto_runtime_pm;
 	bool			cur_msg_prepared;

SPI_CONTROLLER_MODE_EXCLUSIVE could replace the bus_lock_flag. I am also
not sure of the "convention" of memory mode (i.e. using mem_ops). Is it
maybe implicitly spi_bus_locked - i.e. typically only a single device?

Essentially we would transition between these states linearly (hopefully
no jumps are needed).

Martin
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: Mark Brown @ 2019-01-18 19:12 UTC (permalink / raw)
To: kernel@martin.sperl.org
Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

On Fri, Jan 18, 2019 at 06:11:31PM +0100, kernel@martin.sperl.org wrote:

> Does something like this look acceptable?

> +/* define SPI Controller states in the state machine */
> +enum spi_controller_state {
> +	SPI_CONTROLLER_STATE_SHUTDOWN = 0,
> +	SPI_CONTROLLER_STATE_IDLE = 1,
> +	SPI_CONTROLLER_STATE_IN_PROCESS = 2,
> +	SPI_CONTROLLER_STATE_IN_TRANSFER = 3,
> +};

Yes, it does!

> SPI_CONTROLLER_MODE_EXCLUSIVE could replace the bus_lock_flag.
> I am also not sure of the "convention" of memory mode (i.e.
> using mem_ops). Is it maybe implicitly spi_bus_locked - i.e.
> typically only a single device?

Yes, it does - we're basically handing over the entire SPI bus to
another bit of hardware that will do memory-mapped flash access, so we
can't really have anything else trying to do SPI operations at the same
time.
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
  2019-01-18 19:12 ` Mark Brown
@ 2019-01-20 11:24   ` kernel
  2019-01-23 17:56     ` Mark Brown
  0 siblings, 1 reply; 19+ messages in thread
From: kernel @ 2019-01-20 11:24 UTC (permalink / raw)
  To: Mark Brown; +Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

Hi Mark!

> On 18.01.2019, at 20:12, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Jan 18, 2019 at 06:11:31PM +0100, kernel@martin.sperl.org wrote:
>
>> Does something like this look acceptable?
>
> Yes, it does!
>
>> SPI_CONTROLLER_MODE_EXCLUSIVE could replace the bus_lock_flag.
>> I am also not sure of the "convention" of memory mode (i.e.
>> using mem_ops). Is it maybe implicitly spi_bus_locked - i.e.
>> typically only a single device?
>
> Yes, it does - we're basically handing over the entire SPI bus to
> another bit of hardware that will do memory mapped flash access so we
> can't really have anything else trying to do SPI operations at the same
> time.

The kind of changes this requires is consuming a bit more time than I was hoping for. So maybe, at this very moment, the best option is reverting the patch.

I have some patches that would move towards that goal in baby steps, without having to resort to a big rewrite right now. Maybe Jon can check each patch one after another to see if any of them produces the unexpected behavior, and only then would you include them...

As for the root cause of the regression: my guess is that spi-mem is just not triggering a shutdown any more because of how message_pump works. The improvement of the state machine can then be addressed separately.

Martin

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
  2019-01-20 11:24 ` kernel
@ 2019-01-23 17:56   ` Mark Brown
  2019-05-09 19:47     ` Martin Sperl
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Brown @ 2019-01-23 17:56 UTC (permalink / raw)
  To: kernel; +Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

[-- Attachment #1: Type: text/plain, Size: 572 bytes --]

On Sun, Jan 20, 2019 at 12:24:23PM +0100, kernel@martin.sperl.org wrote:

> The kind of changes this requires is consuming a bit more time than
> I was hoping for.

Thanks for trying.

> So maybe at this very moment the best option is reverting the patch.

Yes, I'm just going to do that for now.

> As for the root cause of the regression: my guess is that spi-mem is
> just not triggering a shutdown any more because of how message_pump works.

I'm fairly sure that's what's going on, but I have not been able to get my head around things enough to figure out what's going wrong yet.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
  2019-01-23 17:56 ` Mark Brown
@ 2019-05-09 19:47   ` Martin Sperl
  2019-05-12  8:54     ` Mark Brown
  0 siblings, 1 reply; 19+ messages in thread
From: Martin Sperl @ 2019-05-09 19:47 UTC (permalink / raw)
  To: Mark Brown; +Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

Hi Mark!

> On 23.01.2019, at 18:56, Mark Brown <broonie@kernel.org> wrote:
>
>> On Sun, Jan 20, 2019 at 12:24:23PM +0100, kernel@martin.sperl.org wrote:
>>
>> The kind of changes this requires is consuming a bit more time than
>> I was hoping for.
>
> Thanks for trying.
>
>> So maybe at this very moment the best option is reverting the patch.
>
> Yes, I'm just going to do that for now.
>
>> As for the root cause of the regression: my guess is that spi-mem is
>> just not triggering a shutdown any more because of how message_pump works.
>
> I'm fairly sure that's what's going on but not been able to get my head
> around things enough to figure out what's going wrong yet.

While thinking about this again, an idea: what about implementing a second spi_transfer_one implementation (together with a message pump implementation) that would handle things correctly? Any driver could then select the old (default) or the new implementation, which would allow the optimizations to take place only for verified, working drivers...

At least this way we would not be blocked by the fact that no hardware exposing this behavior is available to us - at the cost of extra code to be maintained.

What I would then also like to do for the new implementation is modify the API a bit - ideally I would like to:

* make spi_sync the primary interface, which the message pump would also use directly;
* move all the prepare stuff early into spi_sync, so that for example the preparing (including DMA mapping) would get done in the calling thread, and only the prepared message would get submitted to the queue - special processing would be needed for the spi-async case.

This should move those computations out of the central loop and make it faster.

spi-nand support could be added later by someone who has access to a device making use of this.

If that sounds somewhat acceptable, then I will try to get something implemented.

Any other ideas where we could improve as well?

Martin

^ permalink raw reply	[flat|nested] 19+ messages in thread
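The "prepare in the calling thread, queue only prepared messages" proposal above can be sketched in a heavily simplified toy model. None of these names exist in the kernel; the real spi_sync/__spi_pump_messages flow in drivers/spi/spi.c is considerably more involved:

```c
#include <stddef.h>
#include <stdbool.h>
#include <errno.h>

/* Hypothetical, heavily simplified message */
struct msg {
	bool prepared;		/* validated, DMA-mapped, ... */
	struct msg *next;
};

struct queue {
	struct msg *head;
};

/* Step 1: do the expensive preparation in the caller's own context */
static void prepare_message(struct msg *m)
{
	/* the real core would validate transfers and do DMA mapping here */
	m->prepared = true;
}

/* Step 2: only prepared messages ever reach the queue */
static int queue_message(struct queue *q, struct msg *m)
{
	if (!m->prepared)
		return -EINVAL;	/* would indicate a core bug */
	m->next = q->head;
	q->head = m;
	return 0;
}

/* spi_sync-like entry point: prepare first, then submit */
static int sync_transfer(struct queue *q, struct msg *m)
{
	prepare_message(m);
	return queue_message(q, m);
}
```

The point of the split is that the message pump's central loop would only ever see fully prepared messages, so the per-message overhead it carries today moves out into the (possibly many, concurrent) calling threads.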
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
  2019-05-09 19:47 ` Martin Sperl
@ 2019-05-12  8:54   ` Mark Brown
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Brown @ 2019-05-12  8:54 UTC (permalink / raw)
  To: Martin Sperl
  Cc: Jon Hunter, linux-tegra, Linux Kernel Mailing List, linux-spi

[-- Attachment #1: Type: text/plain, Size: 2482 bytes --]

On Thu, May 09, 2019 at 09:47:08PM +0200, Martin Sperl wrote:

> While thinking about this again, an idea:
> what about implementing a second spi_transfer_one implementation (together
> with a message pump implementation) that would handle things correctly?
> Any driver could then select the old (default) or new implementation, which
> would allow the optimizations to take place only for verified working drivers...

I'd rather avoid having yet another interface for drivers to use; people already get confused trying to choose between the ones we already have. It'd have to be something where the existing drivers got actively converted and the old interface retired, rather than something that hangs around.

> What I would then also like to do for the new implementation is modify the
> API a bit - ideally I would like to:
> * make spi_sync the primary interface, which the message pump would also
>   use directly;
> * move all the prepare stuff early into spi_sync, so that for example the
>   preparing (including DMA mapping) would get done in the calling thread,
>   and only the prepared message would get submitted to the queue
>   - special processing would be needed for the spi-async case.

IIRC the mapping is deliberately done late in order to minimize the amount of time we're consuming resources for the mapping; there were some systems that had limited DMA channels. However, I don't know how big a concern that is in this day and age, even on relatively old systems.

The idea of spi_async() having a separate path also makes me a bit nervous, as it's much less widely used and so more likely to get broken accidentally. Otherwise, pushing things out to the caller makes sense; it should have no real impact in the majority of cases, where the thread is just being used to idle the controller and the actual work is all happening in the calling context anyway, and if the pump is being used it means it's spending more time actually pushing transfers out.

For the case where we do have the message pump going, one thing it'd be good to do is overlap more of the admin work around the messages with other transfers - ideally we'd be able to kick off the next transfer from within the completion of a DMA. I need to have a dig around and figure out if I have any hardware that can actually support that; last time I looked at this, my main system needed to kick everything up to the thread due to hardware requirements.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
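The "kick off the next transfer from within the completion of a DMA" idea above can be shown with a toy model. Here the completion runs synchronously for illustration; a real DMA completion callback runs in interrupt context and would have to respect each controller's restrictions, and none of these names exist in the kernel:

```c
#include <stddef.h>

struct transfer {
	int id;
	struct transfer *next;
};

static struct transfer *pending;	/* head of the queued transfers */
static int completed_count;

static void start_transfer(struct transfer *t);

/*
 * Completion callback: instead of waking a worker thread and waiting
 * for it to be scheduled, immediately start the next queued transfer
 * so the bus never idles between back-to-back messages.
 */
static void dma_complete(struct transfer *t)
{
	completed_count++;
	pending = t->next;
	if (pending)
		start_transfer(pending);
}

static void start_transfer(struct transfer *t)
{
	/* a real driver would program the DMA engine here; this toy
	 * model pretends the hardware finishes instantly */
	dma_complete(t);
}
```

The win is that the inter-transfer gap is bounded by the interrupt latency rather than by a thread wakeup, which is exactly the overlap of "admin work" with transfers that the mail above is after.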
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
  2019-01-15 17:39 ` kernel
  2019-01-15 19:26   ` Mark Brown
@ 2019-01-16 10:58   ` Jon Hunter
  1 sibling, 0 replies; 19+ messages in thread
From: Jon Hunter @ 2019-01-16 10:58 UTC (permalink / raw)
  To: kernel; +Cc: Mark Brown, linux-tegra, Linux Kernel Mailing List, linux-spi

On 15/01/2019 17:39, kernel@martin.sperl.org wrote:
> Hi John!
>
>> On 15.01.2019, at 15:26, Jon Hunter <jonathanh@nvidia.com> wrote:
>>> Looks as if there is something missing in spi_stop_queue that
>>> would wake the worker thread one last time without any delays
>>> and finish the hw shutdown immediately - it runs as a delayed
>>> task...
>>>
>>> One question: do you run any spi transfers in
>>> your test case before suspend?
>>
>> No, and before suspending I dumped some of the spi stats and I see no
>> transfers/messages at all ...
>>
>> Stats for spi1 ...
>> Bytes: 0
>> Errors: 0
>> Messages: 0
>> Transfers: 0
>
> This...
>
>>> /sys/class/spi_master/spi1/statistics/messages gives some
>>> counters on the number of spi messages processed which
>>> would give you an indication if that is happening.
>>>
>>> It could be as easy as adding right after the first lock
>>> in spi_stop_queue:
>>>	kthread_mod_delayed_work(&ctlr->kworker,
>>>				 &ctlr->pump_idle_teardown, 0);
>>> (plus maybe a yield or similar to allow the worker to
>>> quickly/reliably run on a single core machine)
>>>
>>> I hope that this initial guess helps.
>>
>> Unfortunately, the above did not help and the issue persists.
>>
>> Digging a bit deeper, I see that now the 'ctlr->queue' is empty but the
>> 'ctlr->busy' flag is set, and this is causing the 'could not stop message
>> queue' error.
>>
>> It seems that __spi_pump_messages() is getting called several times
>> during boot when registering the spi-flash, then after the spi-flash has
>> been registered, about 1 sec later spi_pump_idle_teardown() is called
>> (as expected), but exits because 'ctlr->running' is true. However,

> and this contradicts each other!
> If there is a call to message pump, then we should process a message
> and the counters should increase.
>
> If those counters do not increase then there is something strange.

Actually no, there is nothing strange here at all. If you look, Tegra has its own 'transfer_one_message' handler (tegra_spi_transfer_one_message in drivers/spi/spi-tegra114.c) and this does not update any of the statistics. It looks like the statistics are not updated if an SPI controller has its own transfer_one_message handler, which is a shame.

> If we are called without anything to do then the pump should trigger a
> tear down and stop.

>> spi_pump_idle_teardown() is never called again and when we suspend we
>> are stuck in the busy/running state. In this case should something be
>> scheduling spi_pump_idle_teardown() again? Although even if it does, I
>> don't see where the busy flag would be cleared in this path?

> I am wondering where busy/running would be set in the first place
> if there are no counters...

It looks like 'running' is set true on initialising the queue, and 'busy' is set true in __spi_pump_messages().

> Is it possible that the specific flash is not using the "normal"
> spi_pump_message, but spi_controller_mem_ops operations?

I definitely see __spi_pump_messages() being called, which in turn calls tegra_spi_transfer_one_message().

> Maybe we are missing the teardown in that driver or something is
> changing flags there.
>
> grepping a bit:
>
> I see spi_mem_access_start calling spi_flush_queue, which is calling
> pump_message, which - if there is no transfer - will trigger a delayed
> tear-down, while it maybe should not be doing that.
>
> Maybe spi_mem_access_end needs a call to spi_flush_queue as well?
>
> Unfortunately this is theoretical work and quite a lot of guesswork
> without an actual device available for testing...

Looking some more, I think that the problem is ...

1. The controller is put into the 'running' state when spi_start_queue() is called (which itself is called during the registration of the SPI controller).
2. After sending some messages to initialise the SPI flash, spi_pump_idle_teardown() is called, but exits early because the controller is in the 'running' state, and this prevents 'busy' from being cleared.
3. On entering suspend, spi_stop_queue() is called, but the controller is still in the 'busy' state and so fails to stop.

So the problem I see is that 'busy' is never cleared, whereas before it was.

Furthermore, because 'busy' is never cleared, I see that the SPI controller is never runtime-suspended, and so this will have an impact on power. With your patch, after booting I see ...

$ cat /sys/class/spi_master/spi1/device/power/runtime_status
active

Reverting your patch, after booting I see ...

$ cat /sys/class/spi_master/spi1/device/power/runtime_status
suspended

Does the device you are testing with support runtime PM? If so, what do you see?

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 19+ messages in thread
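Jon's three-step analysis reduces to a small flag-handling model. This is a hypothetical user-space sketch, not the real code in drivers/spi/spi.c, but it reproduces the reported symptom: 'busy' is set by the pump, the delayed teardown bails out while 'running' is set, and the queue stop then fails with -EBUSY, matching the "could not stop message queue" / "returns -16" lines in the suspend log:

```c
#include <stdbool.h>
#include <errno.h>

struct ctlr {
	bool running;	/* set when the queue is started */
	bool busy;	/* set when the pump processes messages */
};

/* Called during controller registration */
static void start_queue(struct ctlr *c)
{
	c->running = true;
}

/* The message pump marks the controller busy while working */
static void pump_messages(struct ctlr *c)
{
	c->busy = true;
	/* ... transfers happen here ... */
}

/* Delayed teardown: exits early while 'running', so 'busy' stays set */
static void pump_idle_teardown(struct ctlr *c)
{
	if (c->running)
		return;
	c->busy = false;
}

/* Suspend path: refuses to stop while the controller looks busy */
static int stop_queue(struct ctlr *c)
{
	if (c->busy)
		return -EBUSY;	/* -> "could not stop message queue" */
	c->running = false;
	return 0;
}
```

Since start_queue() sets 'running' and nothing clears it before suspend, pump_idle_teardown() can never reach the line that clears 'busy', so stop_queue() always fails: exactly the deadlock between the two flags that the analysis above describes.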
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
  2019-01-14 15:35 Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed Jon Hunter
       [not found] ` <7C4A5EFC-8235-40C8-96E1-E6020529DF72@martin.sperl.org>
@ 2019-01-22  9:36 ` Geert Uytterhoeven
       [not found]   ` <CGME20190123082650eucas1p243ce21346a00e9b3e9eed2863cd3d280@eucas1p2.samsung.com>
  1 sibling, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2019-01-22 9:36 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Martin Sperl, Mark Brown, linux-tegra, Linux Kernel Mailing List,
	linux-spi, Linux-Renesas

On Mon, Jan 14, 2019 at 4:36 PM Jon Hunter <jonathanh@nvidia.com> wrote:
> I have noticed that system suspend has started failing consistently on a
> couple of Tegra boards over the last few days with the linux-next branch.
> The following error is seen on entering suspend ...
>
> [   58.222033] spi_master spi1: could not stop message queue
> [   58.222038] spi_master spi1: queue stop failed
> [   58.222048] dpm_run_callback(): platform_pm_suspend+0x0/0x54 returns -16
> [   58.222052] PM: Device 7000da00.spi failed to suspend: error -16
> [   58.222057] PM: Some devices failed to suspend, or early wake event detected
>
> Bisecting today's -next points to commit 412e60373245 ("spi: core: avoid
> waking pump thread from spi_sync instead run teardown delayed") and
> reverting this on top of -next fixes the problem. I have not had a chance
> to dig any further, but wanted to report this issue to see if you have
> any thoughts.

I can confirm this on r8a7791/koelsch:

PM: suspend entry (deep)
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.010 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.009 seconds) done.
spi_master spi0: could not stop message queue
spi_master spi0: queue stop failed
dpm_run_callback(): rspi_suspend+0x0/0xc returns -16
PM: Device e6b10000.spi failed to suspend: error -16
PM: Some devices failed to suspend, or early wake event detected

In addition, after boot, when the QSPI device is idle (the only activity was the initial probe), the device is no longer runtime-suspended, as witnessed by the following change to /sys/kernel/debug/pm_genpd/pm_genpd_summary:

-    /devices/platform/soc/e6b10000.spi    suspended
+    /devices/platform/soc/e6b10000.spi    active

and by the following change to /sys/kernel/debug/clk/clk_summary:

-    qspi        0    1    0    97500000    0    0    50000
-       qspi_mod 0    1    0    97500000    0    0    50000
+    qspi        1    1    0    97500000    0    0    50000
+       qspi_mod 1    1    0    97500000    0    0    50000

Reverting 412e60373245 also fixes that.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 19+ messages in thread
[parent not found: <CGME20190123082650eucas1p243ce21346a00e9b3e9eed2863cd3d280@eucas1p2.samsung.com>]
* Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
       [not found] ` <CGME20190123082650eucas1p243ce21346a00e9b3e9eed2863cd3d280@eucas1p2.samsung.com>
@ 2019-01-23  8:26   ` Marek Szyprowski
  0 siblings, 0 replies; 19+ messages in thread
From: Marek Szyprowski @ 2019-01-23 8:26 UTC (permalink / raw)
  To: Geert Uytterhoeven, Jon Hunter
  Cc: Martin Sperl, Mark Brown, linux-tegra, Linux Kernel Mailing List,
	linux-spi, Linux-Renesas

Hi all,

On 2019-01-22 10:36, Geert Uytterhoeven wrote:
> On Mon, Jan 14, 2019 at 4:36 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>> I have noticed that system suspend has started failing consistently on a
>> couple of Tegra boards over the last few days with the linux-next branch.
>> The following error is seen on entering suspend ...
>>
>> [   58.222033] spi_master spi1: could not stop message queue
>> [   58.222038] spi_master spi1: queue stop failed
>> [   58.222048] dpm_run_callback(): platform_pm_suspend+0x0/0x54 returns -16
>> [   58.222052] PM: Device 7000da00.spi failed to suspend: error -16
>> [   58.222057] PM: Some devices failed to suspend, or early wake event detected
>>
>> Bisecting today's -next points to commit 412e60373245 ("spi: core: avoid
>> waking pump thread from spi_sync instead run teardown delayed") and
>> reverting this on top of -next fixes the problem. I have not had a chance
>> to dig any further, but wanted to report this issue to see if you have
>> any thoughts.
>
> I can confirm this on r8a7791/koelsch:
>
> PM: suspend entry (deep)
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.010 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.009 seconds) done.
> spi_master spi0: could not stop message queue
> spi_master spi0: queue stop failed
> dpm_run_callback(): rspi_suspend+0x0/0xc returns -16
> PM: Device e6b10000.spi failed to suspend: error -16
> PM: Some devices failed to suspend, or early wake event detected

With recent linux-next I've got a similar issue on Samsung Exynos-based Chromebooks, Peach-Pit and Peach-Pi, both of which use SPI for communicating with regulators via the Chrome EC bridge:

PM: suspend entry (deep)
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.001 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
printk: Suspending console(s) (use no_console_suspend to debug)
...
spi_master spi2: could not stop message queue
spi_master spi2: queue stop failed
dpm_run_callback(): platform_pm_suspend+0x0/0x54 returns -16
PM: Device 12d40000.spi failed to suspend: error -16
PM: Some devices failed to suspend, or early wake event detected

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2019-05-12  8:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-14 15:35 Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed Jon Hunter
     [not found] ` <7C4A5EFC-8235-40C8-96E1-E6020529DF72@martin.sperl.org>
2019-01-15 14:26   ` Jon Hunter
2019-01-15 15:10     ` Mark Brown
2019-01-15 16:09       ` Jon Hunter
2019-01-15 19:27         ` Mark Brown
2019-01-15 17:39       ` kernel
2019-01-15 19:26         ` Mark Brown
2019-01-15 20:58           ` Martin Sperl
2019-01-15 21:25             ` Mark Brown
2019-01-16 11:01           ` Jon Hunter
2019-01-18 17:11             ` kernel
2019-01-18 19:12               ` Mark Brown
2019-01-20 11:24                 ` kernel
2019-01-23 17:56                   ` Mark Brown
2019-05-09 19:47                     ` Martin Sperl
2019-05-12  8:54                       ` Mark Brown
2019-01-16 10:58         ` Jon Hunter
2019-01-22  9:36 ` Geert Uytterhoeven
     [not found]   ` <CGME20190123082650eucas1p243ce21346a00e9b3e9eed2863cd3d280@eucas1p2.samsung.com>
2019-01-23  8:26     ` Marek Szyprowski