All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
To: Vinod Koul <vkoul@kernel.org>
Cc: Bard Liao <yung-chuan.liao@linux.intel.com>,
	alsa-devel@alsa-project.org, tiwai@suse.de,
	gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
	ranjani.sridharan@linux.intel.com, hui.wang@canonical.com,
	broonie@kernel.org, srinivas.kandagatla@linaro.org,
	jank@cadence.com, mengdong.lin@intel.com,
	sanyog.r.kale@intel.com, rander.wang@linux.intel.com,
	bard.liao@intel.com
Subject: Re: [PATCH] soundwire: cadence: fix race condition between suspend and Slave device alerts
Date: Fri, 21 Aug 2020 10:17:25 -0500	[thread overview]
Message-ID: <29ea5a44-b971-770a-519c-ae879557b63f@linux.intel.com> (raw)
In-Reply-To: <20200821050758.GI2639@vkoul-mobl>



>> cancel_work_sync() will either
>> a) wait until the current work completes, or
>> b) prevent a new one from starting.
>>
>> there's no way to really 'abort' a workqueue, 'cancel' means either complete
>> or don't start.
> 
> Quite right, as that is how everyone deals with it. Stop the irq from
> firing first and then wait until work is cancelled or completed, hence
> cancel_work_sync()

No, this cannot work... The work queue in progress will initiate 
transactions which would never complete because the interrupts are disabled.

>> if you disable the interrupts then cancel the work, you have a risk of not
>> letting the work complete if it already started (case a).
>>
>> The race is
>> a) the interrupt thread (this function) starts
>> b) the work is scheduled and starts
>> c) the suspend handler starts and disables interrupts in [1] below.
>> d) the work initiates transactions which will never complete since Cadence
>> interrupts have been disabled.
> 
> Would it not be better to let work handle the case of interrupts
> disabled and not initiates transactions which wont complete here? That
> sounds more reasonable to do rather than complete the work which anyone
> doesn't matter as you are suspending

A in-progress workqueue has no notion that interrupts are disabled, nor 
that the device is in the process of suspending. It writes into a fifo 
and blocks with wait_for_completion(). the complete() is done in an 
interrupt thread, triggered when the RX Fifo reaches a watermark.

So if you disable interrupts, the complete() never happens.

The safe way to do things with the current code is to let the workqueue 
complete, then disable interrupts.

We only disable interrupts on suspend, we don't need to test if 
interrupts are enabled for every single byte that's transmitted on the 
bus. Not to mention that this additional test would be racy as hell and 
require yet another synchronization primitive making the code more 
complicated.

So yes, the current transactions will complete and will be ignored, but 
it's a lot better than trying to prevent these transactions from 
happening with extra layers of complexity that will impact *every* 
transaction.

BTW I looked at another alternative based on the msg lock, so that 
interrupts cannot be disabled while a message is being sent. However 
because a workqueue may send multiple messages, e.g. to read registers 
are non-contiguous locations, there is no way to guarantee what happens 
how messages and interrupt disabling are scheduled, so there'd still be 
a case of a workqueue not completing and being stuck on a mutex_lock(), 
not so good either.

In short, this is the simplest way to fix the timeout on resume.

WARNING: multiple messages have this Message-ID (diff)
From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
To: Vinod Koul <vkoul@kernel.org>
Cc: alsa-devel@alsa-project.org, tiwai@suse.de,
	gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
	ranjani.sridharan@linux.intel.com, hui.wang@canonical.com,
	broonie@kernel.org, srinivas.kandagatla@linaro.org,
	jank@cadence.com, mengdong.lin@intel.com,
	sanyog.r.kale@intel.com,
	Bard Liao <yung-chuan.liao@linux.intel.com>,
	rander.wang@linux.intel.com, bard.liao@intel.com
Subject: Re: [PATCH] soundwire: cadence: fix race condition between suspend and Slave device alerts
Date: Fri, 21 Aug 2020 10:17:25 -0500	[thread overview]
Message-ID: <29ea5a44-b971-770a-519c-ae879557b63f@linux.intel.com> (raw)
In-Reply-To: <20200821050758.GI2639@vkoul-mobl>



>> cancel_work_sync() will either
>> a) wait until the current work completes, or
>> b) prevent a new one from starting.
>>
>> there's no way to really 'abort' a workqueue, 'cancel' means either complete
>> or don't start.
> 
> Quite right, as that is how everyone deals with it. Stop the irq from
> firing first and then wait until work is cancelled or completed, hence
> cancel_work_sync()

No, this cannot work... The work queue in progress will initiate 
transactions which would never complete because the interrupts are disabled.

>> if you disable the interrupts then cancel the work, you have a risk of not
>> letting the work complete if it already started (case a).
>>
>> The race is
>> a) the interrupt thread (this function) starts
>> b) the work is scheduled and starts
>> c) the suspend handler starts and disables interrupts in [1] below.
>> d) the work initiates transactions which will never complete since Cadence
>> interrupts have been disabled.
> 
> Would it not be better to let work handle the case of interrupts
> disabled and not initiates transactions which wont complete here? That
> sounds more reasonable to do rather than complete the work which anyone
> doesn't matter as you are suspending

A in-progress workqueue has no notion that interrupts are disabled, nor 
that the device is in the process of suspending. It writes into a fifo 
and blocks with wait_for_completion(). the complete() is done in an 
interrupt thread, triggered when the RX Fifo reaches a watermark.

So if you disable interrupts, the complete() never happens.

The safe way to do things with the current code is to let the workqueue 
complete, then disable interrupts.

We only disable interrupts on suspend, we don't need to test if 
interrupts are enabled for every single byte that's transmitted on the 
bus. Not to mention that this additional test would be racy as hell and 
require yet another synchronization primitive making the code more 
complicated.

So yes, the current transactions will complete and will be ignored, but 
it's a lot better than trying to prevent these transactions from 
happening with extra layers of complexity that will impact *every* 
transaction.

BTW I looked at another alternative based on the msg lock, so that 
interrupts cannot be disabled while a message is being sent. However 
because a workqueue may send multiple messages, e.g. to read registers 
are non-contiguous locations, there is no way to guarantee what happens 
how messages and interrupt disabling are scheduled, so there'd still be 
a case of a workqueue not completing and being stuck on a mutex_lock(), 
not so good either.

In short, this is the simplest way to fix the timeout on resume.

  reply	other threads:[~2020-08-21 15:17 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-17 22:23 [PATCH] soundwire: cadence: fix race condition between suspend and Slave device alerts Bard Liao
2020-08-17 22:23 ` Bard Liao
2020-08-19  9:06 ` Vinod Koul
2020-08-19  9:06   ` Vinod Koul
2020-08-19 12:51   ` Pierre-Louis Bossart
2020-08-21  5:07     ` Vinod Koul
2020-08-21  5:07       ` Vinod Koul
2020-08-21 15:17       ` Pierre-Louis Bossart [this message]
2020-08-21 15:17         ` Pierre-Louis Bossart
2020-08-28  8:00         ` Vinod Koul
2020-08-28  8:00           ` Vinod Koul
2020-08-28 15:14           ` Pierre-Louis Bossart
2020-08-28 15:14             ` Pierre-Louis Bossart
2020-09-08 11:58             ` Jaroslav Kysela
2020-09-09  7:59               ` Vinod Koul
2020-09-09  7:59                 ` Vinod Koul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29ea5a44-b971-770a-519c-ae879557b63f@linux.intel.com \
    --to=pierre-louis.bossart@linux.intel.com \
    --cc=alsa-devel@alsa-project.org \
    --cc=bard.liao@intel.com \
    --cc=broonie@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hui.wang@canonical.com \
    --cc=jank@cadence.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mengdong.lin@intel.com \
    --cc=rander.wang@linux.intel.com \
    --cc=ranjani.sridharan@linux.intel.com \
    --cc=sanyog.r.kale@intel.com \
    --cc=srinivas.kandagatla@linaro.org \
    --cc=tiwai@suse.de \
    --cc=vkoul@kernel.org \
    --cc=yung-chuan.liao@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.