linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Middendorf <kernel@tuxforce.de>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Anand Jain <anand.jain@oracle.com>,
	linux-btrfs@vger.kernel.org, Antti Palosaari <crope@iki.fi>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-media@vger.kernel.org
Subject: Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?
Date: Fri, 14 Aug 2020 13:38:40 +0200	[thread overview]
Message-ID: <fc887e06-874c-79d8-0607-4e27ae788446@tuxforce.de> (raw)
In-Reply-To: <20200813221348.GB4332@42.do-not-panic.com>

Hey Louis,

On 14/08/2020 00:13, Luis Chamberlain wrote:
> On Thu, Aug 13, 2020 at 11:53:36PM +0200, Lukas Middendorf wrote:
>> With "not used" I mean that the device has been recognized, the si2168
>> module is loaded and si2168_probe() has been called on the device, but I
>> have not started a media player which actually plays a DVB-C stream.
>> Therefore si2168_init() has never been called and the firmware has never
>> been requested or loaded to the device.
>>
>> On resume si2168_init() will be called (although I don't think this actually
>> is really necessary)
> 
> Indeed, that seems odd given its not on probe. So yet another possible
> si2168 bug. Or another way to put it: your cache calls are not needed
> if you remove that si2168_init() if init was not called yet. So simply
> extending the data structure for the driver and seting a bool flag to
> true if init was called should do the trick.
> 
> Then the two cache calls would not be needed.

I already thought about a way to remove the firmware load on suspend if 
it has not happened earlier. But I saw no way to distinguish between 
calls to si2168_init() during resume and calls to si2168_init() at the 
start of device usage.

In contrast to struct dvb_tuner_ops the relevant struct dvb_frontend_ops 
has no separate members "suspend" and "resume" to store callbacks to be 
called instead of sleep and init during suspend and resume. Changing the 
behavior of si2168 would likely include some bigger architectural work 
on the dvb_frontend system to extend dvb_frontend_ops.
It might be possible to just put nothing into dvb_frontend_ops.init but 
instead populate the dvb_tuner_ops part of the struct, but those 
pointers might already be populated externally for some other use.

If I try to do this on my very first kernel contribution without 
in-depth knowledge of how the dvb drivers really work, this will likely 
just explode, if not for me then for somebody else with different hardware.

>> If the kernel does not already know that the files
>> are not present without access to the file system, the system just freezes.
> 
> It is not clear to me what this means. Can you clarify?
> 
>>>>      e) having the firmware files not installed freezes the system during
>>>> resume if the content of /usr/lib/firmware has not been listed before
>>>> suspend (e.g. installing the nvidia driver, so the nouveau driver does not
>>>> access this directory)
>>>
>>> That may be the same issue as in b) assuming that you meant you didn't
>>> use the dvb device, and that the firmware load issue is from nouveau.
>>
>> This is actually just the inverted case of b).
>> The only real relevance of the nouveau driver here is that its (perfectly
>> working) firmware caching on suspend actually seems to be equivalent to
>> manually running "ls" on the firmware directory and afterwards the kernel
>> also knows whether or not the si2168 firmware files are present without file
>> system access.
> 
> OK.. I still don't get it, so let me see if we can decipher what you
> mean here.
> 
> If the firmware is *not* present for the si2168 driver and the device
> has *not* been used yet you get a system freeze which you cannot recover
> from, but only if you are *not* using a driver which also caches its
> firmware already?

Yes, this is exactly what I wanted to say.

A new installation of Fedora 32 without firmware files and with nouveau 
did not show my freeze problem. Installing either the firmware files or 
the nvidia driver started the freeze during resume.

> In this case if the nouveau is used this freeze does not happen, and you
> believe this is because of at least one fetch for the directory is done
> already. If so, then this speedup of a fugure negative dentry lookup on
> btrfs seems like an interesting test case for btrfs developers too look
> at.

According to my previous observations, an alternative to having the 
nouveau driver fetch files to prevent the freeze when no files are 
installed is to manually run "ls" on the firmware directory.

> But we must first undersand the case well.

I will check this corner case again in depth tonight or tomorrow and 
will report my findings, at least as far as it is externally observable 
without in-depth understanding of the internals.


>>>> I'm not sure who really is to blame here:
>>>> - BTRFS (ext4 is fine after all)
>>>> - the firmware loader because the implementation or the documentation are
>>>> wrong
>>>> - si2168 because of not loading the firmware or calling
>>>> firmware_request_cache() before suspend. Also I don't think it is even
>>>> necessary to load the firmware during resume, it should be sufficient to
>>>> load the firmware when the tuner is used. I'm not sure though whether the
>>>> dvb-frontend driver structure allows to properly distinguish between
>>>> resuming and initialization before device usage. The problem can definitely
>>>> be worked around here until the root cause is fixed. I can provide a patch
>>>> if this solution is seen as appropriate.
>>>>
>>>> I'm putting all the maintainers and/or lists of the possible culprits on CC.
>>>
>>> Indeed send the patch to use firmware_request_cache() for si2168.
>>
>> I sent it to the media mailing list (split into two patches).
> 
> Looks good to me, except now that you metion the resuem thing, I think
> you can drop those patches if you do the check I mentioned on resume,
> ie, to verify that init hasn't been called yet.

As I have written above, I likely would have tried that, had I found a 
way to distinguish between calls to init during resume and other calls 
to that function.
Do you know of any safe way to detect that a resume operation is going on?

> Also, might as well rename that function to make it clearer what it does
> as init prefixed or postfixed calls tend to be used for things during
> initialization.

As a pointer to this function goes into the "init" member of the 
dvb_frontend_ops struct, the name seems to be appropriate, at least at 
the moment.

>>> The generic filesystem races on suspend/resume are known, and I will
>>> tackle that once I am done with some other task I am completing.
>>
>> Great to know that the underlying cause for the freeze is known and a
>> solution is being worked on (or at least planned to be worked on).
> 
> There is a sliver possibility here that a negative dentry slowdown may
> cause a btrfs freeze on suspend easily. If so that can likely be fixed
> before that big general filesystem freeze issue I noted with URLs and
> talks. But I think we need to confirm that is what is happening here.

This sounds even better, although I don't really understand your theory.

Lukas

  reply	other threads:[~2020-08-14 11:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-09 18:51 Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs? Lukas Middendorf
2020-08-13 16:37 ` Luis Chamberlain
2020-08-13 21:53   ` Lukas Middendorf
2020-08-13 22:13     ` Luis Chamberlain
2020-08-14 11:38       ` Lukas Middendorf [this message]
2020-08-14 16:37         ` Luis Chamberlain
2020-08-14 21:59           ` Lukas Middendorf
2020-08-17 15:20             ` Luis Chamberlain
2020-08-17 22:04               ` Lukas Middendorf
2020-08-18 14:37                 ` Luis Chamberlain
2021-04-01 14:59                   ` Lukas Middendorf
2021-04-02 18:02                     ` Luis Chamberlain
2021-04-02 22:19                       ` Luis Chamberlain
2021-04-02 22:58                         ` Luis Chamberlain
2021-04-03 10:24                           ` Lukas Middendorf
2021-04-03 16:07                             ` Lukas Middendorf
2021-04-03 20:25                             ` Luis Chamberlain
2021-04-03 21:04                               ` Luis Chamberlain
2021-04-05  9:52                                 ` Lukas Middendorf
2021-04-04  0:50                               ` Lukas Middendorf
2021-04-08 18:02                               ` Luis Chamberlain
2021-04-16 23:17                                 ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fc887e06-874c-79d8-0607-4e27ae788446@tuxforce.de \
    --to=kernel@tuxforce.de \
    --cc=anand.jain@oracle.com \
    --cc=crope@iki.fi \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mchehab@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).