Re: [PATCHv2 0/3] Enable no_cache flag to driver_data

From: "Li, Yi" <yi1.li@linux.intel.com>
To: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: atull@kernel.org, gregkh@linuxfoundation.org, wagi@monom.org,
	dwmw2@infradead.org, rafal@milecki.pl,
	arend.vanspriel@broadcom.com, rjw@rjwysocki.net,
	moritz.fischer@ettus.com, pmladek@suse.com,
	johannes.berg@intel.com, emmanuel.grumbach@intel.com,
	luciano.coelho@intel.com, kvalo@codeaurora.org, luto@kernel.org,
	takahiro.akashi@linaro.org, dhowells@redhat.com,
	pjones@redhat.com, linux-kernel@vger.kernel.org,
	linux-fpga@vger.kernel.org
Subject: Re: [PATCHv2 0/3] Enable no_cache flag to driver_data
Date: Tue, 6 Jun 2017 14:31:54 -0500	[thread overview]
Message-ID: <bc29b730-def3-a422-8f64-60111f8c3c02@linux.intel.com> (raw)
In-Reply-To: <20170524190357.GB8951@wotan.suse.de>

Hi Luis,

On 5/24/2017 2:03 PM, Luis R. Rodriguez wrote:
> On Sat, May 20, 2017 at 01:46:56AM -0500, yi1.li@linux.intel.com wrote:
>> From: Yi Li <yi1.li@linux.intel.com>
>>
>> Changes in v2:
>>
>>    - Rebase to Luis R. Rodriguez's 20170501-driver-data-try2
>>      branch
>>    - Expose DRIVER_DATA_REQ_NO_CACHE flag to public
>>      driver_data_req_params structure, so upper drivers can ask
>>      driver_data driver to bypass the internal caching mechanism.
>>      This will be used for streaming and other drivers maintains
>>      their own caching like iwlwifi.
>>    - Add self test cases.
>>
>>
>> Yi Li (3):
>>    firmware_class: move NO_CACHE from private to driver_data_req_params
>>    iwlwifi: use DRIVER_DATA_REQ_NO_CACHE for driver_data
>>    test: add no_cache to driver_data load tester
>>
>>   drivers/base/firmware_class.c                   | 16 ++++-----
>>   drivers/net/wireless/intel/iwlwifi/iwl-drv.c    |  2 ++
>>   include/linux/driver_data.h                     |  4 +++
>>   lib/test_driver_data.c                          | 43 +++++++++++++++++++++++--
>>   tools/testing/selftests/firmware/driver_data.sh | 36 +++++++++++++++++++++
>>   5 files changed, 89 insertions(+), 12 deletions(-)
>>
> Good stuff, this series is looking good and very easy to read !  Only thing
> though -- the cache test is just setting up the cache and ensuring it gets set,
> it doesn't really *test* the cache is functional. Can you devise a test which
> does ensure the cache is functional ?
>
> We use the cache upon suspend to cache the firmware so that upon resume a
> request will use that cache, to avoid the file lookup on disk. Doing a test
> with qemu suspend + resume is possible but that requires having access to
> the qemu monitor interface and doing something like this to trigger a wakeup:
>
> echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl
>
> where /dev/pts/7 is assumed to be the PTY. This is rather complex for ksefltest,
> so another option is to add a debug mode debugfs interface for firmware_class where
> we can force-enable the cache mechanism without actually this being prompted by a
> suspend/hibernate. Then we can just toggle this bit from a testing perspective and
> excercise the caching mechanism.
>
> So if you look at fw_pm_notify()
>
>          switch (mode) {
>          case PM_HIBERNATION_PREPARE:
>          case PM_SUSPEND_PREPARE:
>          case PM_RESTORE_PREPARE:
>                  /*
>                   * kill pending fallback requests with a custom fallback
>                   * to avoid stalling suspend.
>                   */
>                  kill_pending_fw_fallback_reqs(true);
>                  device_cache_fw_images();
>                  disable_firmware();
>                  break;
>
I am studying at the firmware caching codes and have to say it's very 
complicated. :-( Here are some questions:

1. Since device_cache_fw_images invokes dev_cache_fw_image through 
dpm_for_each_dev, adding a debugfs driver to kick it can only cache 
firmware for those associated with devices which has PM enabled, which 
do not include the driver_data_test_device. Any suggestions?

2. Look into dev_cache_fw_image function, devres_for_each_res will walk through the firmware have been loaded before (through assign_firmware_buf -> fw_add_devm_name) and add to the todo list, eventually it will create the fw_names list. So in the test driver, we need to load the firmware once before calling the kick?
dev_cache_fw_image(struct device *dev, void *data)

{

LIST_HEAD(todo);

struct fw_cache_entry *fce;

struct fw_cache_entry *fce_next;

struct firmware_cache *fwc = &fw_cache;

devres_for_each_res(dev, fw_name_devm_release,

devm_name_match, &fw_cache,

dev_create_fw_entry, &todo);

list_for_each_entry_safe(fce, fce_next, &todo, list) {

list_del(&fce->list);

spin_lock(&fwc->name_lock);

/* only one cache entry for one firmware */

if (!__fw_entry_found(fce->name)) {

list_add(&fce->list, &fwc->fw_names);

} else {

free_fw_cache_entry(fce);

           ...
}
> kill_pending_fw_fallback_reqs(true), device_cache_fw_images(), and
> disable_firmware() could be folder into a helper, then a debugfs interface
> could kick that into action to put us cache mode as the
> device_cache_fw_images() changes the cache state to FW_LOADER_START_CACHE, and
> when this is done you'll notice that assign_firmware_buf() does:
>
>          /*
>           * After caching firmware image is started, let it piggyback
>           * on request firmware.
>           */
>          if (!driver_data_param_nocache(&data_params->priv_params) &&
>              buf->fwc->state == FW_LOADER_START_CACHE) {
>                  if (fw_cache_piggyback_on_request(buf->fw_id))
>                          kref_get(&buf->ref);
>          }
>
> Which adds an incoming request to the cache. The first request that adds this cache
> entry would be triggered by device_cache_fw_images() after the cache state is enabled:
>
> static void device_cache_fw_images(void)
> {
> 	...
>          fwc->state = FW_LOADER_START_CACHE;
>          dpm_for_each_dev(NULL, dev_cache_fw_image);
> 	...
> }
This only applies to the devices have PM enabled.
> Subsequent requests then lookup for the cache through _request_firmware_prepare()
> when fw_lookup_and_allocate_buf() is called. __fw_lookup_buf() really should be
> renamed to something that reflects this is a cache lookup. In fact if you find
> anything else that needs renaming to make it clear please feel free to send patches
> for it.
>
> We want to test that when caching is enabled, the cache is actually used.
>
> Note that disable_firmware() above on the notifier *does* disable subsequent firmware
> lookups but this is only *if* the lookup fails with _request_firmware_prepare():
>
> _request_firmware(const struct firmware **firmware_p, const char *name,
>                    struct driver_data_params *data_params,
>                    struct device *device)
> {
>          struct firmware *fw = NULL;
>          int ret;
>                                                                                  
>          if (!firmware_p)
>                  return -EINVAL;
>                                                                                  
>          if (!name || name[0] == '\0') {
>                  ret = -EINVAL;
>                  goto out;
>          }
>                                                                                  
>          ret = _request_firmware_prepare(&fw, name, device, data_params);
>          if (ret <= 0) /* error or already assigned */
>                  goto out;
>                                                                                  
>          if (!firmware_enabled()) {
>                  WARN(1, "firmware request while host is not available\n");
>                  ret = -EHOSTDOWN;
>                  goto out;
>          }
> 	...
> }
>
> The idea is that _request_firmware_prepare() will have picked up the cache and
> enabled use of that while the infrastructure for disk lookups is disabled. The
> caching effect is lifted later on the same notifier fw_pm_notify() and it also
> schedules a clearing of the cached firmwares with device_uncache_fw_images_delay().

device_uncache_fw_images_delay() will be called for PM_POST_RESTORE, which means the hibernation restore got error. How about the successful restore case, calling driver will free its firmware_buf after loading?

> To me this last part smells like a possible source of issue (not sure) if we might
> suspend/resume a lot in short period of time, this theory could be tested by toggling
> on/of this debugfs interface I suggested while having requests of different types
> blast in.
Agree, it might create an issue if the system is get into 
restore_prepare again before the device_uncache_fw_images_delay clear 
the cache, why we need the 10 * MSEC_PER_SEC delay? In theory, 
fwc->name_lock should protect the case though.
>
> This is the sort of testing which would really help here.
>
> Likewise, if you are extending functionality please consider ways to break it :)
Understand the need to test the firmware caching part. For non-caching 
test case, will it be enough if we can test that the noncache setting 
will ban the firmware name be added to the fwc->fw_names list?
> and test against it. Please think about these things carefully, its what will
> change the stability for the better long term of our loader infrastructure.
>
>    Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fpga" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>