All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
@ 2023-03-29 20:21 Dennis Zhou
  2023-03-29 22:56 ` kernel test robot
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Dennis Zhou @ 2023-03-29 20:21 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: linux-mmc, linux-kernel, Dennis Zhou

When using dm-verity with a data partition on an emmc device, dm-verity
races with the discovery of attached emmc devices. This is because mmc's
probing code sets up the host data structure then a work item is
scheduled to do discovery afterwards. To prevent this race on init,
let's inline the first call to detection, __mm_scan(), and let
subsequent detect calls be handled via the workqueue.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 drivers/mmc/core/core.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 368f10405e13..c0fdc438c882 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
 }
 EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
 
-void mmc_rescan(struct work_struct *work)
+void __mmc_rescan(struct mmc_host *host)
 {
-	struct mmc_host *host =
-		container_of(work, struct mmc_host, detect.work);
 	int i;
 
 	if (host->rescan_disable)
@@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
 		mmc_schedule_delayed_work(&host->detect, HZ);
 }
 
+void mmc_rescan(struct work_struct *work)
+{
+	struct mmc_host *host =
+		container_of(work, struct mmc_host, detect.work);
+
+	__mmc_rescan(host);
+}
+
 void mmc_start_host(struct mmc_host *host)
 {
 	host->f_init = max(min(freqs[0], host->f_max), host->f_min);
@@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
 	}
 
 	mmc_gpiod_request_cd_irq(host);
-	_mmc_detect_change(host, 0, false);
+	host->detect_change = 1;
+	__mmc_rescan(host);
 }
 
 void __mmc_stop_host(struct mmc_host *host)
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-29 20:21 [PATCH] mmc: inline the first mmc_scan() on mmc_start_host() Dennis Zhou
@ 2023-03-29 22:56 ` kernel test robot
  2023-03-29 23:36 ` kernel test robot
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2023-03-29 22:56 UTC (permalink / raw)
  To: Dennis Zhou, Ulf Hansson
  Cc: oe-kbuild-all, linux-mmc, linux-kernel, Dennis Zhou

Hi Dennis,

I love your patch! Perhaps something to improve:

[auto build test WARNING on ulf-hansson-mmc-mirror/next]
[also build test WARNING on linus/master v6.3-rc4 next-20230329]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
base:   https://git.linaro.org/people/ulf.hansson/mmc-mirror.git next
patch link:    https://lore.kernel.org/r/20230329202148.71107-1-dennis%40kernel.org
patch subject: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
config: arm-randconfig-r046-20230329 (https://download.01.org/0day-ci/archive/20230330/202303300639.UEUl5xWY-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/d2de7314d2198df0c7a452546af0c15799b2d864
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
        git checkout d2de7314d2198df0c7a452546af0c15799b2d864
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/mmc/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303300639.UEUl5xWY-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/mmc/core/core.c:2202:6: warning: no previous prototype for '__mmc_rescan' [-Wmissing-prototypes]
    2202 | void __mmc_rescan(struct mmc_host *host)
         |      ^~~~~~~~~~~~


vim +/__mmc_rescan +2202 drivers/mmc/core/core.c

  2201	
> 2202	void __mmc_rescan(struct mmc_host *host)
  2203	{
  2204		int i;
  2205	
  2206		if (host->rescan_disable)
  2207			return;
  2208	
  2209		/* If there is a non-removable card registered, only scan once */
  2210		if (!mmc_card_is_removable(host) && host->rescan_entered)
  2211			return;
  2212		host->rescan_entered = 1;
  2213	
  2214		if (host->trigger_card_event && host->ops->card_event) {
  2215			mmc_claim_host(host);
  2216			host->ops->card_event(host);
  2217			mmc_release_host(host);
  2218			host->trigger_card_event = false;
  2219		}
  2220	
  2221		/* Verify a registered card to be functional, else remove it. */
  2222		if (host->bus_ops)
  2223			host->bus_ops->detect(host);
  2224	
  2225		host->detect_change = 0;
  2226	
  2227		/* if there still is a card present, stop here */
  2228		if (host->bus_ops != NULL)
  2229			goto out;
  2230	
  2231		mmc_claim_host(host);
  2232		if (mmc_card_is_removable(host) && host->ops->get_cd &&
  2233				host->ops->get_cd(host) == 0) {
  2234			mmc_power_off(host);
  2235			mmc_release_host(host);
  2236			goto out;
  2237		}
  2238	
  2239		/* If an SD express card is present, then leave it as is. */
  2240		if (mmc_card_sd_express(host)) {
  2241			mmc_release_host(host);
  2242			goto out;
  2243		}
  2244	
  2245		for (i = 0; i < ARRAY_SIZE(freqs); i++) {
  2246			unsigned int freq = freqs[i];
  2247			if (freq > host->f_max) {
  2248				if (i + 1 < ARRAY_SIZE(freqs))
  2249					continue;
  2250				freq = host->f_max;
  2251			}
  2252			if (!mmc_rescan_try_freq(host, max(freq, host->f_min)))
  2253				break;
  2254			if (freqs[i] <= host->f_min)
  2255				break;
  2256		}
  2257	
  2258		/*
  2259		 * Ignore the command timeout errors observed during
  2260		 * the card init as those are excepted.
  2261		 */
  2262		host->err_stats[MMC_ERR_CMD_TIMEOUT] = 0;
  2263		mmc_release_host(host);
  2264	
  2265	 out:
  2266		if (host->caps & MMC_CAP_NEEDS_POLL)
  2267			mmc_schedule_delayed_work(&host->detect, HZ);
  2268	}
  2269	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-29 20:21 [PATCH] mmc: inline the first mmc_scan() on mmc_start_host() Dennis Zhou
  2023-03-29 22:56 ` kernel test robot
@ 2023-03-29 23:36 ` kernel test robot
  2023-03-29 23:48 ` [PATCH v2] " Dennis Zhou
  2023-06-13 14:25 ` [PATCH] " Ulf Hansson
  3 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2023-03-29 23:36 UTC (permalink / raw)
  To: Dennis Zhou, Ulf Hansson
  Cc: llvm, oe-kbuild-all, linux-mmc, linux-kernel, Dennis Zhou

Hi Dennis,

I love your patch! Perhaps something to improve:

[auto build test WARNING on ulf-hansson-mmc-mirror/next]
[also build test WARNING on linus/master v6.3-rc4 next-20230329]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
base:   https://git.linaro.org/people/ulf.hansson/mmc-mirror.git next
patch link:    https://lore.kernel.org/r/20230329202148.71107-1-dennis%40kernel.org
patch subject: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
config: hexagon-randconfig-r045-20230329 (https://download.01.org/0day-ci/archive/20230330/202303300728.pAr0H6ZJ-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 67409911353323ca5edf2049ef0df54132fa1ca7)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/d2de7314d2198df0c7a452546af0c15799b2d864
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
        git checkout d2de7314d2198df0c7a452546af0c15799b2d864
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/mmc/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303300728.pAr0H6ZJ-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/mmc/core/core.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from drivers/mmc/core/core.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from drivers/mmc/core/core.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
>> drivers/mmc/core/core.c:2202:6: warning: no previous prototype for function '__mmc_rescan' [-Wmissing-prototypes]
   void __mmc_rescan(struct mmc_host *host)
        ^
   drivers/mmc/core/core.c:2202:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __mmc_rescan(struct mmc_host *host)
   ^
   static 
   7 warnings generated.


vim +/__mmc_rescan +2202 drivers/mmc/core/core.c

  2201	
> 2202	void __mmc_rescan(struct mmc_host *host)
  2203	{
  2204		int i;
  2205	
  2206		if (host->rescan_disable)
  2207			return;
  2208	
  2209		/* If there is a non-removable card registered, only scan once */
  2210		if (!mmc_card_is_removable(host) && host->rescan_entered)
  2211			return;
  2212		host->rescan_entered = 1;
  2213	
  2214		if (host->trigger_card_event && host->ops->card_event) {
  2215			mmc_claim_host(host);
  2216			host->ops->card_event(host);
  2217			mmc_release_host(host);
  2218			host->trigger_card_event = false;
  2219		}
  2220	
  2221		/* Verify a registered card to be functional, else remove it. */
  2222		if (host->bus_ops)
  2223			host->bus_ops->detect(host);
  2224	
  2225		host->detect_change = 0;
  2226	
  2227		/* if there still is a card present, stop here */
  2228		if (host->bus_ops != NULL)
  2229			goto out;
  2230	
  2231		mmc_claim_host(host);
  2232		if (mmc_card_is_removable(host) && host->ops->get_cd &&
  2233				host->ops->get_cd(host) == 0) {
  2234			mmc_power_off(host);
  2235			mmc_release_host(host);
  2236			goto out;
  2237		}
  2238	
  2239		/* If an SD express card is present, then leave it as is. */
  2240		if (mmc_card_sd_express(host)) {
  2241			mmc_release_host(host);
  2242			goto out;
  2243		}
  2244	
  2245		for (i = 0; i < ARRAY_SIZE(freqs); i++) {
  2246			unsigned int freq = freqs[i];
  2247			if (freq > host->f_max) {
  2248				if (i + 1 < ARRAY_SIZE(freqs))
  2249					continue;
  2250				freq = host->f_max;
  2251			}
  2252			if (!mmc_rescan_try_freq(host, max(freq, host->f_min)))
  2253				break;
  2254			if (freqs[i] <= host->f_min)
  2255				break;
  2256		}
  2257	
  2258		/*
  2259		 * Ignore the command timeout errors observed during
  2260		 * the card init as those are excepted.
  2261		 */
  2262		host->err_stats[MMC_ERR_CMD_TIMEOUT] = 0;
  2263		mmc_release_host(host);
  2264	
  2265	 out:
  2266		if (host->caps & MMC_CAP_NEEDS_POLL)
  2267			mmc_schedule_delayed_work(&host->detect, HZ);
  2268	}
  2269	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-29 20:21 [PATCH] mmc: inline the first mmc_scan() on mmc_start_host() Dennis Zhou
  2023-03-29 22:56 ` kernel test robot
  2023-03-29 23:36 ` kernel test robot
@ 2023-03-29 23:48 ` Dennis Zhou
  2023-03-31 12:43   ` Ulf Hansson
  2023-06-27 17:20   ` Geert Uytterhoeven
  2023-06-13 14:25 ` [PATCH] " Ulf Hansson
  3 siblings, 2 replies; 24+ messages in thread
From: Dennis Zhou @ 2023-03-29 23:48 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: linux-mmc, linux-kernel

When using dm-verity with a data partition on an emmc device, dm-verity
races with the discovery of attached emmc devices. This is because mmc's
probing code sets up the host data structure then a work item is
scheduled to do discovery afterwards. To prevent this race on init,
let's inline the first call to detection, __mm_scan(), and let
subsequent detect calls be handled via the workqueue.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
Sigh.. fix missing static declaration.

 drivers/mmc/core/core.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 368f10405e13..fda7ee57dee3 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
 }
 EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
 
-void mmc_rescan(struct work_struct *work)
+static void __mmc_rescan(struct mmc_host *host)
 {
-	struct mmc_host *host =
-		container_of(work, struct mmc_host, detect.work);
 	int i;
 
 	if (host->rescan_disable)
@@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
 		mmc_schedule_delayed_work(&host->detect, HZ);
 }
 
+void mmc_rescan(struct work_struct *work)
+{
+	struct mmc_host *host =
+		container_of(work, struct mmc_host, detect.work);
+
+	__mmc_rescan(host);
+}
+
 void mmc_start_host(struct mmc_host *host)
 {
 	host->f_init = max(min(freqs[0], host->f_max), host->f_min);
@@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
 	}
 
 	mmc_gpiod_request_cd_irq(host);
-	_mmc_detect_change(host, 0, false);
+	host->detect_change = 1;
+	__mmc_rescan(host);
 }
 
 void __mmc_stop_host(struct mmc_host *host)
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-29 23:48 ` [PATCH v2] " Dennis Zhou
@ 2023-03-31 12:43   ` Ulf Hansson
  2023-03-31 18:23     ` Dennis Zhou
  2023-06-27 17:20   ` Geert Uytterhoeven
  1 sibling, 1 reply; 24+ messages in thread
From: Ulf Hansson @ 2023-03-31 12:43 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: linux-mmc, linux-kernel

On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
>
> When using dm-verity with a data partition on an emmc device, dm-verity
> races with the discovery of attached emmc devices. This is because mmc's
> probing code sets up the host data structure then a work item is
> scheduled to do discovery afterwards. To prevent this race on init,
> let's inline the first call to detection, __mm_scan(), and let
> subsequent detect calls be handled via the workqueue.

In principle, I don't mind the changes in $subject patch, as long as
it doesn't hurt the overall initialization/boot time. Especially, we
may have more than one mmc-slot being used, so this needs to be well
tested.

Although, more importantly, I fail to understand how this is going to
solve the race condition. Any I/O request to an eMMC or SD requires
the mmc block device driver to be up and running too, which is getting
probed from a separate module/driver that's not part of mmc_rescan().

Kind regards
Uffe

>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
> Sigh.. fix missing static declaration.
>
>  drivers/mmc/core/core.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 368f10405e13..fda7ee57dee3 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
>  }
>  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
>
> -void mmc_rescan(struct work_struct *work)
> +static void __mmc_rescan(struct mmc_host *host)
>  {
> -       struct mmc_host *host =
> -               container_of(work, struct mmc_host, detect.work);
>         int i;
>
>         if (host->rescan_disable)
> @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
>                 mmc_schedule_delayed_work(&host->detect, HZ);
>  }
>
> +void mmc_rescan(struct work_struct *work)
> +{
> +       struct mmc_host *host =
> +               container_of(work, struct mmc_host, detect.work);
> +
> +       __mmc_rescan(host);
> +}
> +
>  void mmc_start_host(struct mmc_host *host)
>  {
>         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
>         }
>
>         mmc_gpiod_request_cd_irq(host);
> -       _mmc_detect_change(host, 0, false);
> +       host->detect_change = 1;
> +       __mmc_rescan(host);
>  }
>
>  void __mmc_stop_host(struct mmc_host *host)
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-31 12:43   ` Ulf Hansson
@ 2023-03-31 18:23     ` Dennis Zhou
  2023-04-03  9:50       ` Ulf Hansson
  0 siblings, 1 reply; 24+ messages in thread
From: Dennis Zhou @ 2023-03-31 18:23 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: linux-mmc, linux-kernel

Hi Ulf,

On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> >
> > When using dm-verity with a data partition on an emmc device, dm-verity
> > races with the discovery of attached emmc devices. This is because mmc's
> > probing code sets up the host data structure then a work item is
> > scheduled to do discovery afterwards. To prevent this race on init,
> > let's inline the first call to detection, __mm_scan(), and let
> > subsequent detect calls be handled via the workqueue.
> 
> In principle, I don't mind the changes in $subject patch, as long as
> it doesn't hurt the overall initialization/boot time. Especially, we
> may have more than one mmc-slot being used, so this needs to be well
> tested.
> 

I unfortunately don't have a device with multiple mmcs available. Is
this something you could help me with?

> Although, more importantly, I fail to understand how this is going to
> solve the race condition. Any I/O request to an eMMC or SD requires
> the mmc block device driver to be up and running too, which is getting
> probed from a separate module/driver that's not part of mmc_rescan().

I believe the call chain is something like this:

__mmc_rescan()
    mmc_rescan_try_freq()
        mmc_attach_mmc()
            mmc_add_card()
                device_add()
                    bus_probe_device()
                        mmc_blk_probe()

The initial calling of this is the host probe. So effectively if there
is a card attached, we're inlining the device_add() call for the card
attached rather than waiting for the workqueue item to kick off.

dm is a part of late_initcall() while mmc is a module_init(), when built
in becoming a device_initcall(). So this solves a race via the initcall
chain. In the current state, device_initcall() finishes and we move onto
the late_initcall() phase. But now, dm is racing with the workqueue to
init the attached emmc device.

Thanks,
Dennis

> 
> Kind regards
> Uffe
> 
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > ---
> > Sigh.. fix missing static declaration.
> >
> >  drivers/mmc/core/core.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > index 368f10405e13..fda7ee57dee3 100644
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> >  }
> >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> >
> > -void mmc_rescan(struct work_struct *work)
> > +static void __mmc_rescan(struct mmc_host *host)
> >  {
> > -       struct mmc_host *host =
> > -               container_of(work, struct mmc_host, detect.work);
> >         int i;
> >
> >         if (host->rescan_disable)
> > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> >                 mmc_schedule_delayed_work(&host->detect, HZ);
> >  }
> >
> > +void mmc_rescan(struct work_struct *work)
> > +{
> > +       struct mmc_host *host =
> > +               container_of(work, struct mmc_host, detect.work);
> > +
> > +       __mmc_rescan(host);
> > +}
> > +
> >  void mmc_start_host(struct mmc_host *host)
> >  {
> >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> >         }
> >
> >         mmc_gpiod_request_cd_irq(host);
> > -       _mmc_detect_change(host, 0, false);
> > +       host->detect_change = 1;
> > +       __mmc_rescan(host);
> >  }
> >
> >  void __mmc_stop_host(struct mmc_host *host)
> > --
> > 2.40.0
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-31 18:23     ` Dennis Zhou
@ 2023-04-03  9:50       ` Ulf Hansson
  2023-04-07  8:24         ` Dennis Zhou
  2023-05-12 11:42         ` Ulf Hansson
  0 siblings, 2 replies; 24+ messages in thread
From: Ulf Hansson @ 2023-04-03  9:50 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: linux-mmc, linux-kernel

On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
>
> Hi Ulf,
>
> On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > >
> > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > races with the discovery of attached emmc devices. This is because mmc's
> > > probing code sets up the host data structure then a work item is
> > > scheduled to do discovery afterwards. To prevent this race on init,
> > > let's inline the first call to detection, __mm_scan(), and let
> > > subsequent detect calls be handled via the workqueue.
> >
> > In principle, I don't mind the changes in $subject patch, as long as
> > it doesn't hurt the overall initialization/boot time. Especially, we
> > may have more than one mmc-slot being used, so this needs to be well
> > tested.
> >
>
> I unfortunately don't have a device with multiple mmcs available. Is
> this something you could help me with?

Yes, I can help to test. Allow me a few days to see what I can do.

Note that, just having one eMMC and one SD card should work too. It
doesn't have to be multiple eMMCs.

>
> > Although, more importantly, I fail to understand how this is going to
> > solve the race condition. Any I/O request to an eMMC or SD requires
> > the mmc block device driver to be up and running too, which is getting
> > probed from a separate module/driver that's not part of mmc_rescan().
>
> I believe the call chain is something like this:
>
> __mmc_rescan()
>     mmc_rescan_try_freq()
>         mmc_attach_mmc()
>             mmc_add_card()
>                 device_add()
>                     bus_probe_device()
>                         mmc_blk_probe()
>
> The initial calling of this is the host probe. So effectively if there
> is a card attached, we're inlining the device_add() call for the card
> attached rather than waiting for the workqueue item to kick off.
>
> dm is a part of late_initcall() while mmc is a module_init(), when built
> in becoming a device_initcall(). So this solves a race via the initcall
> chain. In the current state, device_initcall() finishes and we move onto
> the late_initcall() phase. But now, dm is racing with the workqueue to
> init the attached emmc device.

You certainly have a point!

This should work when the mmc blk module is built-in. Even if that
doesn't solve the entire problem, it should be a step in the right
direction.

I will give it some more thinking and run some tests at my side, then
I will get back to you again.

Kind regards
Uffe

> >
> > >
> > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > ---
> > > Sigh.. fix missing static declaration.
> > >
> > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > index 368f10405e13..fda7ee57dee3 100644
> > > --- a/drivers/mmc/core/core.c
> > > +++ b/drivers/mmc/core/core.c
> > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > >  }
> > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > >
> > > -void mmc_rescan(struct work_struct *work)
> > > +static void __mmc_rescan(struct mmc_host *host)
> > >  {
> > > -       struct mmc_host *host =
> > > -               container_of(work, struct mmc_host, detect.work);
> > >         int i;
> > >
> > >         if (host->rescan_disable)
> > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > >  }
> > >
> > > +void mmc_rescan(struct work_struct *work)
> > > +{
> > > +       struct mmc_host *host =
> > > +               container_of(work, struct mmc_host, detect.work);
> > > +
> > > +       __mmc_rescan(host);
> > > +}
> > > +
> > >  void mmc_start_host(struct mmc_host *host)
> > >  {
> > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > >         }
> > >
> > >         mmc_gpiod_request_cd_irq(host);
> > > -       _mmc_detect_change(host, 0, false);
> > > +       host->detect_change = 1;
> > > +       __mmc_rescan(host);
> > >  }
> > >
> > >  void __mmc_stop_host(struct mmc_host *host)
> > > --
> > > 2.40.0
> > >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-04-03  9:50       ` Ulf Hansson
@ 2023-04-07  8:24         ` Dennis Zhou
  2023-04-11 20:29           ` Dennis Zhou
  2023-04-12 11:05           ` Ulf Hansson
  2023-05-12 11:42         ` Ulf Hansson
  1 sibling, 2 replies; 24+ messages in thread
From: Dennis Zhou @ 2023-04-07  8:24 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: linux-mmc, linux-kernel

On Mon, Apr 03, 2023 at 11:50:41AM +0200, Ulf Hansson wrote:
> On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> >
> > Hi Ulf,
> >
> > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > >
> > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > probing code sets up the host data structure then a work item is
> > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > let's inline the first call to detection, __mm_scan(), and let
> > > > subsequent detect calls be handled via the workqueue.
> > >
> > > In principle, I don't mind the changes in $subject patch, as long as
> > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > may have more than one mmc-slot being used, so this needs to be well
> > > tested.
> > >
> >
> > I unfortunately don't have a device with multiple mmcs available. Is
> > this something you could help me with?
> 
> Yes, I can help to test. Allow me a few days to see what I can do.
> 
> Note that, just having one eMMC and one SD card should work too. It
> doesn't have to be multiple eMMCs.
> 
> >
> > > Although, more importantly, I fail to understand how this is going to
> > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > the mmc block device driver to be up and running too, which is getting
> > > probed from a separate module/driver that's not part of mmc_rescan().
> >
> > I believe the call chain is something like this:
> >
> > __mmc_rescan()
> >     mmc_rescan_try_freq()
> >         mmc_attach_mmc()
> >             mmc_add_card()
> >                 device_add()
> >                     bus_probe_device()
> >                         mmc_blk_probe()
> >
> > The initial calling of this is the host probe. So effectively if there
> > is a card attached, we're inlining the device_add() call for the card
> > attached rather than waiting for the workqueue item to kick off.
> >
> > dm is a part of late_initcall() while mmc is a module_init(), when built
> > in becoming a device_initcall(). So this solves a race via the initcall
> > chain. In the current state, device_initcall() finishes and we move onto
> > the late_initcall() phase. But now, dm is racing with the workqueue to
> > init the attached emmc device.
> 
> You certainly have a point!
> 
> This should work when the mmc blk module is built-in. Even if that
> doesn't solve the entire problem, it should be a step in the right
> direction.
> 
> I will give it some more thinking and run some tests at my side, then
> I will get back to you again.
> 

Hi Ulf, is there an update on testing with this patch?

Thanks,
Dennis

> Kind regards
> Uffe
> 
> > >
> > > >
> > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > > ---
> > > > Sigh.. fix missing static declaration.
> > > >
> > > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > index 368f10405e13..fda7ee57dee3 100644
> > > > --- a/drivers/mmc/core/core.c
> > > > +++ b/drivers/mmc/core/core.c
> > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > >  }
> > > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > >
> > > > -void mmc_rescan(struct work_struct *work)
> > > > +static void __mmc_rescan(struct mmc_host *host)
> > > >  {
> > > > -       struct mmc_host *host =
> > > > -               container_of(work, struct mmc_host, detect.work);
> > > >         int i;
> > > >
> > > >         if (host->rescan_disable)
> > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > > >  }
> > > >
> > > > +void mmc_rescan(struct work_struct *work)
> > > > +{
> > > > +       struct mmc_host *host =
> > > > +               container_of(work, struct mmc_host, detect.work);
> > > > +
> > > > +       __mmc_rescan(host);
> > > > +}
> > > > +
> > > >  void mmc_start_host(struct mmc_host *host)
> > > >  {
> > > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > >         }
> > > >
> > > >         mmc_gpiod_request_cd_irq(host);
> > > > -       _mmc_detect_change(host, 0, false);
> > > > +       host->detect_change = 1;
> > > > +       __mmc_rescan(host);
> > > >  }
> > > >
> > > >  void __mmc_stop_host(struct mmc_host *host)
> > > > --
> > > > 2.40.0
> > > >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-04-07  8:24         ` Dennis Zhou
@ 2023-04-11 20:29           ` Dennis Zhou
  2023-04-12 11:05           ` Ulf Hansson
  1 sibling, 0 replies; 24+ messages in thread
From: Dennis Zhou @ 2023-04-11 20:29 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: linux-mmc, linux-kernel

On Fri, Apr 07, 2023 at 01:24:15AM -0700, Dennis Zhou wrote:
> On Mon, Apr 03, 2023 at 11:50:41AM +0200, Ulf Hansson wrote:
> > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> > >
> > > Hi Ulf,
> > >
> > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > > >
> > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > probing code sets up the host data structure then a work item is
> > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > subsequent detect calls be handled via the workqueue.
> > > >
> > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > may have more than one mmc-slot being used, so this needs to be well
> > > > tested.
> > > >
> > >
> > > I unfortunately don't have a device with multiple mmcs available. Is
> > > this something you could help me with?
> > 
> > Yes, I can help to test. Allow me a few days to see what I can do.
> > 
> > Note that, just having one eMMC and one SD card should work too. It
> > doesn't have to be multiple eMMCs.
> > 
> > >
> > > > Although, more importantly, I fail to understand how this is going to
> > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > the mmc block device driver to be up and running too, which is getting
> > > > probed from a separate module/driver that's not part of mmc_rescan().
> > >
> > > I believe the call chain is something like this:
> > >
> > > __mmc_rescan()
> > >     mmc_rescan_try_freq()
> > >         mmc_attach_mmc()
> > >             mmc_add_card()
> > >                 device_add()
> > >                     bus_probe_device()
> > >                         mmc_blk_probe()
> > >
> > > The initial calling of this is the host probe. So effectively if there
> > > is a card attached, we're inlining the device_add() call for the card
> > > attached rather than waiting for the workqueue item to kick off.
> > >
> > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > in becoming a device_initcall(). So this solves a race via the initcall
> > > chain. In the current state, device_initcall() finishes and we move onto
> > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > init the attached emmc device.
> > 
> > You certainly have a point!
> > 
> > This should work when the mmc blk module is built-in. Even if that
> > doesn't solve the entire problem, it should be a step in the right
> > direction.
> > 
> > I will give it some more thinking and run some tests at my side, then
> > I will get back to you again.
> > 
> 
> Hi Ulf, is there an update on testing with this patch?
> 

Ping.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-04-07  8:24         ` Dennis Zhou
  2023-04-11 20:29           ` Dennis Zhou
@ 2023-04-12 11:05           ` Ulf Hansson
  1 sibling, 0 replies; 24+ messages in thread
From: Ulf Hansson @ 2023-04-12 11:05 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: linux-mmc, linux-kernel

On Fri, 7 Apr 2023 at 10:24, Dennis Zhou <dennis@kernel.org> wrote:
>
> On Mon, Apr 03, 2023 at 11:50:41AM +0200, Ulf Hansson wrote:
> > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> > >
> > > Hi Ulf,
> > >
> > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > > >
> > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > probing code sets up the host data structure then a work item is
> > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > subsequent detect calls be handled via the workqueue.
> > > >
> > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > may have more than one mmc-slot being used, so this needs to be well
> > > > tested.
> > > >
> > >
> > > I unfortunately don't have a device with multiple mmcs available. Is
> > > this something you could help me with?
> >
> > Yes, I can help to test. Allow me a few days to see what I can do.
> >
> > Note that, just having one eMMC and one SD card should work too. It
> > doesn't have to be multiple eMMCs.
> >
> > >
> > > > Although, more importantly, I fail to understand how this is going to
> > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > the mmc block device driver to be up and running too, which is getting
> > > > probed from a separate module/driver that's not part of mmc_rescan().
> > >
> > > I believe the call chain is something like this:
> > >
> > > __mmc_rescan()
> > >     mmc_rescan_try_freq()
> > >         mmc_attach_mmc()
> > >             mmc_add_card()
> > >                 device_add()
> > >                     bus_probe_device()
> > >                         mmc_blk_probe()
> > >
> > > The initial calling of this is the host probe. So effectively if there
> > > is a card attached, we're inlining the device_add() call for the card
> > > attached rather than waiting for the workqueue item to kick off.
> > >
> > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > in becoming a device_initcall(). So this solves a race via the initcall
> > > chain. In the current state, device_initcall() finishes and we move onto
> > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > init the attached emmc device.
> >
> > You certainly have a point!
> >
> > This should work when the mmc blk module is built-in. Even if that
> > doesn't solve the entire problem, it should be a step in the right
> > direction.
> >
> > I will give it some more thinking and run some tests at my side, then
> > I will get back to you again.
> >
>
> Hi Ulf, is there an update on testing with this patch?

Sorry, it's a busy period for me and I expect it to remain like that
for another couple of weeks.

I will try to squeeze in some time for this, but no promises. Sorry.

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-04-03  9:50       ` Ulf Hansson
  2023-04-07  8:24         ` Dennis Zhou
@ 2023-05-12 11:42         ` Ulf Hansson
  2023-06-08 20:49           ` Dennis Zhou
  1 sibling, 1 reply; 24+ messages in thread
From: Ulf Hansson @ 2023-05-12 11:42 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: linux-mmc, linux-kernel, Linus Walleij

+ Linus,

Hi Dennis,

On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> >
> > Hi Ulf,
> >
> > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > >
> > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > probing code sets up the host data structure then a work item is
> > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > let's inline the first call to detection, __mm_scan(), and let
> > > > subsequent detect calls be handled via the workqueue.
> > >
> > > In principle, I don't mind the changes in $subject patch, as long as
> > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > may have more than one mmc-slot being used, so this needs to be well
> > > tested.
> > >
> >
> > I unfortunately don't have a device with multiple mmcs available. Is
> > this something you could help me with?
>
> Yes, I can help to test. Allow me a few days to see what I can do.
>
> Note that, just having one eMMC and one SD card should work too. It
> doesn't have to be multiple eMMCs.
>
> >
> > > Although, more importantly, I fail to understand how this is going to
> > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > the mmc block device driver to be up and running too, which is getting
> > > probed from a separate module/driver that's not part of mmc_rescan().
> >
> > I believe the call chain is something like this:
> >
> > __mmc_rescan()
> >     mmc_rescan_try_freq()
> >         mmc_attach_mmc()
> >             mmc_add_card()
> >                 device_add()
> >                     bus_probe_device()
> >                         mmc_blk_probe()
> >
> > The initial calling of this is the host probe. So effectively if there
> > is a card attached, we're inlining the device_add() call for the card
> > attached rather than waiting for the workqueue item to kick off.
> >
> > dm is a part of late_initcall() while mmc is a module_init(), when built
> > in becoming a device_initcall(). So this solves a race via the initcall
> > chain. In the current state, device_initcall() finishes and we move onto
> > the late_initcall() phase. But now, dm is racing with the workqueue to
> > init the attached emmc device.
>
> You certainly have a point!
>
> This should work when the mmc blk module is built-in. Even if that
> doesn't solve the entire problem, it should be a step in the right
> direction.
>
> I will give it some more thinking and run some tests at my side, then
> I will get back to you again.
>
> Kind regards
> Uffe
>
> > >
> > > >
> > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > > ---
> > > > Sigh.. fix missing static declaration.
> > > >
> > > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > index 368f10405e13..fda7ee57dee3 100644
> > > > --- a/drivers/mmc/core/core.c
> > > > +++ b/drivers/mmc/core/core.c
> > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > >  }
> > > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > >
> > > > -void mmc_rescan(struct work_struct *work)
> > > > +static void __mmc_rescan(struct mmc_host *host)
> > > >  {
> > > > -       struct mmc_host *host =
> > > > -               container_of(work, struct mmc_host, detect.work);
> > > >         int i;
> > > >
> > > >         if (host->rescan_disable)
> > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > > >  }
> > > >
> > > > +void mmc_rescan(struct work_struct *work)
> > > > +{
> > > > +       struct mmc_host *host =
> > > > +               container_of(work, struct mmc_host, detect.work);
> > > > +
> > > > +       __mmc_rescan(host);
> > > > +}
> > > > +
> > > >  void mmc_start_host(struct mmc_host *host)
> > > >  {
> > > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > >         }
> > > >
> > > >         mmc_gpiod_request_cd_irq(host);
> > > > -       _mmc_detect_change(host, 0, false);
> > > > +       host->detect_change = 1;
> > > > +       __mmc_rescan(host);
> > > >  }
> > > >
> > > >  void __mmc_stop_host(struct mmc_host *host)
> > > > --
> > > > 2.40.0
> > > >

My apologies for the long delay. I finally managed to test this.

I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
based upon the ux500 SoC. It's quite good to use for these types of
tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
SD-card slot. So in total there are 4 devices that get probed.

The SDIO card isn't detected properly, but always fails in the similar
way (thus I left it out from the below data). I tested both with and
without an SD card inserted during boot, to get some more data to
compare. These are the summary from my tests:

v6.4-rc1 without SD card:
~2.18s - MMC1 (eMMC)
~3.33s - MMC3 (eMMC)
~5.91s - kernel boot complete

v6.4-rc1 with an SD card:
~2.18s - MMC1 (eMMC)
~3.45s - MMC3 (eMMC)
~3.57s - MMC2 (SD)
~5.76s - kernel boot complete

v6.4-rc1 + patch without SD card:
~2.24s - MMC1 (eMMC)
~3.58s - MMC3 (eMMC)
~5.96s - kernel boot complete

v6.4-rc1 + patch with an SD card:
~2.24s - MMC1 (eMMC)
~3.73s - MMC2 (SD)
~3.98s - MMC3 (eMMC)
~6.73s - kernel boot complete

By looking at these results, I was kind of surprised. I was thinking
that the asynchronous probe should address the parallelism problem.
Then I discovered that it in fact, hasn't been enabled for the mmci
driver that is being used for this platform. Huh, I was under the
assumption that it has been enabled for all mmc hosts by now. :-)

Okay, so I am going to run another round of tests, with async probe
enabled for the mmci driver too. I will let you know the results as
soon as I can.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-05-12 11:42         ` Ulf Hansson
@ 2023-06-08 20:49           ` Dennis Zhou
  2023-06-09  6:19             ` Greg KH
  0 siblings, 1 reply; 24+ messages in thread
From: Dennis Zhou @ 2023-06-08 20:49 UTC (permalink / raw)
  To: Ulf Hansson, Greg KH; +Cc: Dennis Zhou, linux-mmc, linux-kernel, Linus Walleij

On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> + Linus,
> 
> Hi Dennis,
> 
> On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> > >
> > > Hi Ulf,
> > >
> > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > > >
> > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > probing code sets up the host data structure then a work item is
> > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > subsequent detect calls be handled via the workqueue.
> > > >
> > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > may have more than one mmc-slot being used, so this needs to be well
> > > > tested.
> > > >
> > >
> > > I unfortunately don't have a device with multiple mmcs available. Is
> > > this something you could help me with?
> >
> > Yes, I can help to test. Allow me a few days to see what I can do.
> >
> > Note that, just having one eMMC and one SD card should work too. It
> > doesn't have to be multiple eMMCs.
> >
> > >
> > > > Although, more importantly, I fail to understand how this is going to
> > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > the mmc block device driver to be up and running too, which is getting
> > > > probed from a separate module/driver that's not part of mmc_rescan().
> > >
> > > I believe the call chain is something like this:
> > >
> > > __mmc_rescan()
> > >     mmc_rescan_try_freq()
> > >         mmc_attach_mmc()
> > >             mmc_add_card()
> > >                 device_add()
> > >                     bus_probe_device()
> > >                         mmc_blk_probe()
> > >
> > > The initial calling of this is the host probe. So effectively if there
> > > is a card attached, we're inlining the device_add() call for the card
> > > attached rather than waiting for the workqueue item to kick off.
> > >
> > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > in becoming a device_initcall(). So this solves a race via the initcall
> > > chain. In the current state, device_initcall() finishes and we move onto
> > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > init the attached emmc device.
> >
> > You certainly have a point!
> >
> > This should work when the mmc blk module is built-in. Even if that
> > doesn't solve the entire problem, it should be a step in the right
> > direction.
> >
> > I will give it some more thinking and run some tests at my side, then
> > I will get back to you again.
> >
> > Kind regards
> > Uffe
> >
> > > >
> > > > >
> > > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > > > ---
> > > > > Sigh.. fix missing static declaration.
> > > > >
> > > > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > > index 368f10405e13..fda7ee57dee3 100644
> > > > > --- a/drivers/mmc/core/core.c
> > > > > +++ b/drivers/mmc/core/core.c
> > > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > >  }
> > > > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > > >
> > > > > -void mmc_rescan(struct work_struct *work)
> > > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > >  {
> > > > > -       struct mmc_host *host =
> > > > > -               container_of(work, struct mmc_host, detect.work);
> > > > >         int i;
> > > > >
> > > > >         if (host->rescan_disable)
> > > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > > > >  }
> > > > >
> > > > > +void mmc_rescan(struct work_struct *work)
> > > > > +{
> > > > > +       struct mmc_host *host =
> > > > > +               container_of(work, struct mmc_host, detect.work);
> > > > > +
> > > > > +       __mmc_rescan(host);
> > > > > +}
> > > > > +
> > > > >  void mmc_start_host(struct mmc_host *host)
> > > > >  {
> > > > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > >         }
> > > > >
> > > > >         mmc_gpiod_request_cd_irq(host);
> > > > > -       _mmc_detect_change(host, 0, false);
> > > > > +       host->detect_change = 1;
> > > > > +       __mmc_rescan(host);
> > > > >  }
> > > > >
> > > > >  void __mmc_stop_host(struct mmc_host *host)
> > > > > --
> > > > > 2.40.0
> > > > >
> 
> My apologies for the long delay. I finally managed to test this.
> 
> I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
> based upon the ux500 SoC. It's quite good to use for these types of
> tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
> SD-card slot. So in total there are 4 devices that get probed.
> 
> The SDIO card isn't detected properly, but always fails in the similar
> way (thus I left it out from the below data). I tested both with and
> without an SD card inserted during boot, to get some more data to
> compare. These are the summary from my tests:
> 
> v6.4-rc1 without SD card:
> ~2.18s - MMC1 (eMMC)
> ~3.33s - MMC3 (eMMC)
> ~5.91s - kernel boot complete
> 
> v6.4-rc1 with an SD card:
> ~2.18s - MMC1 (eMMC)
> ~3.45s - MMC3 (eMMC)
> ~3.57s - MMC2 (SD)
> ~5.76s - kernel boot complete
> 
> v6.4-rc1 + patch without SD card:
> ~2.24s - MMC1 (eMMC)
> ~3.58s - MMC3 (eMMC)
> ~5.96s - kernel boot complete
> 
> v6.4-rc1 + patch with an SD card:
> ~2.24s - MMC1 (eMMC)
> ~3.73s - MMC2 (SD)
> ~3.98s - MMC3 (eMMC)
> ~6.73s - kernel boot complete
> 
> By looking at these results, I was kind of surprised. I was thinking
> that the asynchronous probe should address the parallelism problem.
> Then I discovered that it in fact, hasn't been enabled for the mmci
> driver that is being used for this platform. Huh, I was under the
> assumption that it has been enabled for all mmc hosts by now. :-)
> 
> Okay, so I am going to run another round of tests, with async probe
> enabled for the mmci driver too. I will let you know the results as
> soon as I can.
> 
> Kind regards
> Uffe

Hi Uffe,

Kindly this has been way too long for review. It's been over 3 months.
What's going on here?

I think there's a misunderstanding too. Without this fix, the machine
doesn't even boot. I'm not sure why perf is the blocking question here.

Greg, is there another tree I can run this through?

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-08 20:49           ` Dennis Zhou
@ 2023-06-09  6:19             ` Greg KH
  2023-06-09  7:16               ` Dennis Zhou
  0 siblings, 1 reply; 24+ messages in thread
From: Greg KH @ 2023-06-09  6:19 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Ulf Hansson, linux-mmc, linux-kernel, Linus Walleij

On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > + Linus,
> > 
> > Hi Dennis,
> > 
> > On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> > > >
> > > > Hi Ulf,
> > > >
> > > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > > > >
> > > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > > probing code sets up the host data structure then a work item is
> > > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > > subsequent detect calls be handled via the workqueue.
> > > > >
> > > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > > may have more than one mmc-slot being used, so this needs to be well
> > > > > tested.
> > > > >
> > > >
> > > > I unfortunately don't have a device with multiple mmcs available. Is
> > > > this something you could help me with?
> > >
> > > Yes, I can help to test. Allow me a few days to see what I can do.
> > >
> > > Note that, just having one eMMC and one SD card should work too. It
> > > doesn't have to be multiple eMMCs.
> > >
> > > >
> > > > > Although, more importantly, I fail to understand how this is going to
> > > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > > the mmc block device driver to be up and running too, which is getting
> > > > > probed from a separate module/driver that's not part of mmc_rescan().
> > > >
> > > > I believe the call chain is something like this:
> > > >
> > > > __mmc_rescan()
> > > >     mmc_rescan_try_freq()
> > > >         mmc_attach_mmc()
> > > >             mmc_add_card()
> > > >                 device_add()
> > > >                     bus_probe_device()
> > > >                         mmc_blk_probe()
> > > >
> > > > The initial calling of this is the host probe. So effectively if there
> > > > is a card attached, we're inlining the device_add() call for the card
> > > > attached rather than waiting for the workqueue item to kick off.
> > > >
> > > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > > in becoming a device_initcall(). So this solves a race via the initcall
> > > > chain. In the current state, device_initcall() finishes and we move onto
> > > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > > init the attached emmc device.
> > >
> > > You certainly have a point!
> > >
> > > This should work when the mmc blk module is built-in. Even if that
> > > doesn't solve the entire problem, it should be a step in the right
> > > direction.
> > >
> > > I will give it some more thinking and run some tests at my side, then
> > > I will get back to you again.
> > >
> > > Kind regards
> > > Uffe
> > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > > > > ---
> > > > > > Sigh.. fix missing static declaration.
> > > > > >
> > > > > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > > > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > > > index 368f10405e13..fda7ee57dee3 100644
> > > > > > --- a/drivers/mmc/core/core.c
> > > > > > +++ b/drivers/mmc/core/core.c
> > > > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > > > >
> > > > > > -void mmc_rescan(struct work_struct *work)
> > > > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > > >  {
> > > > > > -       struct mmc_host *host =
> > > > > > -               container_of(work, struct mmc_host, detect.work);
> > > > > >         int i;
> > > > > >
> > > > > >         if (host->rescan_disable)
> > > > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > > > > >  }
> > > > > >
> > > > > > +void mmc_rescan(struct work_struct *work)
> > > > > > +{
> > > > > > +       struct mmc_host *host =
> > > > > > +               container_of(work, struct mmc_host, detect.work);
> > > > > > +
> > > > > > +       __mmc_rescan(host);
> > > > > > +}
> > > > > > +
> > > > > >  void mmc_start_host(struct mmc_host *host)
> > > > > >  {
> > > > > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > > >         }
> > > > > >
> > > > > >         mmc_gpiod_request_cd_irq(host);
> > > > > > -       _mmc_detect_change(host, 0, false);
> > > > > > +       host->detect_change = 1;
> > > > > > +       __mmc_rescan(host);
> > > > > >  }
> > > > > >
> > > > > >  void __mmc_stop_host(struct mmc_host *host)
> > > > > > --
> > > > > > 2.40.0
> > > > > >
> > 
> > My apologies for the long delay. I finally managed to test this.
> > 
> > I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
> > based upon the ux500 SoC. It's quite good to use for these types of
> > tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
> > SD-card slot. So in total there are 4 devices that get probed.
> > 
> > The SDIO card isn't detected properly, but always fails in the similar
> > way (thus I left it out from the below data). I tested both with and
> > without an SD card inserted during boot, to get some more data to
> > compare. These are the summary from my tests:
> > 
> > v6.4-rc1 without SD card:
> > ~2.18s - MMC1 (eMMC)
> > ~3.33s - MMC3 (eMMC)
> > ~5.91s - kernel boot complete
> > 
> > v6.4-rc1 with an SD card:
> > ~2.18s - MMC1 (eMMC)
> > ~3.45s - MMC3 (eMMC)
> > ~3.57s - MMC2 (SD)
> > ~5.76s - kernel boot complete
> > 
> > v6.4-rc1 + patch without SD card:
> > ~2.24s - MMC1 (eMMC)
> > ~3.58s - MMC3 (eMMC)
> > ~5.96s - kernel boot complete
> > 
> > v6.4-rc1 + patch with an SD card:
> > ~2.24s - MMC1 (eMMC)
> > ~3.73s - MMC2 (SD)
> > ~3.98s - MMC3 (eMMC)
> > ~6.73s - kernel boot complete
> > 
> > By looking at these results, I was kind of surprised. I was thinking
> > that the asynchronous probe should address the parallelism problem.
> > Then I discovered that it in fact, hasn't been enabled for the mmci
> > driver that is being used for this platform. Huh, I was under the
> > assumption that it has been enabled for all mmc hosts by now. :-)
> > 
> > Okay, so I am going to run another round of tests, with async probe
> > enabled for the mmci driver too. I will let you know the results as
> > soon as I can.
> > 
> > Kind regards
> > Uffe
> 
> Hi Uffe,
> 
> Kindly this has been way too long for review. It's been over 3 months.
> What's going on here?
> 
> I think there's a misunderstanding too. Without this fix, the machine
> doesn't even boot. I'm not sure why perf is the blocking question here.

Well you can not degrade performance of existing machines that work
today, right?  That would be a regression and it seems that you are
doing that if I read the numbers above correctly.

> Greg, is there another tree I can run this through?

Why would you want to route around a maintainer just to get a patch that
would have to be reverted applied?  :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-09  6:19             ` Greg KH
@ 2023-06-09  7:16               ` Dennis Zhou
  2023-06-09  8:53                 ` Greg KH
  2023-06-12 14:55                 ` Ulf Hansson
  0 siblings, 2 replies; 24+ messages in thread
From: Dennis Zhou @ 2023-06-09  7:16 UTC (permalink / raw)
  To: Greg KH; +Cc: Dennis Zhou, Ulf Hansson, linux-mmc, linux-kernel, Linus Walleij

Hi Greg,

On Fri, Jun 09, 2023 at 08:19:51AM +0200, Greg KH wrote:
> On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> > On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > > + Linus,
> > > 
> > > Hi Dennis,
> > > 
> > > On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> > > > >
> > > > > Hi Ulf,
> > > > >
> > > > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > > > > >
> > > > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > > > probing code sets up the host data structure then a work item is
> > > > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > > > subsequent detect calls be handled via the workqueue.
> > > > > >
> > > > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > > > may have more than one mmc-slot being used, so this needs to be well
> > > > > > tested.
> > > > > >
> > > > >
> > > > > I unfortunately don't have a device with multiple mmcs available. Is
> > > > > this something you could help me with?
> > > >
> > > > Yes, I can help to test. Allow me a few days to see what I can do.
> > > >
> > > > Note that, just having one eMMC and one SD card should work too. It
> > > > doesn't have to be multiple eMMCs.
> > > >
> > > > >
> > > > > > Although, more importantly, I fail to understand how this is going to
> > > > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > > > the mmc block device driver to be up and running too, which is getting
> > > > > > probed from a separate module/driver that's not part of mmc_rescan().
> > > > >
> > > > > I believe the call chain is something like this:
> > > > >
> > > > > __mmc_rescan()
> > > > >     mmc_rescan_try_freq()
> > > > >         mmc_attach_mmc()
> > > > >             mmc_add_card()
> > > > >                 device_add()
> > > > >                     bus_probe_device()
> > > > >                         mmc_blk_probe()
> > > > >
> > > > > The initial calling of this is the host probe. So effectively if there
> > > > > is a card attached, we're inlining the device_add() call for the card
> > > > > attached rather than waiting for the workqueue item to kick off.
> > > > >
> > > > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > > > in becoming a device_initcall(). So this solves a race via the initcall
> > > > > chain. In the current state, device_initcall() finishes and we move onto
> > > > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > > > init the attached emmc device.
> > > >
> > > > You certainly have a point!
> > > >
> > > > This should work when the mmc blk module is built-in. Even if that
> > > > doesn't solve the entire problem, it should be a step in the right
> > > > direction.
> > > >
> > > > I will give it some more thinking and run some tests at my side, then
> > > > I will get back to you again.
> > > >
> > > > Kind regards
> > > > Uffe
> > > >
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > > > > > ---
> > > > > > > Sigh.. fix missing static declaration.
> > > > > > >
> > > > > > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > > > > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > > > > index 368f10405e13..fda7ee57dee3 100644
> > > > > > > --- a/drivers/mmc/core/core.c
> > > > > > > +++ b/drivers/mmc/core/core.c
> > > > > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > > > > >
> > > > > > > -void mmc_rescan(struct work_struct *work)
> > > > > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > > > >  {
> > > > > > > -       struct mmc_host *host =
> > > > > > > -               container_of(work, struct mmc_host, detect.work);
> > > > > > >         int i;
> > > > > > >
> > > > > > >         if (host->rescan_disable)
> > > > > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > > > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > > > > > >  }
> > > > > > >
> > > > > > > +void mmc_rescan(struct work_struct *work)
> > > > > > > +{
> > > > > > > +       struct mmc_host *host =
> > > > > > > +               container_of(work, struct mmc_host, detect.work);
> > > > > > > +
> > > > > > > +       __mmc_rescan(host);
> > > > > > > +}
> > > > > > > +
> > > > > > >  void mmc_start_host(struct mmc_host *host)
> > > > > > >  {
> > > > > > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > > > >         }
> > > > > > >
> > > > > > >         mmc_gpiod_request_cd_irq(host);
> > > > > > > -       _mmc_detect_change(host, 0, false);
> > > > > > > +       host->detect_change = 1;
> > > > > > > +       __mmc_rescan(host);
> > > > > > >  }
> > > > > > >
> > > > > > >  void __mmc_stop_host(struct mmc_host *host)
> > > > > > > --
> > > > > > > 2.40.0
> > > > > > >
> > > 
> > > My apologies for the long delay. I finally managed to test this.
> > > 
> > > I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
> > > based upon the ux500 SoC. It's quite good to use for these types of
> > > tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
> > > SD-card slot. So in total there are 4 devices that get probed.
> > > 
> > > The SDIO card isn't detected properly, but always fails in the similar
> > > way (thus I left it out from the below data). I tested both with and
> > > without an SD card inserted during boot, to get some more data to
> > > compare. These are the summary from my tests:
> > > 
> > > v6.4-rc1 without SD card:
> > > ~2.18s - MMC1 (eMMC)
> > > ~3.33s - MMC3 (eMMC)
> > > ~5.91s - kernel boot complete
> > > 
> > > v6.4-rc1 with an SD card:
> > > ~2.18s - MMC1 (eMMC)
> > > ~3.45s - MMC3 (eMMC)
> > > ~3.57s - MMC2 (SD)
> > > ~5.76s - kernel boot complete
> > > 
> > > v6.4-rc1 + patch without SD card:
> > > ~2.24s - MMC1 (eMMC)
> > > ~3.58s - MMC3 (eMMC)
> > > ~5.96s - kernel boot complete
> > > 
> > > v6.4-rc1 + patch with an SD card:
> > > ~2.24s - MMC1 (eMMC)
> > > ~3.73s - MMC2 (SD)
> > > ~3.98s - MMC3 (eMMC)
> > > ~6.73s - kernel boot complete
> > > 
> > > By looking at these results, I was kind of surprised. I was thinking
> > > that the asynchronous probe should address the parallelism problem.
> > > Then I discovered that it in fact, hasn't been enabled for the mmci
> > > driver that is being used for this platform. Huh, I was under the
> > > assumption that it has been enabled for all mmc hosts by now. :-)
> > > 
> > > Okay, so I am going to run another round of tests, with async probe
> > > enabled for the mmci driver too. I will let you know the results as
> > > soon as I can.
> > > 
> > > Kind regards
> > > Uffe
> > 
> > Hi Uffe,
> > 
> > Kindly this has been way too long for review. It's been over 3 months.
> > What's going on here?
> > 
> > I think there's a misunderstanding too. Without this fix, the machine
> > doesn't even boot. I'm not sure why perf is the blocking question here.
> 
> Well you can not degrade performance of existing machines that work
> today, right?  That would be a regression and it seems that you are
> doing that if I read the numbers above correctly.
> 

I agree that we shouldn't degrade performance of existing machines, but
this is a timing bug on existing platforms that have a slow enough cpu
such that emmc doesn't finish probing before dm-verity progresses to
trying to read off the device. In my opinion it's a bit unfair to trade
performance in the common case for not supporting all use cases. I'm
just trying to get my machines to boot without having to carry my own
patch here.

As a path forward I can add a command line flag as a bool to handle this
and that should hopefully take care of the regresion aspect to this.


> > Greg, is there another tree I can run this through?
> 
> Why would you want to route around a maintainer just to get a patch that
> would have to be reverted applied?  :)
> 

What's your advice here as I don't feel like I'm getting adequate
traction with Ulf. I think I've generally been quite patient here
waiting > 3 months for this patch to be reviewed.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-09  7:16               ` Dennis Zhou
@ 2023-06-09  8:53                 ` Greg KH
  2023-06-09  8:58                   ` Linus Walleij
  2023-06-12 14:55                 ` Ulf Hansson
  1 sibling, 1 reply; 24+ messages in thread
From: Greg KH @ 2023-06-09  8:53 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Ulf Hansson, linux-mmc, linux-kernel, Linus Walleij

On Fri, Jun 09, 2023 at 12:16:19AM -0700, Dennis Zhou wrote:
> Hi Greg,
> 
> On Fri, Jun 09, 2023 at 08:19:51AM +0200, Greg KH wrote:
> > On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> > > On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > > > + Linus,
> > > > 
> > > > Hi Dennis,
> > > > 
> > > > On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> > > > > >
> > > > > > Hi Ulf,
> > > > > >
> > > > > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > > > > > >
> > > > > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > > > > probing code sets up the host data structure then a work item is
> > > > > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > > > > subsequent detect calls be handled via the workqueue.
> > > > > > >
> > > > > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > > > > may have more than one mmc-slot being used, so this needs to be well
> > > > > > > tested.
> > > > > > >
> > > > > >
> > > > > > I unfortunately don't have a device with multiple mmcs available. Is
> > > > > > this something you could help me with?
> > > > >
> > > > > Yes, I can help to test. Allow me a few days to see what I can do.
> > > > >
> > > > > Note that, just having one eMMC and one SD card should work too. It
> > > > > doesn't have to be multiple eMMCs.
> > > > >
> > > > > >
> > > > > > > Although, more importantly, I fail to understand how this is going to
> > > > > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > > > > the mmc block device driver to be up and running too, which is getting
> > > > > > > probed from a separate module/driver that's not part of mmc_rescan().
> > > > > >
> > > > > > I believe the call chain is something like this:
> > > > > >
> > > > > > __mmc_rescan()
> > > > > >     mmc_rescan_try_freq()
> > > > > >         mmc_attach_mmc()
> > > > > >             mmc_add_card()
> > > > > >                 device_add()
> > > > > >                     bus_probe_device()
> > > > > >                         mmc_blk_probe()
> > > > > >
> > > > > > The initial calling of this is the host probe. So effectively if there
> > > > > > is a card attached, we're inlining the device_add() call for the card
> > > > > > attached rather than waiting for the workqueue item to kick off.
> > > > > >
> > > > > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > > > > in becoming a device_initcall(). So this solves a race via the initcall
> > > > > > chain. In the current state, device_initcall() finishes and we move onto
> > > > > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > > > > init the attached emmc device.
> > > > >
> > > > > You certainly have a point!
> > > > >
> > > > > This should work when the mmc blk module is built-in. Even if that
> > > > > doesn't solve the entire problem, it should be a step in the right
> > > > > direction.
> > > > >
> > > > > I will give it some more thinking and run some tests at my side, then
> > > > > I will get back to you again.
> > > > >
> > > > > Kind regards
> > > > > Uffe
> > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > > > > > > ---
> > > > > > > > Sigh.. fix missing static declaration.
> > > > > > > >
> > > > > > > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > > > > > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > > > > > index 368f10405e13..fda7ee57dee3 100644
> > > > > > > > --- a/drivers/mmc/core/core.c
> > > > > > > > +++ b/drivers/mmc/core/core.c
> > > > > > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > > > > > >
> > > > > > > > -void mmc_rescan(struct work_struct *work)
> > > > > > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > > > > >  {
> > > > > > > > -       struct mmc_host *host =
> > > > > > > > -               container_of(work, struct mmc_host, detect.work);
> > > > > > > >         int i;
> > > > > > > >
> > > > > > > >         if (host->rescan_disable)
> > > > > > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > > > > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +void mmc_rescan(struct work_struct *work)
> > > > > > > > +{
> > > > > > > > +       struct mmc_host *host =
> > > > > > > > +               container_of(work, struct mmc_host, detect.work);
> > > > > > > > +
> > > > > > > > +       __mmc_rescan(host);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  void mmc_start_host(struct mmc_host *host)
> > > > > > > >  {
> > > > > > > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > > > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > > > > >         }
> > > > > > > >
> > > > > > > >         mmc_gpiod_request_cd_irq(host);
> > > > > > > > -       _mmc_detect_change(host, 0, false);
> > > > > > > > +       host->detect_change = 1;
> > > > > > > > +       __mmc_rescan(host);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  void __mmc_stop_host(struct mmc_host *host)
> > > > > > > > --
> > > > > > > > 2.40.0
> > > > > > > >
> > > > 
> > > > My apologies for the long delay. I finally managed to test this.
> > > > 
> > > > I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
> > > > based upon the ux500 SoC. It's quite good to use for these types of
> > > > tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
> > > > SD-card slot. So in total there are 4 devices that get probed.
> > > > 
> > > > The SDIO card isn't detected properly, but always fails in the similar
> > > > way (thus I left it out from the below data). I tested both with and
> > > > without an SD card inserted during boot, to get some more data to
> > > > compare. These are the summary from my tests:
> > > > 
> > > > v6.4-rc1 without SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.33s - MMC3 (eMMC)
> > > > ~5.91s - kernel boot complete
> > > > 
> > > > v6.4-rc1 with an SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.45s - MMC3 (eMMC)
> > > > ~3.57s - MMC2 (SD)
> > > > ~5.76s - kernel boot complete
> > > > 
> > > > v6.4-rc1 + patch without SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.58s - MMC3 (eMMC)
> > > > ~5.96s - kernel boot complete
> > > > 
> > > > v6.4-rc1 + patch with an SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.73s - MMC2 (SD)
> > > > ~3.98s - MMC3 (eMMC)
> > > > ~6.73s - kernel boot complete
> > > > 
> > > > By looking at these results, I was kind of surprised. I was thinking
> > > > that the asynchronous probe should address the parallelism problem.
> > > > Then I discovered that it in fact, hasn't been enabled for the mmci
> > > > driver that is being used for this platform. Huh, I was under the
> > > > assumption that it has been enabled for all mmc hosts by now. :-)
> > > > 
> > > > Okay, so I am going to run another round of tests, with async probe
> > > > enabled for the mmci driver too. I will let you know the results as
> > > > soon as I can.
> > > > 
> > > > Kind regards
> > > > Uffe
> > > 
> > > Hi Uffe,
> > > 
> > > Kindly this has been way too long for review. It's been over 3 months.
> > > What's going on here?
> > > 
> > > I think there's a misunderstanding too. Without this fix, the machine
> > > doesn't even boot. I'm not sure why perf is the blocking question here.
> > 
> > Well you can not degrade performance of existing machines that work
> > today, right?  That would be a regression and it seems that you are
> > doing that if I read the numbers above correctly.
> > 
> 
> I agree that we shouldn't degrade performance of existing machines, but
> this is a timing bug on existing platforms that have a slow enough cpu
> such that emmc doesn't finish probing before dm-verity progresses to
> trying to read off the device. In my opinion it's a bit unfair to trade
> performance in the common case for not supporting all use cases. I'm
> just trying to get my machines to boot without having to carry my own
> patch here.

I think the users of the systems you are going to slow down will take
objection to you slowing them down.  What if you were them, what would
you want to see here?

> As a path forward I can add a command line flag as a bool to handle this
> and that should hopefully take care of the regresion aspect to this.

command line flags are horrible and should never be used.  Why can't you
dynamically detect this type of thing and handle it that way?

And yes, we do hold off in supporting new hardware (and configurations)
to prevent existing working ones from breaking or slowing down.

What is forcing you to use dm-verity on this odd hardware?  Can you not
use other configurations instead?

> > > Greg, is there another tree I can run this through?
> > 
> > Why would you want to route around a maintainer just to get a patch that
> > would have to be reverted applied?  :)
> > 
> 
> What's your advice here as I don't feel like I'm getting adequate
> traction with Ulf. I think I've generally been quite patient here
> waiting > 3 months for this patch to be reviewed.

Maintainers are overworked, that's normal.  I suggest helping out in
reviewing other patches in the subsystem to reduce that burden.  After
all, you are asking someone to do something for you without much in
return, is that fair?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-09  8:53                 ` Greg KH
@ 2023-06-09  8:58                   ` Linus Walleij
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Walleij @ 2023-06-09  8:58 UTC (permalink / raw)
  To: Greg KH; +Cc: Dennis Zhou, Ulf Hansson, linux-mmc, linux-kernel

On Fri, Jun 9, 2023 at 10:53 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Jun 09, 2023 at 12:16:19AM -0700, Dennis Zhou wrote:

> > As a path forward I can add a command line flag as a bool to handle this
> > and that should hopefully take care of the regresion aspect to this.
>
> command line flags are horrible and should never be used.  Why can't you
> dynamically detect this type of thing and handle it that way?

If nothing else works, if this is device tree, a machine-specific quirk can be
done:

if (of_machine_is_compatible("vendor,machine")) {...}

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-09  7:16               ` Dennis Zhou
  2023-06-09  8:53                 ` Greg KH
@ 2023-06-12 14:55                 ` Ulf Hansson
  1 sibling, 0 replies; 24+ messages in thread
From: Ulf Hansson @ 2023-06-12 14:55 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Greg KH, linux-mmc, linux-kernel, Linus Walleij

On Fri, 9 Jun 2023 at 09:16, Dennis Zhou <dennis@kernel.org> wrote:
>
> Hi Greg,
>
> On Fri, Jun 09, 2023 at 08:19:51AM +0200, Greg KH wrote:
> > On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> > > On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > > > + Linus,
> > > >
> > > > Hi Dennis,
> > > >
> > > > On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <dennis@kernel.org> wrote:
> > > > > >
> > > > > > Hi Ulf,
> > > > > >
> > > > > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <dennis@kernel.org> wrote:
> > > > > > > >
> > > > > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > > > > probing code sets up the host data structure then a work item is
> > > > > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > > > > subsequent detect calls be handled via the workqueue.
> > > > > > >
> > > > > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > > > > may have more than one mmc-slot being used, so this needs to be well
> > > > > > > tested.
> > > > > > >
> > > > > >
> > > > > > I unfortunately don't have a device with multiple mmcs available. Is
> > > > > > this something you could help me with?
> > > > >
> > > > > Yes, I can help to test. Allow me a few days to see what I can do.
> > > > >
> > > > > Note that, just having one eMMC and one SD card should work too. It
> > > > > doesn't have to be multiple eMMCs.
> > > > >
> > > > > >
> > > > > > > Although, more importantly, I fail to understand how this is going to
> > > > > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > > > > the mmc block device driver to be up and running too, which is getting
> > > > > > > probed from a separate module/driver that's not part of mmc_rescan().
> > > > > >
> > > > > > I believe the call chain is something like this:
> > > > > >
> > > > > > __mmc_rescan()
> > > > > >     mmc_rescan_try_freq()
> > > > > >         mmc_attach_mmc()
> > > > > >             mmc_add_card()
> > > > > >                 device_add()
> > > > > >                     bus_probe_device()
> > > > > >                         mmc_blk_probe()
> > > > > >
> > > > > > The initial calling of this is the host probe. So effectively if there
> > > > > > is a card attached, we're inlining the device_add() call for the card
> > > > > > attached rather than waiting for the workqueue item to kick off.
> > > > > >
> > > > > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > > > > in becoming a device_initcall(). So this solves a race via the initcall
> > > > > > chain. In the current state, device_initcall() finishes and we move onto
> > > > > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > > > > init the attached emmc device.
> > > > >
> > > > > You certainly have a point!
> > > > >
> > > > > This should work when the mmc blk module is built-in. Even if that
> > > > > doesn't solve the entire problem, it should be a step in the right
> > > > > direction.
> > > > >
> > > > > I will give it some more thinking and run some tests at my side, then
> > > > > I will get back to you again.
> > > > >
> > > > > Kind regards
> > > > > Uffe
> > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > > > > > > > ---
> > > > > > > > Sigh.. fix missing static declaration.
> > > > > > > >
> > > > > > > >  drivers/mmc/core/core.c | 15 +++++++++++----
> > > > > > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > > > > > index 368f10405e13..fda7ee57dee3 100644
> > > > > > > > --- a/drivers/mmc/core/core.c
> > > > > > > > +++ b/drivers/mmc/core/core.c
> > > > > > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > > > > > >
> > > > > > > > -void mmc_rescan(struct work_struct *work)
> > > > > > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > > > > >  {
> > > > > > > > -       struct mmc_host *host =
> > > > > > > > -               container_of(work, struct mmc_host, detect.work);
> > > > > > > >         int i;
> > > > > > > >
> > > > > > > >         if (host->rescan_disable)
> > > > > > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > > > > >                 mmc_schedule_delayed_work(&host->detect, HZ);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +void mmc_rescan(struct work_struct *work)
> > > > > > > > +{
> > > > > > > > +       struct mmc_host *host =
> > > > > > > > +               container_of(work, struct mmc_host, detect.work);
> > > > > > > > +
> > > > > > > > +       __mmc_rescan(host);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  void mmc_start_host(struct mmc_host *host)
> > > > > > > >  {
> > > > > > > >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > > > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > > > > >         }
> > > > > > > >
> > > > > > > >         mmc_gpiod_request_cd_irq(host);
> > > > > > > > -       _mmc_detect_change(host, 0, false);
> > > > > > > > +       host->detect_change = 1;
> > > > > > > > +       __mmc_rescan(host);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  void __mmc_stop_host(struct mmc_host *host)
> > > > > > > > --
> > > > > > > > 2.40.0
> > > > > > > >
> > > >
> > > > My apologies for the long delay. I finally managed to test this.
> > > >
> > > > I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
> > > > based upon the ux500 SoC. It's quite good to use for these types of
> > > > tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
> > > > SD-card slot. So in total there are 4 devices that get probed.
> > > >
> > > > The SDIO card isn't detected properly, but always fails in the similar
> > > > way (thus I left it out from the below data). I tested both with and
> > > > without an SD card inserted during boot, to get some more data to
> > > > compare. These are the summary from my tests:
> > > >
> > > > v6.4-rc1 without SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.33s - MMC3 (eMMC)
> > > > ~5.91s - kernel boot complete
> > > >
> > > > v6.4-rc1 with an SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.45s - MMC3 (eMMC)
> > > > ~3.57s - MMC2 (SD)
> > > > ~5.76s - kernel boot complete
> > > >
> > > > v6.4-rc1 + patch without SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.58s - MMC3 (eMMC)
> > > > ~5.96s - kernel boot complete
> > > >
> > > > v6.4-rc1 + patch with an SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.73s - MMC2 (SD)
> > > > ~3.98s - MMC3 (eMMC)
> > > > ~6.73s - kernel boot complete
> > > >
> > > > By looking at these results, I was kind of surprised. I was thinking
> > > > that the asynchronous probe should address the parallelism problem.
> > > > Then I discovered that it in fact, hasn't been enabled for the mmci
> > > > driver that is being used for this platform. Huh, I was under the
> > > > assumption that it has been enabled for all mmc hosts by now. :-)
> > > >
> > > > Okay, so I am going to run another round of tests, with async probe
> > > > enabled for the mmci driver too. I will let you know the results as
> > > > soon as I can.
> > > >
> > > > Kind regards
> > > > Uffe
> > >
> > > Hi Uffe,
> > >
> > > Kindly this has been way too long for review. It's been over 3 months.
> > > What's going on here?

I have been busier than usual. My apologies.

The main problem was also that I found a problem with the patch, as
explained with the numbers above.

> > >
> > > I think there's a misunderstanding too. Without this fix, the machine
> > > doesn't even boot. I'm not sure why perf is the blocking question here.
> >
> > Well you can not degrade performance of existing machines that work
> > today, right?  That would be a regression and it seems that you are
> > doing that if I read the numbers above correctly.
> >
>
> I agree that we shouldn't degrade performance of existing machines, but
> this is a timing bug on existing platforms that have a slow enough cpu
> such that emmc doesn't finish probing before dm-verity progresses to
> trying to read off the device. In my opinion it's a bit unfair to trade
> performance in the common case for not supporting all use cases. I'm
> just trying to get my machines to boot without having to carry my own
> patch here.
>
> As a path forward I can add a command line flag as a bool to handle this
> and that should hopefully take care of the regresion aspect to this.

Let's not go there. To be able to move forward with your suggested
approach in $subject patch *and* without causing performance
degradations, we need to enable async probe for the mmci host driver
too.

As I said, I was under the impression that we have already done so for
all mmc host drivers by now, but apparently not. So, I have just sent
a patch for that [1].

>
>
> > > Greg, is there another tree I can run this through?
> >
> > Why would you want to route around a maintainer just to get a patch that
> > would have to be reverted applied?  :)
> >
>
> What's your advice here as I don't feel like I'm getting adequate
> traction with Ulf. I think I've generally been quite patient here
> waiting > 3 months for this patch to be reviewed.

Sure, you have been patient, but in this case you needed my hands-on
help too, to run specific tests. That takes time too.

Anyway, unless something unexpected happens, I plan to apply the
$subject patch tomorrow after running another round of tests.

Kind regards
Uffe

[1]
https://lore.kernel.org/linux-mmc/20230612143730.210390-1-ulf.hansson@linaro.org/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-29 20:21 [PATCH] mmc: inline the first mmc_scan() on mmc_start_host() Dennis Zhou
                   ` (2 preceding siblings ...)
  2023-03-29 23:48 ` [PATCH v2] " Dennis Zhou
@ 2023-06-13 14:25 ` Ulf Hansson
  2023-06-15 16:06   ` Dennis Zhou
  3 siblings, 1 reply; 24+ messages in thread
From: Ulf Hansson @ 2023-06-13 14:25 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: linux-mmc, linux-kernel

On Wed, 29 Mar 2023 at 22:21, Dennis Zhou <dennis@kernel.org> wrote:
>
> When using dm-verity with a data partition on an emmc device, dm-verity
> races with the discovery of attached emmc devices. This is because mmc's
> probing code sets up the host data structure then a work item is
> scheduled to do discovery afterwards. To prevent this race on init,
> let's inline the first call to detection, __mm_scan(), and let
> subsequent detect calls be handled via the workqueue.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

Along with the patch for the mmci driver, this one applied too, for
next, thanks!

Note also that I took the liberty to clarify the commit message a bit.

Moreover, if we want this to be applied for stable kernels, we need to
manage that separately, as then the mmci patch is needed too. Please
ping if you need some pointers in regards to this.

Kind regards
Uffe

> ---
>  drivers/mmc/core/core.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 368f10405e13..c0fdc438c882 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
>  }
>  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
>
> -void mmc_rescan(struct work_struct *work)
> +void __mmc_rescan(struct mmc_host *host)
>  {
> -       struct mmc_host *host =
> -               container_of(work, struct mmc_host, detect.work);
>         int i;
>
>         if (host->rescan_disable)
> @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
>                 mmc_schedule_delayed_work(&host->detect, HZ);
>  }
>
> +void mmc_rescan(struct work_struct *work)
> +{
> +       struct mmc_host *host =
> +               container_of(work, struct mmc_host, detect.work);
> +
> +       __mmc_rescan(host);
> +}
> +
>  void mmc_start_host(struct mmc_host *host)
>  {
>         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
>         }
>
>         mmc_gpiod_request_cd_irq(host);
> -       _mmc_detect_change(host, 0, false);
> +       host->detect_change = 1;
> +       __mmc_rescan(host);
>  }
>
>  void __mmc_stop_host(struct mmc_host *host)
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-13 14:25 ` [PATCH] " Ulf Hansson
@ 2023-06-15 16:06   ` Dennis Zhou
  0 siblings, 0 replies; 24+ messages in thread
From: Dennis Zhou @ 2023-06-15 16:06 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: linux-mmc, linux-kernel

On Tue, Jun 13, 2023 at 04:25:11PM +0200, Ulf Hansson wrote:
> On Wed, 29 Mar 2023 at 22:21, Dennis Zhou <dennis@kernel.org> wrote:
> >
> > When using dm-verity with a data partition on an emmc device, dm-verity
> > races with the discovery of attached emmc devices. This is because mmc's
> > probing code sets up the host data structure then a work item is
> > scheduled to do discovery afterwards. To prevent this race on init,
> > let's inline the first call to detection, __mm_scan(), and let
> > subsequent detect calls be handled via the workqueue.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> 
> Along with the patch for the mmci driver, this one applied too, for
> next, thanks!
> 

Thank you Ulf! I'm good with this just being applied to for-next.

Thanks,
Dennis

> Note also that I took the liberty to clarify the commit message a bit.
> 
> Moreover, if we want this to be applied for stable kernels, we need to
> manage that separately, as then the mmci patch is needed too. Please
> ping if you need some pointers in regards to this.
> 
> Kind regards
> Uffe
> 
> > ---
> >  drivers/mmc/core/core.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > index 368f10405e13..c0fdc438c882 100644
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> >  }
> >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> >
> > -void mmc_rescan(struct work_struct *work)
> > +void __mmc_rescan(struct mmc_host *host)
> >  {
> > -       struct mmc_host *host =
> > -               container_of(work, struct mmc_host, detect.work);
> >         int i;
> >
> >         if (host->rescan_disable)
> > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> >                 mmc_schedule_delayed_work(&host->detect, HZ);
> >  }
> >
> > +void mmc_rescan(struct work_struct *work)
> > +{
> > +       struct mmc_host *host =
> > +               container_of(work, struct mmc_host, detect.work);
> > +
> > +       __mmc_rescan(host);
> > +}
> > +
> >  void mmc_start_host(struct mmc_host *host)
> >  {
> >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> >         }
> >
> >         mmc_gpiod_request_cd_irq(host);
> > -       _mmc_detect_change(host, 0, false);
> > +       host->detect_change = 1;
> > +       __mmc_rescan(host);
> >  }
> >
> >  void __mmc_stop_host(struct mmc_host *host)
> > --
> > 2.40.0
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-03-29 23:48 ` [PATCH v2] " Dennis Zhou
  2023-03-31 12:43   ` Ulf Hansson
@ 2023-06-27 17:20   ` Geert Uytterhoeven
  2023-06-27 18:09     ` Biju Das
  2023-06-30 11:26     ` Ulf Hansson
  1 sibling, 2 replies; 24+ messages in thread
From: Geert Uytterhoeven @ 2023-06-27 17:20 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Ulf Hansson, Linux MMC List, Linux Kernel Mailing List, Biju Das,
	Wolfram Sang, Linux-Renesas

Hi Dennis,

On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <dennis@kernel.org> wrote:
> When using dm-verity with a data partition on an emmc device, dm-verity
> races with the discovery of attached emmc devices. This is because mmc's
> probing code sets up the host data structure then a work item is
> scheduled to do discovery afterwards. To prevent this race on init,
> let's inline the first call to detection, __mm_scan(), and let
> subsequent detect calls be handled via the workqueue.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
core: Allow mmc_start_host() synchronously detect a card") in
linux-next/master mmc/next next-20230614 next-20230615 next-20230616

I have bisected the following failure on Renesas Salvator-XS with R-Car H3
ES2.0 to the above commit:

    renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
hardware interrupt (CMD0)
    renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
hardware interrupt (CMD1)
    renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
hardware interrupt (CMD0)
    renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
hardware interrupt (CMD1)
    mmc0: Failed to initialize a non-removable card

Reverting the commit fixes the issue for me.

> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
>  }
>  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
>
> -void mmc_rescan(struct work_struct *work)
> +static void __mmc_rescan(struct mmc_host *host)
>  {
> -       struct mmc_host *host =
> -               container_of(work, struct mmc_host, detect.work);
>         int i;
>
>         if (host->rescan_disable)
> @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
>                 mmc_schedule_delayed_work(&host->detect, HZ);
>  }
>
> +void mmc_rescan(struct work_struct *work)
> +{
> +       struct mmc_host *host =
> +               container_of(work, struct mmc_host, detect.work);
> +
> +       __mmc_rescan(host);
> +}
> +
>  void mmc_start_host(struct mmc_host *host)
>  {
>         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
>         }
>
>         mmc_gpiod_request_cd_irq(host);
> -       _mmc_detect_change(host, 0, false);
> +       host->detect_change = 1;
> +       __mmc_rescan(host);
>  }
>
>  void __mmc_stop_host(struct mmc_host *host)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-27 17:20   ` Geert Uytterhoeven
@ 2023-06-27 18:09     ` Biju Das
  2023-06-30 11:26     ` Ulf Hansson
  1 sibling, 0 replies; 24+ messages in thread
From: Biju Das @ 2023-06-27 18:09 UTC (permalink / raw)
  To: Geert Uytterhoeven, Dennis Zhou
  Cc: Ulf Hansson, Linux MMC List, Linux Kernel Mailing List,
	Wolfram Sang, Linux-Renesas

Hi Dennis,

Reverting commit 2cc83bf7d41113d9 ("mmc:core: Allow mmc_start_host() synchronously detect a card") fixes below issue[1] on RZ/G2L SMARC EVK too.

[1] Before reverting:
------------------

[    2.469266] renesas_sdhi_internal_dmac 11c10000.mmc: mmc1 base at 0x0000000011c10000, max clock rate 133 MHz
[    2.993112] mmc1: new ultra high speed SDR104 SDHC card at address aaaa
[    3.014052] mmcblk1: mmc1:aaaa SC32G 29.7 GiB
[    3.033826]  mmcblk1: p1
[    7.651447] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD52)
[   12.768118] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD52)
[   17.888240] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD0)
[   23.008122] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD8)
[   28.128098] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[   33.248115] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[   38.368119] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[   43.488112] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[   48.608110] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)
[   53.728121] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)
[   58.848115] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)
[   63.968118] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)

After reverting:
--------------
[    2.463112] renesas_sdhi_internal_dmac 11c10000.mmc: mmc1 base at 0x0000000011c10000, max clock rate 133 MHz
[    2.472569] rz-ssi-pcm-audio 10049c00.ssi: DMA enabled
[    2.474124] renesas_sdhi_internal_dmac 11c00000.mmc: mmc0 base at 0x0000000011c00000, max clock rate 133 MHz
[    2.600050] mmc0: new HS200 MMC card at address 0001
[    2.608931] mmcblk0: mmc0:0001 G1M15M 59.3 GiB
[    2.623432]  mmcblk0: p1
[    2.629863] mmcblk0boot0: mmc0:0001 G1M15M 31.5 MiB
[    2.640008] mmcblk0boot1: mmc0:0001 G1M15M 31.5 MiB
[    2.650212] mmcblk0rpmb: mmc0:0001 G1M15M 4.00 MiB, chardev (242:0)
[    2.681609] Microchip KSZ9131 Gigabit PHY 11c20000.ethernet-ffffffff:07: attached PHY driver (mii_bus:phy_addr=11c20000.ethernet-ffffffff:07, irq=54)
[    3.020085] mmc1: new ultra high speed SDR104 SDHC card at address aaaa
[    3.033079] mmcblk1: mmc1:aaaa SC32G 29.7 GiB
[    3.048402]  mmcblk1: p1

Cheers,
Biju

> -----Original Message-----
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> Sent: Tuesday, June 27, 2023 6:20 PM
> To: Dennis Zhou <dennis@kernel.org>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>; Linux MMC List <linux-
> mmc@vger.kernel.org>; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>; Biju Das <biju.das.jz@bp.renesas.com>; Wolfram
> Sang <wsa+renesas@sang-engineering.com>; Linux-Renesas <linux-renesas-
> soc@vger.kernel.org>
> Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on
> mmc_start_host()
> 
> Hi Dennis,
> 
> On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <dennis@kernel.org> wrote:
> > When using dm-verity with a data partition on an emmc device,
> > dm-verity races with the discovery of attached emmc devices. This is
> > because mmc's probing code sets up the host data structure then a work
> > item is scheduled to do discovery afterwards. To prevent this race on
> > init, let's inline the first call to detection, __mm_scan(), and let
> > subsequent detect calls be handled via the workqueue.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> 
> Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
> core: Allow mmc_start_host() synchronously detect a card") in linux-
> next/master mmc/next next-20230614 next-20230615 next-20230616
> 
> I have bisected the following failure on Renesas Salvator-XS with R-Car
> H3
> ES2.0 to the above commit:
> 
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
>     mmc0: Failed to initialize a non-removable card
> 
> Reverting the commit fixes the issue for me.
> 
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct
> > mmc_card *card, sector_t *gpt_sector)  }
> > EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> >
> > -void mmc_rescan(struct work_struct *work)
> > +static void __mmc_rescan(struct mmc_host *host)
> >  {
> > -       struct mmc_host *host =
> > -               container_of(work, struct mmc_host, detect.work);
> >         int i;
> >
> >         if (host->rescan_disable)
> > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> >                 mmc_schedule_delayed_work(&host->detect, HZ);  }
> >
> > +void mmc_rescan(struct work_struct *work) {
> > +       struct mmc_host *host =
> > +               container_of(work, struct mmc_host, detect.work);
> > +
> > +       __mmc_rescan(host);
> > +}
> > +
> >  void mmc_start_host(struct mmc_host *host)  {
> >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> >         }
> >
> >         mmc_gpiod_request_cd_irq(host);
> > -       _mmc_detect_change(host, 0, false);
> > +       host->detect_change = 1;
> > +       __mmc_rescan(host);
> >  }
> >
> >  void __mmc_stop_host(struct mmc_host *host)
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-
> m68k.org
> 
> In personal conversations with technical people, I call myself a hacker.
> But when I'm talking to journalists I just say "programmer" or something
> like that.
>                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-27 17:20   ` Geert Uytterhoeven
  2023-06-27 18:09     ` Biju Das
@ 2023-06-30 11:26     ` Ulf Hansson
  2023-06-30 13:34       ` Wolfram Sang
  2023-06-30 22:09       ` Dennis Zhou
  1 sibling, 2 replies; 24+ messages in thread
From: Ulf Hansson @ 2023-06-30 11:26 UTC (permalink / raw)
  To: Dennis Zhou, Wolfram Sang, Geert Uytterhoeven
  Cc: Linux MMC List, Linux Kernel Mailing List, Biju Das, Linux-Renesas

On Tue, 27 Jun 2023 at 19:20, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Dennis,
>
> On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <dennis@kernel.org> wrote:
> > When using dm-verity with a data partition on an emmc device, dm-verity
> > races with the discovery of attached emmc devices. This is because mmc's
> > probing code sets up the host data structure then a work item is
> > scheduled to do discovery afterwards. To prevent this race on init,
> > let's inline the first call to detection, __mm_scan(), and let
> > subsequent detect calls be handled via the workqueue.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
>
> Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
> core: Allow mmc_start_host() synchronously detect a card") in
> linux-next/master mmc/next next-20230614 next-20230615 next-20230616
>
> I have bisected the following failure on Renesas Salvator-XS with R-Car H3
> ES2.0 to the above commit:
>
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
>     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
>     mmc0: Failed to initialize a non-removable card

Thanks for reporting!

After I had a closer look, I realize that all the renesas/tmio drivers
are suffering from the similar problem. A host driver must not call
mmc_add_host() before it's ready to serve requests.

Things like initializing an irq-handler must be done before
mmc_add_host() is called, which is not the case for renesas/tmio. In
fact, there seems to be a few other host drivers that have the similar
pattern in their probe routines.

Note that, even if the offending commit below triggers this problem
100% of the cases (as the probe path has now becomes synchronous),
there was a potential risk even before. Previously, mmc_add_host()
ended up punting a work - and if that work ended up sending a request
to the host driver, *before* the irq-handler would be ready, we would
hit the similar problem. I bet adding an msleep(1000) immediately
after mmc_add_host() in tmio_mmc_host_probe(), would then trigger this
problem too. :-)

That said, I am going to revert the offending commit to fix these
problems, for now. Then I will try to help out and fixup the relevant
host drivers  - and when that is done, we can give this whole thing a
new try.

Any objections or other suggestions to this?

Kind regards
Uffe

>
> Reverting the commit fixes the issue for me.
>
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> >  }
> >  EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> >
> > -void mmc_rescan(struct work_struct *work)
> > +static void __mmc_rescan(struct mmc_host *host)
> >  {
> > -       struct mmc_host *host =
> > -               container_of(work, struct mmc_host, detect.work);
> >         int i;
> >
> >         if (host->rescan_disable)
> > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> >                 mmc_schedule_delayed_work(&host->detect, HZ);
> >  }
> >
> > +void mmc_rescan(struct work_struct *work)
> > +{
> > +       struct mmc_host *host =
> > +               container_of(work, struct mmc_host, detect.work);
> > +
> > +       __mmc_rescan(host);
> > +}
> > +
> >  void mmc_start_host(struct mmc_host *host)
> >  {
> >         host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> >         }
> >
> >         mmc_gpiod_request_cd_irq(host);
> > -       _mmc_detect_change(host, 0, false);
> > +       host->detect_change = 1;
> > +       __mmc_rescan(host);
> >  }
> >
> >  void __mmc_stop_host(struct mmc_host *host)
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-30 11:26     ` Ulf Hansson
@ 2023-06-30 13:34       ` Wolfram Sang
  2023-06-30 22:09       ` Dennis Zhou
  1 sibling, 0 replies; 24+ messages in thread
From: Wolfram Sang @ 2023-06-30 13:34 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Dennis Zhou, Geert Uytterhoeven, Linux MMC List,
	Linux Kernel Mailing List, Biju Das, Linux-Renesas

[-- Attachment #1: Type: text/plain, Size: 825 bytes --]


> Note that, even if the offending commit below triggers this problem
> 100% of the cases (as the probe path has now becomes synchronous),
> there was a potential risk even before. Previously, mmc_add_host()
> ended up punting a work - and if that work ended up sending a request
> to the host driver, *before* the irq-handler would be ready, we would
> hit the similar problem. I bet adding an msleep(1000) immediately
> after mmc_add_host() in tmio_mmc_host_probe(), would then trigger this
> problem too. :-)
> 
> That said, I am going to revert the offending commit to fix these
> problems, for now. Then I will try to help out and fixup the relevant
> host drivers  - and when that is done, we can give this whole thing a
> new try.

I'll work on the TMIO/SDHI driver next week. Thanks for the input!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()
  2023-06-30 11:26     ` Ulf Hansson
  2023-06-30 13:34       ` Wolfram Sang
@ 2023-06-30 22:09       ` Dennis Zhou
  1 sibling, 0 replies; 24+ messages in thread
From: Dennis Zhou @ 2023-06-30 22:09 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Wolfram Sang, Geert Uytterhoeven, Linux MMC List,
	Linux Kernel Mailing List, Biju Das, Linux-Renesas

Hi Ulf,

On Fri, Jun 30, 2023 at 01:26:14PM +0200, Ulf Hansson wrote:
> On Tue, 27 Jun 2023 at 19:20, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Dennis,
> >
> > On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <dennis@kernel.org> wrote:
> > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > races with the discovery of attached emmc devices. This is because mmc's
> > > probing code sets up the host data structure then a work item is
> > > scheduled to do discovery afterwards. To prevent this race on init,
> > > let's inline the first call to detection, __mm_scan(), and let
> > > subsequent detect calls be handled via the workqueue.
> > >
> > > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> >
> > Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
> > core: Allow mmc_start_host() synchronously detect a card") in
> > linux-next/master mmc/next next-20230614 next-20230615 next-20230616
> >
> > I have bisected the following failure on Renesas Salvator-XS with R-Car H3
> > ES2.0 to the above commit:
> >
> >     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD0)
> >     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD1)
> >     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD0)
> >     renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD1)
> >     mmc0: Failed to initialize a non-removable card
> 
> Thanks for reporting!
> 
> After I had a closer look, I realize that all the renesas/tmio drivers
> are suffering from the similar problem. A host driver must not call
> mmc_add_host() before it's ready to serve requests.
> 
> Things like initializing an irq-handler must be done before
> mmc_add_host() is called, which is not the case for renesas/tmio. In
> fact, there seems to be a few other host drivers that have the similar
> pattern in their probe routines.
> 
> Note that, even if the offending commit below triggers this problem
> 100% of the cases (as the probe path has now becomes synchronous),
> there was a potential risk even before. Previously, mmc_add_host()
> ended up punting a work - and if that work ended up sending a request
> to the host driver, *before* the irq-handler would be ready, we would
> hit the similar problem. I bet adding an msleep(1000) immediately
> after mmc_add_host() in tmio_mmc_host_probe(), would then trigger this
> problem too. :-)
> 

I'm deeply appreciative that you're willing to get to the bottom of the
issue.

> That said, I am going to revert the offending commit to fix these
> problems, for now. Then I will try to help out and fixup the relevant
> host drivers  - and when that is done, we can give this whole thing a
> new try.
> 
> Any objections or other suggestions to this?
> 

Acked-by: Dennis Zhou <dennis@kernel.org>

Thanks,
Dennis

> Kind regards
> Uffe
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-06-30 22:09 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-29 20:21 [PATCH] mmc: inline the first mmc_scan() on mmc_start_host() Dennis Zhou
2023-03-29 22:56 ` kernel test robot
2023-03-29 23:36 ` kernel test robot
2023-03-29 23:48 ` [PATCH v2] " Dennis Zhou
2023-03-31 12:43   ` Ulf Hansson
2023-03-31 18:23     ` Dennis Zhou
2023-04-03  9:50       ` Ulf Hansson
2023-04-07  8:24         ` Dennis Zhou
2023-04-11 20:29           ` Dennis Zhou
2023-04-12 11:05           ` Ulf Hansson
2023-05-12 11:42         ` Ulf Hansson
2023-06-08 20:49           ` Dennis Zhou
2023-06-09  6:19             ` Greg KH
2023-06-09  7:16               ` Dennis Zhou
2023-06-09  8:53                 ` Greg KH
2023-06-09  8:58                   ` Linus Walleij
2023-06-12 14:55                 ` Ulf Hansson
2023-06-27 17:20   ` Geert Uytterhoeven
2023-06-27 18:09     ` Biju Das
2023-06-30 11:26     ` Ulf Hansson
2023-06-30 13:34       ` Wolfram Sang
2023-06-30 22:09       ` Dennis Zhou
2023-06-13 14:25 ` [PATCH] " Ulf Hansson
2023-06-15 16:06   ` Dennis Zhou

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.