All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart
@ 2019-07-11  7:10 Weijie Gao
  2019-07-19  0:00 ` Tom Rini
  2019-08-22 13:58 ` Felix Brack
  0 siblings, 2 replies; 7+ messages in thread
From: Weijie Gao @ 2019-07-11  7:10 UTC (permalink / raw)
  To: u-boot

Some storage devices have multiple hw partitions and both address from
zero, for example eMMC.
However currently block cache invalidation only applies to block
write/erase.
This can cause a problem that data of current hw partition is cached
before switching to another hw partition. And the following read
operation of the latter hw partition will get wrong data when reading
from the addresses that have been cached previously.

To solve this problem, invalidate block cache after a successful
select_hwpart operation.

Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>
---
 drivers/block/blk-uclass.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c
index baaf431e5e0..c23b6682a6c 100644
--- a/drivers/block/blk-uclass.c
+++ b/drivers/block/blk-uclass.c
@@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart)
 	if (ret)
 		return ret;
 
-	return blk_select_hwpart(dev, hwpart);
+	ret = blk_select_hwpart(dev, hwpart);
+	if (!ret)
+		blkcache_invalidate(if_type, devnum);
+
+	return ret;
 }
 
 int blk_list_part(enum if_type if_type)
@@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
 
 int blk_dselect_hwpart(struct blk_desc *desc, int hwpart)
 {
-	return blk_select_hwpart(desc->bdev, hwpart);
+	int ret;
+
+	ret = blk_select_hwpart(desc->bdev, hwpart);
+	if (!ret)
+		blkcache_invalidate(desc->if_type, desc->devnum);
+
+	return ret;
 }
 
 int blk_first_device(int if_type, struct udevice **devp)
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart
  2019-07-11  7:10 [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart Weijie Gao
@ 2019-07-19  0:00 ` Tom Rini
  2019-08-22 13:58 ` Felix Brack
  1 sibling, 0 replies; 7+ messages in thread
From: Tom Rini @ 2019-07-19  0:00 UTC (permalink / raw)
  To: u-boot

On Thu, Jul 11, 2019 at 03:10:23PM +0800, Weijie Gao wrote:

> Some storage devices have multiple hw partitions and both address from
> zero, for example eMMC.
> However currently block cache invalidation only applies to block
> write/erase.
> This can cause a problem that data of current hw partition is cached
> before switching to another hw partition. And the following read
> operation of the latter hw partition will get wrong data when reading
> from the addresses that have been cached previously.
> 
> To solve this problem, invalidate block cache after a successful
> select_hwpart operation.
> 
> Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>

Applied to u-boot/master, thanks!

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20190718/f2d83dd7/attachment.sig>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart
  2019-07-11  7:10 [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart Weijie Gao
  2019-07-19  0:00 ` Tom Rini
@ 2019-08-22 13:58 ` Felix Brack
  2019-08-26  8:19   ` Weijie Gao
  1 sibling, 1 reply; 7+ messages in thread
From: Felix Brack @ 2019-08-22 13:58 UTC (permalink / raw)
  To: u-boot

On 11.07.19 09:10, Weijie Gao wrote:

> Some storage devices have multiple hw partitions and both address from
> zero, for example eMMC.
> However currently block cache invalidation only applies to block
> write/erase.
> This can cause a problem that data of current hw partition is cached
> before switching to another hw partition. And the following read
> operation of the latter hw partition will get wrong data when reading
> from the addresses that have been cached previously.
> 
> To solve this problem, invalidate block cache after a successful
> select_hwpart operation.
> 
> Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>
> ---
This patch breaks correct operation of PDU001 board.

>  drivers/block/blk-uclass.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c
> index baaf431e5e0..c23b6682a6c 100644
> --- a/drivers/block/blk-uclass.c
> +++ b/drivers/block/blk-uclass.c
> @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart)
>  	if (ret)
>  		return ret;
>  
> -	return blk_select_hwpart(dev, hwpart);
> +	ret = blk_select_hwpart(dev, hwpart);
> +	if (!ret)
> +		blkcache_invalidate(if_type, devnum);
> +
> +	return ret;
>  }
>  
>  int blk_list_part(enum if_type if_type)
> @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
>  
>  int blk_dselect_hwpart(struct blk_desc *desc, int hwpart)
>  {
> -	return blk_select_hwpart(desc->bdev, hwpart);
> +	int ret;
> +
> +	ret = blk_select_hwpart(desc->bdev, hwpart);
> +	if (!ret)
> +		blkcache_invalidate(desc->if_type, desc->devnum);
> +
Commenting the invalidation of the block cache on this line results in
proper working of the board.

> +	return ret;
>  }
>  
>  int blk_first_device(int if_type, struct udevice **devp)
> 
With the patch active, files from the boot FAT partition of the SD-card
(mmc device 1) do not load anymore, hence booting fails.
Using the eMMC (mmc device 0) instead works fine, files can bee loaded
and the board boots.

To isolate the problem I have modified the configuration to only load a
10k test file (test.bin) instead of loading the DTB and zImage files
followed by booting LINUX. Furthermore I have enabled debugging for some
parts of the code.

When I boot from the SD-card (which fails) I get the following logged:
===
CPU  : AM335X-GP rev 1.0
Model: EETS,PDU001
DRAM:  256 MiB
mmc_bind: alias ret=-2, devnum=-1
mmc_bind: alias ret=-2, devnum=-1
MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
In:    serial at 481a6000
Out:   serial at 481a6000
Err:   serial at 481a6000
Press SPACE to abort autoboot in 1 seconds
blk_get_devnum_by_typename: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0
blk_get_devnum_by_typename: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1
clock is disabled (0Hz)
clock is enabled (400000Hz)
clock is enabled (50000000Hz)
miss: start 0, count 1
blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0
blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1
fill: start 0, count 1
part_init: try 'EFI': ret=-1
hit: start 0, count 1
part_init: try 'DOS': ret=0
blk_get_devnum_by_typename: Device desc 8dfb3a98
miss: start 0, count 1
blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0
blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1
fill: start 0, count 1
part_init: try 'EFI': ret=-1
hit: start 0, count 1
part_init: try 'DOS': ret=0
hit: start 0, count 1
miss: start 800, count 1
blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0
blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1
fill: start 800, count 1
hit: start 800, count 1
FAT16, fat_sect: 1, fatlength: 64
Rootdir begins at cluster: 0, sector: 129, offset: 10200
Data begins at: 153
Sector size: 512, cluster size: 4
FAT read(sect=129), clust_size=4, read_size=4, DIRENTSPERBLOCK=16
miss: start 881, count 4
blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0
blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1
fill: start 881, count 4
miss: start 800, count 1
blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0
blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1
fill: start 800, count 1
FAT16, fat_sect: 1, fatlength: 64
Rootdir begins at cluster: 0, sector: 129, offset: 10200
Data begins at: 153
Sector size: 512, cluster size: 4
FAT read(sect=129), clust_size=4, read_size=4, DIRENTSPERBLOCK=16
miss: start 881, count 4
blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0
blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1
fill: start 881, count 4
reading test.bin at pos 0
Filesize: 0 bytes
Read position past EOF: 0
0 bytes read in 94 ms (0 Bytes/s)
===

Obviously the the file test.bin does not get loaded.

Booting from the eMMC (which works) logs the following:
===
CPU  : AM335X-GP rev 1.0
Model: EETS,PDU001
DRAM:  256 MiB
mmc_bind: alias ret=-2, devnum=-1
mmc_bind: alias ret=-2, devnum=-1
MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
In:    serial at 481a6000
Out:   serial at 481a6000
Err:   serial at 481a6000
Press SPACE to abort autoboot in 1 seconds
blk_get_devnum_by_typename: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
clock is disabled (0Hz)
clock is enabled (400000Hz)
clock is enabled (25000000Hz)
clock is enabled (52000000Hz)
miss: start 0, count 1
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
fill: start 0, count 1
part_init: try 'EFI': ret=-1
hit: start 0, count 1
part_init: try 'DOS': ret=0
blk_get_devnum_by_typename: Device desc 8dfb3800
miss: start 0, count 1
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
fill: start 0, count 1
part_init: try 'EFI': ret=-1
hit: start 0, count 1
part_init: try 'DOS': ret=0
hit: start 0, count 1
miss: start 1, count 1
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
fill: start 1, count 1
hit: start 1, count 1
FAT16, fat_sect: 1, fatlength: 20
Rootdir begins at cluster: 0, sector: 41, offset: 5200
Data begins at: 65
Sector size: 512, cluster size: 4
FAT read(sect=41), clust_size=4, read_size=4, DIRENTSPERBLOCK=16
miss: start 2a, count 4
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
fill: start 2a, count 4
miss: start 1, count 1
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
fill: start 1, count 1
FAT16, fat_sect: 1, fatlength: 20
Rootdir begins at cluster: 0, sector: 41, offset: 5200
Data begins at: 65
Sector size: 512, cluster size: 4
FAT read(sect=41), clust_size=4, read_size=4, DIRENTSPERBLOCK=16
miss: start 2a, count 4
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
fill: start 2a, count 4
reading test.bin at pos 0
Filesize: 10240 bytes
10240 bytes
FAT16: entry: 0x000000a3 = 163, offset: 0x00a3 = 163
miss: start 2, count 6
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
fill: start 2, count 6
FAT16: ret: 0x000000a4, entry: 0x000000a3, offset: 0x00a3
FAT16: entry: 0x000000a4 = 164, offset: 0x00a4 = 164
FAT16: ret: 0x000000a5, entry: 0x000000a4, offset: 0x00a4
FAT16: entry: 0x000000a5 = 165, offset: 0x00a5 = 165
FAT16: ret: 0x000000a6, entry: 0x000000a5, offset: 0x00a5
FAT16: entry: 0x000000a6 = 166, offset: 0x00a6 = 166
FAT16: ret: 0x000000a7, entry: 0x000000a6, offset: 0x00a6
gc - clustnum: 163, startsect: 717
miss: start 2ce, count 20
blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0
10240 bytes read in 135 ms (73.2 KiB/s)
===

This time the file test.bin gets loaded correctly.

To be honest I don't really see why things are going wrong. It looks
like the block cache gets filled but then again invalidated before it's
data is used. I also feel that this problem is related to the SD-card
not being the first (number 0) but the second device (number 1).
Can anybody shed some light on this and give me a hint?

many thanks, Felix

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart
  2019-08-22 13:58 ` Felix Brack
@ 2019-08-26  8:19   ` Weijie Gao
  2019-08-26 12:43     ` Felix Brack
  0 siblings, 1 reply; 7+ messages in thread
From: Weijie Gao @ 2019-08-26  8:19 UTC (permalink / raw)
  To: u-boot

On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote:
> On 11.07.19 09:10, Weijie Gao wrote:
> 
> > Some storage devices have multiple hw partitions and both address from
> > zero, for example eMMC.
> > However currently block cache invalidation only applies to block
> > write/erase.
> > This can cause a problem that data of current hw partition is cached
> > before switching to another hw partition. And the following read
> > operation of the latter hw partition will get wrong data when reading
> > from the addresses that have been cached previously.
> > 
> > To solve this problem, invalidate block cache after a successful
> > select_hwpart operation.
> > 
> > Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>
> > ---
> This patch breaks correct operation of PDU001 board.
> 
> >  drivers/block/blk-uclass.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c
> > index baaf431e5e0..c23b6682a6c 100644
> > --- a/drivers/block/blk-uclass.c
> > +++ b/drivers/block/blk-uclass.c
> > @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart)
> >  	if (ret)
> >  		return ret;
> >  
> > -	return blk_select_hwpart(dev, hwpart);
> > +	ret = blk_select_hwpart(dev, hwpart);
> > +	if (!ret)
> > +		blkcache_invalidate(if_type, devnum);
> > +
> > +	return ret;
> >  }
> >  
> >  int blk_list_part(enum if_type if_type)
> > @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
> >  
> >  int blk_dselect_hwpart(struct blk_desc *desc, int hwpart)
> >  {
> > -	return blk_select_hwpart(desc->bdev, hwpart);
> > +	int ret;
> > +
> > +	ret = blk_select_hwpart(desc->bdev, hwpart);
> > +	if (!ret)
> > +		blkcache_invalidate(desc->if_type, desc->devnum);
> > +
> Commenting the invalidation of the block cache on this line results in
> proper working of the board.
> 
> > +	return ret;
> >  }
> >  
> >  int blk_first_device(int if_type, struct udevice **devp)
> > 
> With the patch active, files from the boot FAT partition of the SD-card
> (mmc device 1) do not load anymore, hence booting fails.
> Using the eMMC (mmc device 0) instead works fine, files can bee loaded
> and the board boots.
> 
> To isolate the problem I have modified the configuration to only load a
> 10k test file (test.bin) instead of loading the DTB and zImage files
> followed by booting LINUX. Furthermore I have enabled debugging for some
> parts of the code.
> 
> When I boot from the SD-card (which fails) I get the following logged:
> ===
> CPU  : AM335X-GP rev 1.0
> ===
> 
> This time the file test.bin gets loaded correctly.
> 
> To be honest I don't really see why things are going wrong. It looks
> like the block cache gets filled but then again invalidated before it's
> data is used. I also feel that this problem is related to the SD-card
> not being the first (number 0) but the second device (number 1).
> Can anybody shed some light on this and give me a hint?
> 
> many thanks, Felix

Hi Felix,

I've found the root cause.

There is a bug in the FAT driver. In function file_fat_read_at(),
malloc_cache_aligned() allocates memory for the itr. The itr is a
pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE]
used for storing blocks read from MMC.

The FAT driver resolves the file table and stores the required dir entry
in the member block. Then the pointer of the dir entry is passed to
get_contents() to get the contents of the file.

However the itr is freed BEFORE the call to get_contents(). This means
the contents dir entry points to is no longer valid. And it results to
your situation.

But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied.
I've also found its cause.

The MMC driver calls blk_dselect_hwpart() unconditionally in
mmc_bread(). This causes block cache being invalidated and refilled many
times during the FAT load operation.

The frequent block cache invalidation and refilling finally results in
frequent memory allocation and freeing. There is a mechanism in the
dlmalloc.c: if the total free space reaches a threshold, the
malloc_trim() will be called. malloc_trim() calls sbrk(), which
fills the memory being freed with zeros and of course the data of itr is
damaged, otherwise the original data remains unchanged.

So this is the reason the bug only appears when block cache is enabled
and cache invalidation is called after every select_hwpart.

I have made a patch, could you please have a try? I've tested it worked.
If it works for you too, I will submit it to u-boot. And I will submit
another patch to optimize the cache invalidation for select_hwpart.

--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t
pos, void *buffer,
 	/* For saving default max clustersize memory allocated to malloc pool
*/
 	dir_entry *dentptr = itr->dent;
 
-	free(itr);
-
-	itr = NULL;
-
 	ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread);
 
 out_free_both:


BTW, I reproduced this failure on a MT7623 board, which also has an eMMC
and a SD. And this patch is indeed used for MT7623 to solve the bug on
switching hwpart on eMMC.


Best Regards,

Weijie

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart
  2019-08-26  8:19   ` Weijie Gao
@ 2019-08-26 12:43     ` Felix Brack
  2019-08-27  3:07       ` Weijie Gao
  0 siblings, 1 reply; 7+ messages in thread
From: Felix Brack @ 2019-08-26 12:43 UTC (permalink / raw)
  To: u-boot

Hello Weijie,

On 26.08.19 10:19, Weijie Gao wrote:
> On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote:
>> On 11.07.19 09:10, Weijie Gao wrote:
>>
>>> Some storage devices have multiple hw partitions and both address from
>>> zero, for example eMMC.
>>> However currently block cache invalidation only applies to block
>>> write/erase.
>>> This can cause a problem that data of current hw partition is cached
>>> before switching to another hw partition. And the following read
>>> operation of the latter hw partition will get wrong data when reading
>>> from the addresses that have been cached previously.
>>>
>>> To solve this problem, invalidate block cache after a successful
>>> select_hwpart operation.
>>>
>>> Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>
>>> ---
>> This patch breaks correct operation of PDU001 board.
>>
>>>  drivers/block/blk-uclass.c | 14 ++++++++++++--
>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c
>>> index baaf431e5e0..c23b6682a6c 100644
>>> --- a/drivers/block/blk-uclass.c
>>> +++ b/drivers/block/blk-uclass.c
>>> @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart)
>>>  	if (ret)
>>>  		return ret;
>>>  
>>> -	return blk_select_hwpart(dev, hwpart);
>>> +	ret = blk_select_hwpart(dev, hwpart);
>>> +	if (!ret)
>>> +		blkcache_invalidate(if_type, devnum);
>>> +
>>> +	return ret;
>>>  }
>>>  
>>>  int blk_list_part(enum if_type if_type)
>>> @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
>>>  
>>>  int blk_dselect_hwpart(struct blk_desc *desc, int hwpart)
>>>  {
>>> -	return blk_select_hwpart(desc->bdev, hwpart);
>>> +	int ret;
>>> +
>>> +	ret = blk_select_hwpart(desc->bdev, hwpart);
>>> +	if (!ret)
>>> +		blkcache_invalidate(desc->if_type, desc->devnum);
>>> +
>> Commenting the invalidation of the block cache on this line results in
>> proper working of the board.
>>
>>> +	return ret;
>>>  }
>>>  
>>>  int blk_first_device(int if_type, struct udevice **devp)
>>>
>> With the patch active, files from the boot FAT partition of the SD-card
>> (mmc device 1) do not load anymore, hence booting fails.
>> Using the eMMC (mmc device 0) instead works fine, files can bee loaded
>> and the board boots.
>>
>> To isolate the problem I have modified the configuration to only load a
>> 10k test file (test.bin) instead of loading the DTB and zImage files
>> followed by booting LINUX. Furthermore I have enabled debugging for some
>> parts of the code.
>>
>> When I boot from the SD-card (which fails) I get the following logged:
>> ===
>> CPU  : AM335X-GP rev 1.0
>> ===
>>
>> This time the file test.bin gets loaded correctly.
>>
>> To be honest I don't really see why things are going wrong. It looks
>> like the block cache gets filled but then again invalidated before it's
>> data is used. I also feel that this problem is related to the SD-card
>> not being the first (number 0) but the second device (number 1).
>> Can anybody shed some light on this and give me a hint?
>>
>> many thanks, Felix
> 
> Hi Felix,
> 
> I've found the root cause.
>
Many thanks for looking into this!

> There is a bug in the FAT driver. In function file_fat_read_at(),
> malloc_cache_aligned() allocates memory for the itr. The itr is a
> pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE]
> used for storing blocks read from MMC.
> 
> The FAT driver resolves the file table and stores the required dir entry
> in the member block. Then the pointer of the dir entry is passed to
> get_contents() to get the contents of the file.
> 
> However the itr is freed BEFORE the call to get_contents(). This means
> the contents dir entry points to is no longer valid. And it results to
> your situation.
> 
Indeed this is the root of all trouble. The freeing of itr of course
also invalidates the assignment made earlier:

dir_entry *dentptr = itr->dent;

As dentptr is used in the call to get_contents() the evil takes its
course...

Again many thanks for analyzing and explaining this as detailed as you did.

> But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied.
> I've also found its cause.
> 
> The MMC driver calls blk_dselect_hwpart() unconditionally in
> mmc_bread(). This causes block cache being invalidated and refilled many
> times during the FAT load operation.
>
Yes I have noticed this frequent invalidations of the cache due to calls
to blk_dselect_hwpart(). Some optimization, if possible, would be great.

> The frequent block cache invalidation and refilling finally results in
> frequent memory allocation and freeing. There is a mechanism in the
> dlmalloc.c: if the total free space reaches a threshold, the
> malloc_trim() will be called. malloc_trim() calls sbrk(), which
> fills the memory being freed with zeros and of course the data of itr is
> damaged, otherwise the original data remains unchanged.
> 
> So this is the reason the bug only appears when block cache is enabled
> and cache invalidation is called after every select_hwpart.
>
I see and now understand the different behavior of eMMC and SD-card.

> I have made a patch, could you please have a try? I've tested it worked.
> If it works for you too, I will submit it to u-boot. And I will submit
> another patch to optimize the cache invalidation for select_hwpart.
>
> --- a/fs/fat/fat.c
> +++ b/fs/fat/fat.c
> @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t
> pos, void *buffer,
>  	/* For saving default max clustersize memory allocated to malloc pool
> */
>  	dir_entry *dentptr = itr->dent;
>  
> -	free(itr);
> -
> -	itr = NULL;
> -
>  	ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread);
>  
>  out_free_both:
>
I can confirm that your patch works perfect, thanks!

Please CC me when submitting the two patches you mentioned above. It
will be my pleasure to test them on my hardware.

> 
> BTW, I reproduced this failure on a MT7623 board, which also has an eMMC
> and a SD. And this patch is indeed used for MT7623 to solve the bug on
> switching hwpart on eMMC.
> 
> 
> Best Regards,
> 
> Weijie
> 

regard, Felix

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart
  2019-08-26 12:43     ` Felix Brack
@ 2019-08-27  3:07       ` Weijie Gao
  2019-08-27  7:46         ` Felix Brack
  0 siblings, 1 reply; 7+ messages in thread
From: Weijie Gao @ 2019-08-27  3:07 UTC (permalink / raw)
  To: u-boot

On Mon, 2019-08-26 at 14:43 +0200, Felix Brack wrote:
> Hello Weijie,
> 
> On 26.08.19 10:19, Weijie Gao wrote:
> > On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote:
> >> On 11.07.19 09:10, Weijie Gao wrote:
> >>
> >>> Some storage devices have multiple hw partitions and both address from
> >>> zero, for example eMMC.
> >>> However currently block cache invalidation only applies to block
> >>> write/erase.
> >>> This can cause a problem that data of current hw partition is cached
> >>> before switching to another hw partition. And the following read
> >>> operation of the latter hw partition will get wrong data when reading
> >>> from the addresses that have been cached previously.
> >>>
> >>> To solve this problem, invalidate block cache after a successful
> >>> select_hwpart operation.
> >>>
> >>> Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>
> >>> ---
> >> This patch breaks correct operation of PDU001 board.
> >>
> >>>  drivers/block/blk-uclass.c | 14 ++++++++++++--
> >>>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c
> >>> index baaf431e5e0..c23b6682a6c 100644
> >>> --- a/drivers/block/blk-uclass.c
> >>> +++ b/drivers/block/blk-uclass.c
> >>> @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart)
> >>>  	if (ret)
> >>>  		return ret;
> >>>  
> >>> -	return blk_select_hwpart(dev, hwpart);
> >>> +	ret = blk_select_hwpart(dev, hwpart);
> >>> +	if (!ret)
> >>> +		blkcache_invalidate(if_type, devnum);
> >>> +
> >>> +	return ret;
> >>>  }
> >>>  
> >>>  int blk_list_part(enum if_type if_type)
> >>> @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart)
> >>>  
> >>>  int blk_dselect_hwpart(struct blk_desc *desc, int hwpart)
> >>>  {
> >>> -	return blk_select_hwpart(desc->bdev, hwpart);
> >>> +	int ret;
> >>> +
> >>> +	ret = blk_select_hwpart(desc->bdev, hwpart);
> >>> +	if (!ret)
> >>> +		blkcache_invalidate(desc->if_type, desc->devnum);
> >>> +
> >> Commenting the invalidation of the block cache on this line results in
> >> proper working of the board.
> >>
> >>> +	return ret;
> >>>  }
> >>>  
> >>>  int blk_first_device(int if_type, struct udevice **devp)
> >>>
> >> With the patch active, files from the boot FAT partition of the SD-card
> >> (mmc device 1) do not load anymore, hence booting fails.
> >> Using the eMMC (mmc device 0) instead works fine, files can bee loaded
> >> and the board boots.
> >>
> >> To isolate the problem I have modified the configuration to only load a
> >> 10k test file (test.bin) instead of loading the DTB and zImage files
> >> followed by booting LINUX. Furthermore I have enabled debugging for some
> >> parts of the code.
> >>
> >> When I boot from the SD-card (which fails) I get the following logged:
> >> ===
> >> CPU  : AM335X-GP rev 1.0
> >> ===
> >>
> >> This time the file test.bin gets loaded correctly.
> >>
> >> To be honest I don't really see why things are going wrong. It looks
> >> like the block cache gets filled but then again invalidated before it's
> >> data is used. I also feel that this problem is related to the SD-card
> >> not being the first (number 0) but the second device (number 1).
> >> Can anybody shed some light on this and give me a hint?
> >>
> >> many thanks, Felix
> > 
> > Hi Felix,
> > 
> > I've found the root cause.
> >
> Many thanks for looking into this!
> 
> > There is a bug in the FAT driver. In function file_fat_read_at(),
> > malloc_cache_aligned() allocates memory for the itr. The itr is a
> > pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE]
> > used for storing blocks read from MMC.
> > 
> > The FAT driver resolves the file table and stores the required dir entry
> > in the member block. Then the pointer of the dir entry is passed to
> > get_contents() to get the contents of the file.
> > 
> > However the itr is freed BEFORE the call to get_contents(). This means
> > the contents dir entry points to is no longer valid. And it results to
> > your situation.
> > 
> Indeed this is the root of all trouble. The freeing of itr of course
> also invalidates the assignment made earlier:
> 
> dir_entry *dentptr = itr->dent;
> 
> As dentptr is used in the call to get_contents() the evil takes its
> course...
> 
> Again many thanks for analyzing and explaining this as detailed as you did.
> 
> > But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied.
> > I've also found its cause.
> > 
> > The MMC driver calls blk_dselect_hwpart() unconditionally in
> > mmc_bread(). This causes block cache being invalidated and refilled many
> > times during the FAT load operation.
> >
> Yes I have noticed this frequent invalidations of the cache due to calls
> to blk_dselect_hwpart(). Some optimization, if possible, would be great.
> 
> > The frequent block cache invalidation and refilling finally results in
> > frequent memory allocation and freeing. There is a mechanism in the
> > dlmalloc.c: if the total free space reaches a threshold, the
> > malloc_trim() will be called. malloc_trim() calls sbrk(), which
> > fills the memory being freed with zeros and of course the data of itr is
> > damaged, otherwise the original data remains unchanged.
> > 
> > So this is the reason the bug only appears when block cache is enabled
> > and cache invalidation is called after every select_hwpart.
> >
> I see and now understand the different behavior of eMMC and SD-card.
> 
> > I have made a patch, could you please have a try? I've tested it worked.
> > If it works for you too, I will submit it to u-boot. And I will submit
> > another patch to optimize the cache invalidation for select_hwpart.
> >
> > --- a/fs/fat/fat.c
> > +++ b/fs/fat/fat.c
> > @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t
> > pos, void *buffer,
> >  	/* For saving default max clustersize memory allocated to malloc pool
> > */
> >  	dir_entry *dentptr = itr->dent;
> >  
> > -	free(itr);
> > -
> > -	itr = NULL;
> > -
> >  	ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread);
> >  
> >  out_free_both:
> >
> I can confirm that your patch works perfect, thanks!
> 
> Please CC me when submitting the two patches you mentioned above. It
> will be my pleasure to test them on my hardware.
> 
> > 
> > BTW, I reproduced this failure on a MT7623 board, which also has an eMMC
> > and a SD. And this patch is indeed used for MT7623 to solve the bug on
> > switching hwpart on eMMC.
> > 
> > 
> > Best Regards,
> > 
> > Weijie
> > 
> 
> regard, Felix

Hi Felix,

I found there's already a commit merged 6 hours ago (mailed on Aug 20)
that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876

So there's no need to send patch again. What you need is to pull the
latest master branch.

I will still send a patch to optimize the invalidation in select_hwpart.

Best Regards,

Weijie

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart
  2019-08-27  3:07       ` Weijie Gao
@ 2019-08-27  7:46         ` Felix Brack
  0 siblings, 0 replies; 7+ messages in thread
From: Felix Brack @ 2019-08-27  7:46 UTC (permalink / raw)
  To: u-boot

Hello Weijie,

On 27.08.19 05:07, Weijie Gao wrote:
> Hi Felix,
> 
> I found there's already a commit merged 6 hours ago (mailed on Aug 20)
> that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876
> 
Thanks for the information. The patch referenced can be found here:
https://patchwork.ozlabs.org/patch/1151091/

> So there's no need to send patch again. What you need is to pull the
> latest master branch.
> 
> I will still send a patch to optimize the invalidation in select_hwpart.
> 
> Best Regards,
> 
> Weijie
> 
> 

regards, Felix

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-08-27  7:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-11  7:10 [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart Weijie Gao
2019-07-19  0:00 ` Tom Rini
2019-08-22 13:58 ` Felix Brack
2019-08-26  8:19   ` Weijie Gao
2019-08-26 12:43     ` Felix Brack
2019-08-27  3:07       ` Weijie Gao
2019-08-27  7:46         ` Felix Brack

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.