The contradictory nature of spl_nand_fit_read

From: Sean Anderson <sean.anderson@seco.com>
To: U-Boot Mailing List <u-boot@lists.denx.de>,
	Michael Trimarchi <michael@amarulasolutions.com>,
	Dario Binacchi <dariobin@libero.it>,
	Michael Trimarchi <michael@amarulasolutions.com>,
	Tim Harvey <tharvey@gateworks.com>, Tom Rini <trini@konsulko.com>,
	Lokesh Vutla <lokeshvutla@ti.com>
Subject: The contradictory nature of spl_nand_fit_read
Date: Fri, 1 Apr 2022 14:46:20 -0400	[thread overview]
Message-ID: <c2ea7097-9e85-b9ec-e404-bd46eb83dd5b@seco.com> (raw)

Hi all,

I don't understand how spl_nand_fit_read is supposed to work. This
function has been seemingly hacked together by multiple people over the
years, and it (presumably) (somehow) works despite mass confusion of
units. I tried to do some refactoring, but the contradictions make it
very difficult to determine what is a bug and what is intentional.

Lets start with the basics. spl_nand_fit_read is used to implement
spl_load_info.load. I've reproduced the documentation for this function
below:

>  read() - Read from device
> 
>  @load: Information about the load state
>  @sector: Sector number to read from (each @load->bl_len bytes)
>  @count: Number of sectors to read
>  @buf: Buffer to read into
>  @return number of sectors read, 0 on error

In particular, both @sector and @count are in units of load.bl_len. In
addition, the return value should be @count on success.

> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
> 			       ulong size, void *dst)
> {
> 	int err;
> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE

The following logic was introduced in commit 9f6a14c47f ("spl: fit:
nand: fix fit loading in case of bad blocks"). However, at the time it
was not guarded by the above ifdef. nand_spl_adjust_offset is not
defined for all nand drivers. It is defined for at least mxs, am335x,
atmel, and simple. Additionally, some drivers (such as rockchip)
implement bad block skipping in nand_spl_load_image.

It's not at all clear to me that there is common config which determines
whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
defined.

> 	ulong sector;
> 
> 	sector = *(int *)load->priv;

This is the offset passed to nand_spl_load_image. This
offset is in units of bytes, and is aligned to the block size. However,
it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.

> 	offs = sector + nand_spl_adjust_offset(sector, offs - sector);

The documentation for this this function is as follows

> nand_spl_adjust_offset - Adjust offset from a starting sector
> @sector:	Address of the sector
> @offs:	Offset starting from @sector
>
> If one or more bad blocks are in the address space between @sector
> and @sector + @offs, @offs is increased by the NAND block size for
> each bad block found.

In particular, note that offs is in units of bytes. However, we know
from above that at this point in the function, offs is in units of
bl_len. Oops.

> #else
> 	offs *= load->bl_len;
> 	size *= load->bl_len;

This code path adjusts both offs and size to be in units of 

> #endif
> 	err = nand_spl_load_image(offs, size, dst);

So by the time we get here, one code path is in units of bytes, and the
other is in units of bl_len. nand_spl_load_image seems to expect bytes
(though there is no documentation, so I can't be sure).  Which of course
begs the question: how did the code introduced in 9f6a14c47f ever work?

> 	if (err)
> 		return 0;
> 
> 	return size / load->bl_len;

And of course, this should only be done if size is in units of bytes.

> }

I have some patches for what I think are bugs, but the logic of this
function is so confused that I am unable to determine if they actually
are bugs. I would greatly appreciate if some of the authors of the
mentioned commits could comment on their intent, and what they thought
the units of offs and size were.

--Sean