u-boot.lists.denx.de archive mirror
 help / color / mirror / Atom feed
* The contradictory nature of spl_nand_fit_read
@ 2022-04-01 18:46 Sean Anderson
  2022-04-01 18:53 ` Sean Anderson
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Anderson @ 2022-04-01 18:46 UTC (permalink / raw)
  To: U-Boot Mailing List, Michael Trimarchi, Dario Binacchi,
	Michael Trimarchi, Tim Harvey, Tom Rini, Lokesh Vutla

Hi all,

I don't understand how spl_nand_fit_read is supposed to work. This
function has been seemingly hacked together by multiple people over the
years, and it (presumably) (somehow) works despite mass confusion of
units. I tried to do some refactoring, but the contradictions make it
very difficult to determine what is a bug and what is intentional.

Lets start with the basics. spl_nand_fit_read is used to implement
spl_load_info.load. I've reproduced the documentation for this function
below:

>  read() - Read from device
> 
>  @load: Information about the load state
>  @sector: Sector number to read from (each @load->bl_len bytes)
>  @count: Number of sectors to read
>  @buf: Buffer to read into
>  @return number of sectors read, 0 on error

In particular, both @sector and @count are in units of load.bl_len. In
addition, the return value should be @count on success.

> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
> 			       ulong size, void *dst)
> {
> 	int err;
> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE

The following logic was introduced in commit 9f6a14c47f ("spl: fit:
nand: fix fit loading in case of bad blocks"). However, at the time it
was not guarded by the above ifdef. nand_spl_adjust_offset is not
defined for all nand drivers. It is defined for at least mxs, am335x,
atmel, and simple. Additionally, some drivers (such as rockchip)
implement bad block skipping in nand_spl_load_image.

It's not at all clear to me that there is common config which determines
whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
defined.

> 	ulong sector;
> 
> 	sector = *(int *)load->priv;

This is the offset passed to nand_spl_load_image. This
offset is in units of bytes, and is aligned to the block size. However,
it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.

> 	offs = sector + nand_spl_adjust_offset(sector, offs - sector);

The documentation for this this function is as follows

> nand_spl_adjust_offset - Adjust offset from a starting sector
> @sector:	Address of the sector
> @offs:	Offset starting from @sector
>
> If one or more bad blocks are in the address space between @sector
> and @sector + @offs, @offs is increased by the NAND block size for
> each bad block found.

In particular, note that offs is in units of bytes. However, we know
from above that at this point in the function, offs is in units of
bl_len. Oops.

> #else
> 	offs *= load->bl_len;
> 	size *= load->bl_len;

This code path adjusts both offs and size to be in units of 

> #endif
> 	err = nand_spl_load_image(offs, size, dst);

So by the time we get here, one code path is in units of bytes, and the
other is in units of bl_len. nand_spl_load_image seems to expect bytes
(though there is no documentation, so I can't be sure).  Which of course
begs the question: how did the code introduced in 9f6a14c47f ever work?

> 	if (err)
> 		return 0;
> 
> 	return size / load->bl_len;

And of course, this should only be done if size is in units of bytes.

> }

I have some patches for what I think are bugs, but the logic of this
function is so confused that I am unable to determine if they actually
are bugs. I would greatly appreciate if some of the authors of the
mentioned commits could comment on their intent, and what they thought
the units of offs and size were.

--Sean

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The contradictory nature of spl_nand_fit_read
  2022-04-01 18:46 The contradictory nature of spl_nand_fit_read Sean Anderson
@ 2022-04-01 18:53 ` Sean Anderson
  2022-04-01 21:43   ` Michael Nazzareno Trimarchi
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Anderson @ 2022-04-01 18:53 UTC (permalink / raw)
  To: U-Boot Mailing List, Michael Trimarchi, Dario Binacchi,
	Michael Trimarchi, Tim Harvey, Tom Rini, Lokesh Vutla



On 4/1/22 2:46 PM, Sean Anderson wrote:
> Hi all,
> 
> I don't understand how spl_nand_fit_read is supposed to work. This
> function has been seemingly hacked together by multiple people over the
> years, and it (presumably) (somehow) works despite mass confusion of
> units. I tried to do some refactoring, but the contradictions make it
> very difficult to determine what is a bug and what is intentional.
> 
> Lets start with the basics. spl_nand_fit_read is used to implement
> spl_load_info.load. I've reproduced the documentation for this function
> below:
> 
>>  read() - Read from device
>> 
>>  @load: Information about the load state
>>  @sector: Sector number to read from (each @load->bl_len bytes)
>>  @count: Number of sectors to read
>>  @buf: Buffer to read into
>>  @return number of sectors read, 0 on error
> 
> In particular, both @sector and @count are in units of load.bl_len. In
> addition, the return value should be @count on success.
> 
>> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
>> 			       ulong size, void *dst)
>> {
>> 	int err;
>> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE
> 
> The following logic was introduced in commit 9f6a14c47f ("spl: fit:
> nand: fix fit loading in case of bad blocks"). However, at the time it
> was not guarded by the above ifdef. nand_spl_adjust_offset is not
> defined for all nand drivers. It is defined for at least mxs, am335x,
> atmel, and simple. Additionally, some drivers (such as rockchip)
> implement bad block skipping in nand_spl_load_image.
> 
> It's not at all clear to me that there is common config which determines
> whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
> aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
> above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
> defined.
> 
>> 	ulong sector;
>> 
>> 	sector = *(int *)load->priv;
> 
> This is the offset passed to nand_spl_load_image. This
> offset is in units of bytes, and is aligned to the block size. However,
> it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.
> 
>> 	offs = sector + nand_spl_adjust_offset(sector, offs - sector);

I forgot to mention this, but this also adds sector to offs, but offs already includes the sector:

> return spl_load_simple_fit(spl_image, &load, offset / bl_len, header);

Which means that we add in sector twice (but with different units!)

> The documentation for this this function is as follows
> 
>> nand_spl_adjust_offset - Adjust offset from a starting sector
>> @sector:	Address of the sector
>> @offs:	Offset starting from @sector
>>
>> If one or more bad blocks are in the address space between @sector
>> and @sector + @offs, @offs is increased by the NAND block size for
>> each bad block found.
> 
> In particular, note that offs is in units of bytes. However, we know
> from above that at this point in the function, offs is in units of
> bl_len. Oops.
> 
>> #else
>> 	offs *= load->bl_len;
>> 	size *= load->bl_len;
> 
> This code path adjusts both offs and size to be in units of 

*units of bytes

>> #endif
>> 	err = nand_spl_load_image(offs, size, dst);
> 
> So by the time we get here, one code path is in units of bytes, and the
> other is in units of bl_len. nand_spl_load_image seems to expect bytes
> (though there is no documentation, so I can't be sure).  Which of course
> begs the question: how did the code introduced in 9f6a14c47f ever work?
> 
>> 	if (err)
>> 		return 0;
>> 
>> 	return size / load->bl_len;
> 
> And of course, this should only be done if size is in units of bytes.
> 
>> }
> 
> I have some patches for what I think are bugs, but the logic of this
> function is so confused that I am unable to determine if they actually
> are bugs. I would greatly appreciate if some of the authors of the
> mentioned commits could comment on their intent, and what they thought
> the units of offs and size were.
> 
> --Sean
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The contradictory nature of spl_nand_fit_read
  2022-04-01 18:53 ` Sean Anderson
@ 2022-04-01 21:43   ` Michael Nazzareno Trimarchi
  2022-04-03 15:56     ` Dario Binacchi
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Nazzareno Trimarchi @ 2022-04-01 21:43 UTC (permalink / raw)
  To: Sean Anderson
  Cc: U-Boot Mailing List, Dario Binacchi, Tim Harvey, Tom Rini, Lokesh Vutla

Hi Sean

On Fri, Apr 1, 2022 at 8:53 PM Sean Anderson <sean.anderson@seco.com> wrote:
>
>
>
> On 4/1/22 2:46 PM, Sean Anderson wrote:
> > Hi all,
> >
> > I don't understand how spl_nand_fit_read is supposed to work. This
> > function has been seemingly hacked together by multiple people over the
> > years, and it (presumably) (somehow) works despite mass confusion of
> > units. I tried to do some refactoring, but the contradictions make it
> > very difficult to determine what is a bug and what is intentional.
> >
> > Lets start with the basics. spl_nand_fit_read is used to implement
> > spl_load_info.load. I've reproduced the documentation for this function
> > below:
> >
> >>  read() - Read from device
> >>
> >>  @load: Information about the load state
> >>  @sector: Sector number to read from (each @load->bl_len bytes)
> >>  @count: Number of sectors to read
> >>  @buf: Buffer to read into
> >>  @return number of sectors read, 0 on error
> >
> > In particular, both @sector and @count are in units of load.bl_len. In
> > addition, the return value should be @count on success.
> >
> >> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
> >>                             ulong size, void *dst)
> >> {
> >>      int err;
> >> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE
> >
> > The following logic was introduced in commit 9f6a14c47f ("spl: fit:
> > nand: fix fit loading in case of bad blocks"). However, at the time it
> > was not guarded by the above ifdef. nand_spl_adjust_offset is not
> > defined for all nand drivers. It is defined for at least mxs, am335x,
> > atmel, and simple. Additionally, some drivers (such as rockchip)
> > implement bad block skipping in nand_spl_load_image.
> >
> > It's not at all clear to me that there is common config which determines
> > whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
> > aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
> > above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
> > defined.
> >
> >>      ulong sector;
> >>
> >>      sector = *(int *)load->priv;
> >
> > This is the offset passed to nand_spl_load_image. This
> > offset is in units of bytes, and is aligned to the block size. However,
> > it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.
> >
> >>      offs = sector + nand_spl_adjust_offset(sector, offs - sector);
>
> I forgot to mention this, but this also adds sector to offs, but offs already includes the sector:
>
> > return spl_load_simple_fit(spl_image, &load, offset / bl_len, header);

The spl_nand_fit_read was called from info->read that was in units of sectors
count = info->read(info, sector, sectors, fit);
debug("fit read sector %lx, sectors=%d, dst=%p, count=%lu, size=0x%lx\n",
              sector, sectors, fit, count, size);


load.dev = NULL;
load.priv = &offset;
load.filename = NULL;
load.bl_len = 1;
load.read = spl_nand_fit_read;

static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
                               ulong size, void *dst)
{
        ulong sector;
        int ret;

        sector = *(int *)load->priv;
        offs = sector + nand_spl_adjust_offset(sector, offs - sector);
        ret = nand_spl_load_image(offs, size, dst);
        if (!ret)
                return size;
        else
                return 0;
}

Sorry it's a bit of time and I think that we are discussing about this
commit at the time was introduced

Michael

>
> Which means that we add in sector twice (but with different units!)
>
> > The documentation for this this function is as follows
> >
> >> nand_spl_adjust_offset - Adjust offset from a starting sector
> >> @sector:     Address of the sector
> >> @offs:       Offset starting from @sector
> >>
> >> If one or more bad blocks are in the address space between @sector
> >> and @sector + @offs, @offs is increased by the NAND block size for
> >> each bad block found.
> >
> > In particular, note that offs is in units of bytes. However, we know
> > from above that at this point in the function, offs is in units of
> > bl_len. Oops.
> >
> >> #else
> >>      offs *= load->bl_len;
> >>      size *= load->bl_len;
> >
> > This code path adjusts both offs and size to be in units of
>
> *units of bytes
>
> >> #endif
> >>      err = nand_spl_load_image(offs, size, dst);
> >
> > So by the time we get here, one code path is in units of bytes, and the
> > other is in units of bl_len. nand_spl_load_image seems to expect bytes
> > (though there is no documentation, so I can't be sure).  Which of course
> > begs the question: how did the code introduced in 9f6a14c47f ever work?
> >
> >>      if (err)
> >>              return 0;
> >>
> >>      return size / load->bl_len;
> >
> > And of course, this should only be done if size is in units of bytes.
> >
> >> }
> >
> > I have some patches for what I think are bugs, but the logic of this
> > function is so confused that I am unable to determine if they actually
> > are bugs. I would greatly appreciate if some of the authors of the
> > mentioned commits could comment on their intent, and what they thought
> > the units of offs and size were.
> >
> > --Sean
> >



-- 
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
michael@amarulasolutions.com
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
info@amarulasolutions.com
www.amarulasolutions.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The contradictory nature of spl_nand_fit_read
  2022-04-01 21:43   ` Michael Nazzareno Trimarchi
@ 2022-04-03 15:56     ` Dario Binacchi
  2022-04-04 16:26       ` Sean Anderson
  0 siblings, 1 reply; 6+ messages in thread
From: Dario Binacchi @ 2022-04-03 15:56 UTC (permalink / raw)
  To: Michael Nazzareno Trimarchi, Sean Anderson
  Cc: U-Boot Mailing List, Tim Harvey, Tom Rini, Lokesh Vutla

Hi Sean,

> Il 01/04/2022 23:43 Michael Nazzareno Trimarchi <michael@amarulasolutions.com> ha scritto:
> 
>  
> Hi Sean
> 
> On Fri, Apr 1, 2022 at 8:53 PM Sean Anderson <sean.anderson@seco.com> wrote:
> >
> >
> >
> > On 4/1/22 2:46 PM, Sean Anderson wrote:
> > > Hi all,
> > >
> > > I don't understand how spl_nand_fit_read is supposed to work. This
> > > function has been seemingly hacked together by multiple people over the
> > > years, and it (presumably) (somehow) works despite mass confusion of
> > > units. I tried to do some refactoring, but the contradictions make it
> > > very difficult to determine what is a bug and what is intentional.
> > >
> > > Lets start with the basics. spl_nand_fit_read is used to implement
> > > spl_load_info.load. I've reproduced the documentation for this function
> > > below:
> > >
> > >>  read() - Read from device
> > >>
> > >>  @load: Information about the load state
> > >>  @sector: Sector number to read from (each @load->bl_len bytes)
> > >>  @count: Number of sectors to read
> > >>  @buf: Buffer to read into
> > >>  @return number of sectors read, 0 on error
> > >
> > > In particular, both @sector and @count are in units of load.bl_len. In
> > > addition, the return value should be @count on success.
> > >
> > >> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
> > >>                             ulong size, void *dst)
> > >> {
> > >>      int err;
> > >> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE
> > >
> > > The following logic was introduced in commit 9f6a14c47f ("spl: fit:
> > > nand: fix fit loading in case of bad blocks"). However, at the time it
> > > was not guarded by the above ifdef. nand_spl_adjust_offset is not
> > > defined for all nand drivers. It is defined for at least mxs, am335x,
> > > atmel, and simple. Additionally, some drivers (such as rockchip)
> > > implement bad block skipping in nand_spl_load_image.
> > >
> > > It's not at all clear to me that there is common config which determines
> > > whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
> > > aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
> > > above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
> > > defined.
> > >
> > >>      ulong sector;
> > >>
> > >>      sector = *(int *)load->priv;
> > >
> > > This is the offset passed to nand_spl_load_image. This
> > > offset is in units of bytes, and is aligned to the block size. However,
> > > it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.
> > >
> > >>      offs = sector + nand_spl_adjust_offset(sector, offs - sector);
> >
> > I forgot to mention this, but this also adds sector to offs, but offs already includes the sector:
> >
> > > return spl_load_simple_fit(spl_image, &load, offset / bl_len, header);
> 
> The spl_nand_fit_read was called from info->read that was in units of sectors
> count = info->read(info, sector, sectors, fit);
> debug("fit read sector %lx, sectors=%d, dst=%p, count=%lu, size=0x%lx\n",
>               sector, sectors, fit, count, size);
> 
> 
> load.dev = NULL;
> load.priv = &offset;
> load.filename = NULL;
> load.bl_len = 1;
> load.read = spl_nand_fit_read;
> 
> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
>                                ulong size, void *dst)
> {
>         ulong sector;
>         int ret;
> 
>         sector = *(int *)load->priv;
>         offs = sector + nand_spl_adjust_offset(sector, offs - sector);
>         ret = nand_spl_load_image(offs, size, dst);
>         if (!ret)
>                 return size;
>         else
>                 return 0;
> }
> 
> Sorry it's a bit of time and I think that we are discussing about this
> commit at the time was introduced
> 
> Michael
> 
> >
> > Which means that we add in sector twice (but with different units!)
> >
> > > The documentation for this this function is as follows
> > >
> > >> nand_spl_adjust_offset - Adjust offset from a starting sector
> > >> @sector:     Address of the sector
> > >> @offs:       Offset starting from @sector
> > >>
> > >> If one or more bad blocks are in the address space between @sector
> > >> and @sector + @offs, @offs is increased by the NAND block size for
> > >> each bad block found.
> > >
> > > In particular, note that offs is in units of bytes. However, we know
> > > from above that at this point in the function, offs is in units of
> > > bl_len. Oops.
> > >
> > >> #else
> > >>      offs *= load->bl_len;
> > >>      size *= load->bl_len;
> > >
> > > This code path adjusts both offs and size to be in units of
> >
> > *units of bytes
> >
> > >> #endif
> > >>      err = nand_spl_load_image(offs, size, dst);
> > >
> > > So by the time we get here, one code path is in units of bytes, and the
> > > other is in units of bl_len. nand_spl_load_image seems to expect bytes
> > > (though there is no documentation, so I can't be sure).  Which of course
> > > begs the question: how did the code introduced in 9f6a14c47f ever work?

The commit 9f6a14c47f only added the code strictly necessary to skip a bad block
that was in that memory area. I heavily tested it on a custom board that presented
this problem and that without it it would not have been usable. The patch itself 
was also tested on SOCs that did not have bad block, therefore backward compatible. 
I think that your doubts arise from the patches following mine and that, however, 
I have not had the opportunity to test.

thanks and regards,
Dario
> > >
> > >>      if (err)
> > >>              return 0;
> > >>
> > >>      return size / load->bl_len;
> > >
> > > And of course, this should only be done if size is in units of bytes.
> > >
> > >> }
> > >
> > > I have some patches for what I think are bugs, but the logic of this
> > > function is so confused that I am unable to determine if they actually
> > > are bugs. I would greatly appreciate if some of the authors of the
> > > mentioned commits could comment on their intent, and what they thought
> > > the units of offs and size were.
> > >
> > > --Sean
> > >
> 
> 
> 
> -- 
> Michael Nazzareno Trimarchi
> Co-Founder & Chief Executive Officer
> M. +39 347 913 2170
> michael@amarulasolutions.com
> __________________________________
> 
> Amarula Solutions BV
> Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
> T. +31 (0)85 111 9172
> info@amarulasolutions.com
> www.amarulasolutions.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The contradictory nature of spl_nand_fit_read
  2022-04-03 15:56     ` Dario Binacchi
@ 2022-04-04 16:26       ` Sean Anderson
  2022-04-04 18:03         ` Tim Harvey
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Anderson @ 2022-04-04 16:26 UTC (permalink / raw)
  To: Dario Binacchi, Michael Nazzareno Trimarchi
  Cc: U-Boot Mailing List, Tim Harvey, Tom Rini, Lokesh Vutla



On 4/3/22 11:56 AM, Dario Binacchi wrote:
> Hi Sean,
> 
>> Il 01/04/2022 23:43 Michael Nazzareno Trimarchi <michael@amarulasolutions.com> ha scritto:
>> 
>>  
>> Hi Sean
>> 
>> On Fri, Apr 1, 2022 at 8:53 PM Sean Anderson <sean.anderson@seco.com> wrote:
>> >
>> >
>> >
>> > On 4/1/22 2:46 PM, Sean Anderson wrote:
>> > > Hi all,
>> > >
>> > > I don't understand how spl_nand_fit_read is supposed to work. This
>> > > function has been seemingly hacked together by multiple people over the
>> > > years, and it (presumably) (somehow) works despite mass confusion of
>> > > units. I tried to do some refactoring, but the contradictions make it
>> > > very difficult to determine what is a bug and what is intentional.
>> > >
>> > > Lets start with the basics. spl_nand_fit_read is used to implement
>> > > spl_load_info.load. I've reproduced the documentation for this function
>> > > below:
>> > >
>> > >>  read() - Read from device
>> > >>
>> > >>  @load: Information about the load state
>> > >>  @sector: Sector number to read from (each @load->bl_len bytes)
>> > >>  @count: Number of sectors to read
>> > >>  @buf: Buffer to read into
>> > >>  @return number of sectors read, 0 on error
>> > >
>> > > In particular, both @sector and @count are in units of load.bl_len. In
>> > > addition, the return value should be @count on success.
>> > >
>> > >> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
>> > >>                             ulong size, void *dst)
>> > >> {
>> > >>      int err;
>> > >> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE
>> > >
>> > > The following logic was introduced in commit 9f6a14c47f ("spl: fit:
>> > > nand: fix fit loading in case of bad blocks"). However, at the time it
>> > > was not guarded by the above ifdef. nand_spl_adjust_offset is not
>> > > defined for all nand drivers. It is defined for at least mxs, am335x,
>> > > atmel, and simple. Additionally, some drivers (such as rockchip)
>> > > implement bad block skipping in nand_spl_load_image.
>> > >
>> > > It's not at all clear to me that there is common config which determines
>> > > whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
>> > > aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
>> > > above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
>> > > defined.
>> > >
>> > >>      ulong sector;
>> > >>
>> > >>      sector = *(int *)load->priv;
>> > >
>> > > This is the offset passed to nand_spl_load_image. This
>> > > offset is in units of bytes, and is aligned to the block size. However,
>> > > it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.
>> > >
>> > >>      offs = sector + nand_spl_adjust_offset(sector, offs - sector);
>> >
>> > I forgot to mention this, but this also adds sector to offs, but offs already includes the sector:
>> >
>> > > return spl_load_simple_fit(spl_image, &load, offset / bl_len, header);
>> 
>> The spl_nand_fit_read was called from info->read that was in units of sectors
>> count = info->read(info, sector, sectors, fit);
>> debug("fit read sector %lx, sectors=%d, dst=%p, count=%lu, size=0x%lx\n",
>>               sector, sectors, fit, count, size);
>> 
>> 
>> load.dev = NULL;
>> load.priv = &offset;
>> load.filename = NULL;
>> load.bl_len = 1;

Ah, here is the problem. If bl_len is always 1, then there is no problem
going back and forth between units. But aa0032f672 ("spl: fit: nand: allow
for non-page-aligned elements") means that bl_len is not always 1, so
now the conversion matters.

So I suppose my question is now for Tim: how did you test your patch?

>> load.read = spl_nand_fit_read;
>> 
>> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
>>                                ulong size, void *dst)
>> {
>>         ulong sector;
>>         int ret;
>> 
>>         sector = *(int *)load->priv;
>>         offs = sector + nand_spl_adjust_offset(sector, offs - sector);

This line is ONLY ok if bl_len = 1

>>         ret = nand_spl_load_image(offs, size, dst);
>>         if (!ret)
>>                 return size;
>>         else
>>                 return 0;
>> }
>> 
>> Sorry it's a bit of time and I think that we are discussing about this
>> commit at the time was introduced
>> Michael
>> 
>> >
>> > Which means that we add in sector twice (but with different units!)
>> >
>> > > The documentation for this this function is as follows
>> > >
>> > >> nand_spl_adjust_offset - Adjust offset from a starting sector
>> > >> @sector:     Address of the sector
>> > >> @offs:       Offset starting from @sector
>> > >>
>> > >> If one or more bad blocks are in the address space between @sector
>> > >> and @sector + @offs, @offs is increased by the NAND block size for
>> > >> each bad block found.
>> > >
>> > > In particular, note that offs is in units of bytes. However, we know
>> > > from above that at this point in the function, offs is in units of
>> > > bl_len. Oops.
>> > >
>> > >> #else
>> > >>      offs *= load->bl_len;
>> > >>      size *= load->bl_len;
>> > >
>> > > This code path adjusts both offs and size to be in units of
>> >
>> > *units of bytes
>> >
>> > >> #endif
>> > >>      err = nand_spl_load_image(offs, size, dst);
>> > >
>> > > So by the time we get here, one code path is in units of bytes, and the
>> > > other is in units of bl_len. nand_spl_load_image seems to expect bytes
>> > > (though there is no documentation, so I can't be sure).  Which of course
>> > > begs the question: how did the code introduced in 9f6a14c47f ever work?
> 
> The commit 9f6a14c47f only added the code strictly necessary to skip a bad block
> that was in that memory area. I heavily tested it on a custom board that presented
> this problem and that without it it would not have been usable. The patch itself 
> was also tested on SOCs that did not have bad block, therefore backward compatible. 
> I think that your doubts arise from the patches following mine and that, however, 
> I have not had the opportunity to test.

Have you tested v2021.07 or later?

--Sean

>> > >
>> > >>      if (err)
>> > >>              return 0;
>> > >>
>> > >>      return size / load->bl_len;
>> > >
>> > > And of course, this should only be done if size is in units of bytes.
>> > >
>> > >> }
>> > >
>> > > I have some patches for what I think are bugs, but the logic of this
>> > > function is so confused that I am unable to determine if they actually
>> > > are bugs. I would greatly appreciate if some of the authors of the
>> > > mentioned commits could comment on their intent, and what they thought
>> > > the units of offs and size were.
>> > >
>> > > --Sean
>> > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The contradictory nature of spl_nand_fit_read
  2022-04-04 16:26       ` Sean Anderson
@ 2022-04-04 18:03         ` Tim Harvey
  0 siblings, 0 replies; 6+ messages in thread
From: Tim Harvey @ 2022-04-04 18:03 UTC (permalink / raw)
  To: Sean Anderson
  Cc: Dario Binacchi, Michael Nazzareno Trimarchi, U-Boot Mailing List,
	Tom Rini, Lokesh Vutla

On Mon, Apr 4, 2022 at 9:26 AM Sean Anderson <sean.anderson@seco.com> wrote:
>
>
>
> On 4/3/22 11:56 AM, Dario Binacchi wrote:
> > Hi Sean,
> >
> >> Il 01/04/2022 23:43 Michael Nazzareno Trimarchi <michael@amarulasolutions.com> ha scritto:
> >>
> >>
> >> Hi Sean
> >>
> >> On Fri, Apr 1, 2022 at 8:53 PM Sean Anderson <sean.anderson@seco.com> wrote:
> >> >
> >> >
> >> >
> >> > On 4/1/22 2:46 PM, Sean Anderson wrote:
> >> > > Hi all,
> >> > >
> >> > > I don't understand how spl_nand_fit_read is supposed to work. This
> >> > > function has been seemingly hacked together by multiple people over the
> >> > > years, and it (presumably) (somehow) works despite mass confusion of
> >> > > units. I tried to do some refactoring, but the contradictions make it
> >> > > very difficult to determine what is a bug and what is intentional.
> >> > >
> >> > > Lets start with the basics. spl_nand_fit_read is used to implement
> >> > > spl_load_info.load. I've reproduced the documentation for this function
> >> > > below:
> >> > >
> >> > >>  read() - Read from device
> >> > >>
> >> > >>  @load: Information about the load state
> >> > >>  @sector: Sector number to read from (each @load->bl_len bytes)
> >> > >>  @count: Number of sectors to read
> >> > >>  @buf: Buffer to read into
> >> > >>  @return number of sectors read, 0 on error
> >> > >
> >> > > In particular, both @sector and @count are in units of load.bl_len. In
> >> > > addition, the return value should be @count on success.
> >> > >
> >> > >> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
> >> > >>                             ulong size, void *dst)
> >> > >> {
> >> > >>      int err;
> >> > >> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE
> >> > >
> >> > > The following logic was introduced in commit 9f6a14c47f ("spl: fit:
> >> > > nand: fix fit loading in case of bad blocks"). However, at the time it
> >> > > was not guarded by the above ifdef. nand_spl_adjust_offset is not
> >> > > defined for all nand drivers. It is defined for at least mxs, am335x,
> >> > > atmel, and simple. Additionally, some drivers (such as rockchip)
> >> > > implement bad block skipping in nand_spl_load_image.
> >> > >
> >> > > It's not at all clear to me that there is common config which determines
> >> > > whether nand_spl_adjust_offset is implemented. Nevertheless, in commit
> >> > > aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the
> >> > > above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is
> >> > > defined.
> >> > >
> >> > >>      ulong sector;
> >> > >>
> >> > >>      sector = *(int *)load->priv;
> >> > >
> >> > > This is the offset passed to nand_spl_load_image. This
> >> > > offset is in units of bytes, and is aligned to the block size. However,
> >> > > it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops.
> >> > >
> >> > >>      offs = sector + nand_spl_adjust_offset(sector, offs - sector);
> >> >
> >> > I forgot to mention this, but this also adds sector to offs, but offs already includes the sector:
> >> >
> >> > > return spl_load_simple_fit(spl_image, &load, offset / bl_len, header);
> >>
> >> The spl_nand_fit_read was called from info->read that was in units of sectors
> >> count = info->read(info, sector, sectors, fit);
> >> debug("fit read sector %lx, sectors=%d, dst=%p, count=%lu, size=0x%lx\n",
> >>               sector, sectors, fit, count, size);
> >>
> >>
> >> load.dev = NULL;
> >> load.priv = &offset;
> >> load.filename = NULL;
> >> load.bl_len = 1;
>
> Ah, here is the problem. If bl_len is always 1, then there is no problem
> going back and forth between units. But aa0032f672 ("spl: fit: nand: allow
> for non-page-aligned elements") means that bl_len is not always 1, so
> now the conversion matters.
>
> So I suppose my question is now for Tim: how did you test your patch?
>

Hi Sean,

I maintain a set of boards using gwventana_nand_defconfg that have
various NAND parts.

Yes, I vaguely recall running into problems when moving my boards to
DM. It was some time ago but if I recall I found
nand_spl_adjust_offset didn't exist for MXS and when I noticed the
requirement for it was introduced in 9f6a14c47ff9 ("spl: fit: nand:
fix fit loading in case of bad blocks") I looked at that patch and
noticed it added the notion of sectors which lead me to submitting
39cb85043cdb "(spl: fit: nand: skip bad block handling if NAND chip
not fully defined)" which added the ifdef check around it to
essentially move it back to bytes.

As far as testing goes I test with IMX6 boards using NAND as a boot
device with board/gateworks/gw_ventana and gwventana_nand_defconfig

With some debugging added my boot looks like this:
Booting from NAND
PMIC:  PFUZE100 ID=0x10
Trying to boot from NAND
spl: nand - using hw ecc
spl_nand_load_element bl_len=0xe00000 offset=0x1000 mtd=18200268
Found FIT
spl_nand_fit_read offset=0xe00 size=0x3 bl_len=0x1000
DTB:   imx6q-gw54xx
spl_nand_fit_read offset=0xe02 size=0xca bl_len=0x1000
spl_nand_fit_read offset=0xf07 size=0xc bl_len=0x1000

For my config CONFIG_SYS_NAND_BLOCK_SIZE is not defined because the
boards supported use a variety of devices:
2GiB Micron Technology MT29F16G08ADACAH4 02120768 (4k page size, 256k
erase size, 224 oob size, legacy layout ecc str=16)
2GiB Cypress S34ML16G202BHI000 02120939 (2k page size, 128k erase
size, 128B oob size, legacy layout ecc str=16)
1GiB Micron MT29F8G08ABACAH4 02120782 (4k page size, 156k erase size,
224 oob size, legacy layout ecc str=16)
256MiB Micron MT29F2G08ABAEAH4 02120770 (2k page size, 128k erase
size, 64B oob size, legacy layout ecc str=16)

So maybe the confusion is all around CONFIG_SYS_NAND_BLOCK_SIZE?

I appreciate you trying to unwind and clean all of this up. I recall
being thoroughly confused when it came time for me to move my boards
to DM and there were so many changes to unravel.

Best Regards,

Tim

> >> load.read = spl_nand_fit_read;
> >>
> >> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs,
> >>                                ulong size, void *dst)
> >> {
> >>         ulong sector;
> >>         int ret;
> >>
> >>         sector = *(int *)load->priv;
> >>         offs = sector + nand_spl_adjust_offset(sector, offs - sector);
>
> This line is ONLY ok if bl_len = 1
>
> >>         ret = nand_spl_load_image(offs, size, dst);
> >>         if (!ret)
> >>                 return size;
> >>         else
> >>                 return 0;
> >> }
> >>
> >> Sorry it's a bit of time and I think that we are discussing about this
> >> commit at the time was introduced
> >> Michael
> >>
> >> >
> >> > Which means that we add in sector twice (but with different units!)
> >> >
> >> > > The documentation for this this function is as follows
> >> > >
> >> > >> nand_spl_adjust_offset - Adjust offset from a starting sector
> >> > >> @sector:     Address of the sector
> >> > >> @offs:       Offset starting from @sector
> >> > >>
> >> > >> If one or more bad blocks are in the address space between @sector
> >> > >> and @sector + @offs, @offs is increased by the NAND block size for
> >> > >> each bad block found.
> >> > >
> >> > > In particular, note that offs is in units of bytes. However, we know
> >> > > from above that at this point in the function, offs is in units of
> >> > > bl_len. Oops.
> >> > >
> >> > >> #else
> >> > >>      offs *= load->bl_len;
> >> > >>      size *= load->bl_len;
> >> > >
> >> > > This code path adjusts both offs and size to be in units of
> >> >
> >> > *units of bytes
> >> >
> >> > >> #endif
> >> > >>      err = nand_spl_load_image(offs, size, dst);
> >> > >
> >> > > So by the time we get here, one code path is in units of bytes, and the
> >> > > other is in units of bl_len. nand_spl_load_image seems to expect bytes
> >> > > (though there is no documentation, so I can't be sure).  Which of course
> >> > > begs the question: how did the code introduced in 9f6a14c47f ever work?
> >
> > The commit 9f6a14c47f only added the code strictly necessary to skip a bad block
> > that was in that memory area. I heavily tested it on a custom board that presented
> > this problem and that without it it would not have been usable. The patch itself
> > was also tested on SOCs that did not have bad block, therefore backward compatible.
> > I think that your doubts arise from the patches following mine and that, however,
> > I have not had the opportunity to test.
>
> Have you tested v2021.07 or later?
>
> --Sean
>
> >> > >
> >> > >>      if (err)
> >> > >>              return 0;
> >> > >>
> >> > >>      return size / load->bl_len;
> >> > >
> >> > > And of course, this should only be done if size is in units of bytes.
> >> > >
> >> > >> }
> >> > >
> >> > > I have some patches for what I think are bugs, but the logic of this
> >> > > function is so confused that I am unable to determine if they actually
> >> > > are bugs. I would greatly appreciate if some of the authors of the
> >> > > mentioned commits could comment on their intent, and what they thought
> >> > > the units of offs and size were.
> >> > >
> >> > > --Sean
> >> > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-04 18:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-01 18:46 The contradictory nature of spl_nand_fit_read Sean Anderson
2022-04-01 18:53 ` Sean Anderson
2022-04-01 21:43   ` Michael Nazzareno Trimarchi
2022-04-03 15:56     ` Dario Binacchi
2022-04-04 16:26       ` Sean Anderson
2022-04-04 18:03         ` Tim Harvey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).