All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [OE] help with Equivalence Server
       [not found] <88B62901-053C-4E95-9896-868EE16A4467@getmailspring.com>
@ 2023-08-06 19:15 ` Joshua Watt
  2023-08-07 10:01   ` [OE-core] " Richard Purdie
  0 siblings, 1 reply; 8+ messages in thread
From: Joshua Watt @ 2023-08-06 19:15 UTC (permalink / raw)
  To: Maksim Chichikalov; +Cc: OE-core

[-- Attachment #1: Type: text/plain, Size: 3156 bytes --]

On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov <max.chichikalov@gmail.com>
wrote:

> Hi Joshua.
>
> My name is Max; nice to e-meet you. I need your help with Equivalence
> Server :)
> I watched your presentation on Youtube - it 100% helped me to understand
> the logic better and debug the issue we are facing. Thank you for all your
> input into OE development.
>
> I found in OE repo that you introduced "def OEOuthashBasic()", so I
> decided to write to you first before opening the topic on an email list.
>
> *The problem:*
>
> Input data are propagated to the output hash generation function, which
> generates a different out-hash.
>
> *Description:*
>
> All "_git.bb" recipes have to append SRCPV to PV. As a result, PV is
> different on each commit.
>
> The OEOuthashBasic function includes SSTATE_PKGSPEC to generate hash,
> which contains PV (PV contains git hash). As a result, there is no way to
> generate the same out-hash even if changes introduced within a commit were
> trivial.
>

Right, this is sort of on purpose, because the hash equivalence is
basically trying to say that an sstate object can be used in place of
another one, even when the task hashes aren't the same (but the output
hashes are). However, the sstate code itself will only look for sstate
object with a certain name (which include PV); hash equivalency does have
_some_ control over the file name sstate looks for, since it will replace
the taskhash portion of the name with the unified hash, but it doesn't have
complete control.


>
> In our codebase, our components have API part, which is managed by an
> independent recipe per component. The described above problem caused the
> recompilation of all components dependent on API, even in cases when API
> was not changed. CI for pull requests recompiles mostly the entire code
> base, I need to do something with it.  (sorry, quite hard for me to explain
> it in a nutshell, let me know if you like to know slightly more details)
>

Ya, sounds like a typical mono-repo design?


>
> I see a couple of options for us:
>
>    - Add a custom implementation of out-hash generated function and
>    overwrite SSTATE_HASHEQUIV_METHOD.
>    - Better understand why it's mandatory to append SRCPV to PV, and
>    maybe it's flexible in our cases to do it.
>
>
This might be the best option, at least for your recipes, but I've CC'd the
list for additional feedback


>
>    - Propose a patch to fix OEOuthashBasic().
>
> In my humble opinion, the commit's hash shouldn't be included in out-hash
> generation, it doesn't make sense. Unless I'm missing something important -
> What are your thoughts?
>

Yes and no. It's not intentional, but a side effect of hash equivalency
trying to make sure that the things it's marking as equivalent can actually
be found in sstate (basically, because sstate include the commit hash, hash
equivalency kinda has to include it).


>
> I appreciate it if you reply with your thoughts; many thanks for
> considering my request.
>
> What makes you *smile* today?
>
> Sincerely, Max Chichikalov
>

[-- Attachment #2: Type: text/html, Size: 4876 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OE-core] [OE] help with Equivalence Server
  2023-08-06 19:15 ` [OE] help with Equivalence Server Joshua Watt
@ 2023-08-07 10:01   ` Richard Purdie
  2023-08-07 14:15     ` Joshua Watt
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Purdie @ 2023-08-07 10:01 UTC (permalink / raw)
  To: Joshua Watt, Maksim Chichikalov; +Cc: OE-core

On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> <max.chichikalov@gmail.com> wrote:
> > My name is Max; nice to e-meet you. I need your help with
> > Equivalence Server :)
> > I watched your presentation on Youtube - it 100% helped me to
> > understand the logic better and debug the issue we are facing.
> > Thank you for all your input into OE development.
> > 
> > I found in OE repo that you introduced "def OEOuthashBasic()", so I
> > decided to write to you first before opening the topic on an email
> > list.
> > 
> > The problem:
> > 
> > Input data are propagated to the output hash generation function,
> > which generates a different out-hash.
> > 
> > Description:
> > 
> > All "_git.bb" recipes have to append SRCPV to PV. As a result, PV
> > is different on each commit.
> > 
> > The OEOuthashBasic function includes SSTATE_PKGSPEC to generate
> > hash, which contains PV (PV contains git hash). As a result, there
> > is no way to generate the same out-hash even if changes introduced
> > within a commit were trivial.
> > 
> 
> Right, this is sort of on purpose, because the hash equivalence is
> basically trying to say that an sstate object can be used in place of
> another one, even when the task hashes aren't the same (but the
> output hashes are). However, the sstate code itself will only look
> for sstate object with a certain name (which include PV); hash
> equivalency does have _some_ control over the file name sstate looks
> for, since it will replace the taskhash portion of the name with the
> unified hash, but it doesn't have complete control.
>  
> > 
> > In our codebase, our components have API part, which is managed by
> > an independent recipe per component. The described above problem
> > caused the recompilation of all components dependent on API, even
> > in cases when API was not changed. CI for pull requests recompiles
> > mostly the entire code base, I need to do something with it. 
> > (sorry, quite hard for me to explain it in a nutshell, let me know
> > if you like to know slightly more details) 
> > 
> 
> Ya, sounds like a typical mono-repo design?
>  
> > 
> > I see a couple of options for us:
> >  * Add a custom implementation of out-hash generated function and
> > overwrite SSTATE_HASHEQUIV_METHOD.
> >  * Better understand why it's mandatory to append SRCPV to PV, and
> > maybe it's flexible in our cases to do it.
> > 
> 
> This might be the best option, at least for your recipes, but I've
> CC'd the list for additional feedback
>  
> >  * Propose a patch to fix OEOuthashBasic().
> > 
> > In my humble opinion, the commit's hash shouldn't be included in
> > out-hash generation, it doesn't make sense. Unless I'm missing
> > something important - What are your thoughts?
> > 
> 
> Yes and no. It's not intentional, but a side effect of hash
> equivalency trying to make sure that the things it's marking as
> equivalent can actually be found in sstate (basically, because sstate
> include the commit hash, hash equivalency kinda has to include it).

This all sounds a bit unfortunate.

sstate only works as long as the filenames are predictable. Some
elements of the sstate filenames are essential to operation, e.g. the
package architecture since one input would result in multiple files
with the same hash in the filename of the output otherwise. The recipe
name and version are there mainly for debugging to allow someone to
more easily know where an sstate object came from and what it
represents. This is summarised by the comment in sstate.bbclass "Fields
0,5,6 are mandatory, 1 is most useful, 2,3,4 are just for information"
in generate_sstatefn().

When we added hash equivalence, we added the ability to equate the
hashes but we'd not considered that the version string mismatch may
stop significant artefact reuse. I suspect at the time we reasoned that
if the version changes, the output probably does too.

Sadly fixing this isn't simple. Changing the hash algorithm isn't
enough, we need to stop the SRCREV part of PV being used in the sstate
filename. If we stop that happening, the output hash algorithm may well
"just work" at that point, I'm not sure if it directly encodes PV or
just the sstate filename, hopefully the latter.

The hard part is how to do this generically without adding a lot of
complex and potentially fragile code. The datastore context in that
function is the core configuration, not the recipe's datastore. The
options I can think of offhand:

a) Live with the issue

b) Put a hack into generate_sstatefn() which changed the PV element if
it matched a pattern but we'd have to do this for each SCM as needed.
Ugly but lowest overhead.

c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV to drop
SRCPV to a fixed string. Nicer code in some ways but horrible
parsing/performance overhead since every recipe, even non-SCM ones
would be affected.

d) A partial version of c) where recipes can set SSTATEPV to a function
if they need it. Solves your specific case with overhead without
affecting everyone else. Would not solve the issue generally without
manual user intervention.

I'm not sure which of these will end up making the most sense. These
assume the output hash code uses the sstate filename and not PV. If it
uses PV there would be more work needed.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OE-core] [OE] help with Equivalence Server
  2023-08-07 10:01   ` [OE-core] " Richard Purdie
@ 2023-08-07 14:15     ` Joshua Watt
  2023-08-07 14:41       ` Maksim Chichikalov
  0 siblings, 1 reply; 8+ messages in thread
From: Joshua Watt @ 2023-08-07 14:15 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Maksim Chichikalov, OE-core

On Mon, Aug 7, 2023 at 4:01 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
>
> On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> > On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> > <max.chichikalov@gmail.com> wrote:
> > > My name is Max; nice to e-meet you. I need your help with
> > > Equivalence Server :)
> > > I watched your presentation on Youtube - it 100% helped me to
> > > understand the logic better and debug the issue we are facing.
> > > Thank you for all your input into OE development.
> > >
> > > I found in OE repo that you introduced "def OEOuthashBasic()", so I
> > > decided to write to you first before opening the topic on an email
> > > list.
> > >
> > > The problem:
> > >
> > > Input data are propagated to the output hash generation function,
> > > which generates a different out-hash.
> > >
> > > Description:
> > >
> > > All "_git.bb" recipes have to append SRCPV to PV. As a result, PV
> > > is different on each commit.
> > >
> > > The OEOuthashBasic function includes SSTATE_PKGSPEC to generate
> > > hash, which contains PV (PV contains git hash). As a result, there
> > > is no way to generate the same out-hash even if changes introduced
> > > within a commit were trivial.
> > >
> >
> > Right, this is sort of on purpose, because the hash equivalence is
> > basically trying to say that an sstate object can be used in place of
> > another one, even when the task hashes aren't the same (but the
> > output hashes are). However, the sstate code itself will only look
> > for sstate object with a certain name (which include PV); hash
> > equivalency does have _some_ control over the file name sstate looks
> > for, since it will replace the taskhash portion of the name with the
> > unified hash, but it doesn't have complete control.
> >
> > >
> > > In our codebase, our components have API part, which is managed by
> > > an independent recipe per component. The described above problem
> > > caused the recompilation of all components dependent on API, even
> > > in cases when API was not changed. CI for pull requests recompiles
> > > mostly the entire code base, I need to do something with it.
> > > (sorry, quite hard for me to explain it in a nutshell, let me know
> > > if you like to know slightly more details)
> > >
> >
> > Ya, sounds like a typical mono-repo design?
> >
> > >
> > > I see a couple of options for us:
> > >  * Add a custom implementation of out-hash generated function and
> > > overwrite SSTATE_HASHEQUIV_METHOD.
> > >  * Better understand why it's mandatory to append SRCPV to PV, and
> > > maybe it's flexible in our cases to do it.
> > >
> >
> > This might be the best option, at least for your recipes, but I've
> > CC'd the list for additional feedback
> >
> > >  * Propose a patch to fix OEOuthashBasic().
> > >
> > > In my humble opinion, the commit's hash shouldn't be included in
> > > out-hash generation, it doesn't make sense. Unless I'm missing
> > > something important - What are your thoughts?
> > >
> >
> > Yes and no. It's not intentional, but a side effect of hash
> > equivalency trying to make sure that the things it's marking as
> > equivalent can actually be found in sstate (basically, because sstate
> > include the commit hash, hash equivalency kinda has to include it).
>
> This all sounds a bit unfortunate.
>
> sstate only works as long as the filenames are predictable. Some
> elements of the sstate filenames are essential to operation, e.g. the
> package architecture since one input would result in multiple files
> with the same hash in the filename of the output otherwise. The recipe
> name and version are there mainly for debugging to allow someone to
> more easily know where an sstate object came from and what it
> represents. This is summarised by the comment in sstate.bbclass "Fields
> 0,5,6 are mandatory, 1 is most useful, 2,3,4 are just for information"
> in generate_sstatefn().
>
> When we added hash equivalence, we added the ability to equate the
> hashes but we'd not considered that the version string mismatch may
> stop significant artefact reuse. I suspect at the time we reasoned that
> if the version changes, the output probably does too.
>
> Sadly fixing this isn't simple. Changing the hash algorithm isn't
> enough, we need to stop the SRCREV part of PV being used in the sstate
> filename. If we stop that happening, the output hash algorithm may well
> "just work" at that point, I'm not sure if it directly encodes PV or
> just the sstate filename, hopefully the latter.

It's including SSTATE_PKGSPEC in the output hash calculation,
specifically so that a change in sstate filename changes the outhash
(that way, hash equivalence will never map two difference sstate files
as equivalent).

>
> The hard part is how to do this generically without adding a lot of
> complex and potentially fragile code. The datastore context in that
> function is the core configuration, not the recipe's datastore. The
> options I can think of offhand:
>
> a) Live with the issue
>
> b) Put a hack into generate_sstatefn() which changed the PV element if
> it matched a pattern but we'd have to do this for each SCM as needed.
> Ugly but lowest overhead.
>
> c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV to drop
> SRCPV to a fixed string. Nicer code in some ways but horrible
> parsing/performance overhead since every recipe, even non-SCM ones
> would be affected.
>
> d) A partial version of c) where recipes can set SSTATEPV to a function
> if they need it. Solves your specific case with overhead without
> affecting everyone else. Would not solve the issue generally without
> manual user intervention.
>
> I'm not sure which of these will end up making the most sense. These
> assume the output hash code uses the sstate filename and not PV. If it
> uses PV there would be more work needed.

I'm not seeing where SRCPV or SRCREV are being explicitly added to
SSTATE_PKGSPEC; PV is added, but SRCPV or SRCREV would have to be
manually added to PV in the recipe itself for that to happen correct?
Maybe it's as simple as not doing that in these particular recipes?

>
> Cheers,
>
> Richard


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OE-core] [OE] help with Equivalence Server
  2023-08-07 14:15     ` Joshua Watt
@ 2023-08-07 14:41       ` Maksim Chichikalov
  2023-08-07 14:54         ` Joshua Watt
  0 siblings, 1 reply; 8+ messages in thread
From: Maksim Chichikalov @ 2023-08-07 14:41 UTC (permalink / raw)
  To: Joshua Watt; +Cc: Richard Purdie, OE-core

[-- Attachment #1: Type: text/plain, Size: 7110 bytes --]

> On Mon, Aug 7, 2023 at 4:01 AM Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> >
> > On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> > > On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> > > <max.chichikalov@gmail.com> wrote:
> > > > My name is Max; nice to e-meet you. I need your help with
> > > > Equivalence Server :)
> > > > I watched your presentation on Youtube - it 100% helped me to
> > > > understand the logic better and debug the issue we are facing.
> > > > Thank you for all your input into OE development.
> > > >
> > > > I found in OE repo that you introduced "def OEOuthashBasic()", so I
> > > > decided to write to you first before opening the topic on an email
> > > > list.
> > > >
> > > > The problem:
> > > >
> > > > Input data are propagated to the output hash generation function,
> > > > which generates a different out-hash.
> > > >
> > > > Description:
> > > >
> > > > All "_git.bb" recipes have to append SRCPV to PV. As a result, PV
> > > > is different on each commit.
> > > >
> > > > The OEOuthashBasic function includes SSTATE_PKGSPEC to generate
> > > > hash, which contains PV (PV contains git hash). As a result, there
> > > > is no way to generate the same out-hash even if changes introduced
> > > > within a commit were trivial.
> > > >
> > >
> > > Right, this is sort of on purpose, because the hash equivalence is
> > > basically trying to say that an sstate object can be used in place of
> > > another one, even when the task hashes aren't the same (but the
> > > output hashes are). However, the sstate code itself will only look
> > > for sstate object with a certain name (which include PV); hash
> > > equivalency does have _some_ control over the file name sstate looks
> > > for, since it will replace the taskhash portion of the name with the
> > > unified hash, but it doesn't have complete control.
> > >
> > > >
> > > > In our codebase, our components have API part, which is managed by
> > > > an independent recipe per component. The described above problem
> > > > caused the recompilation of all components dependent on API, even
> > > > in cases when API was not changed. CI for pull requests recompiles
> > > > mostly the entire code base, I need to do something with it.
> > > > (sorry, quite hard for me to explain it in a nutshell, let me know
> > > > if you like to know slightly more details)
> > > >
> > >
> > > Ya, sounds like a typical mono-repo design?
> > >
> > > >
> > > > I see a couple of options for us:
> > > > * Add a custom implementation of out-hash generated function and
> > > > overwrite SSTATE_HASHEQUIV_METHOD.
> > > > * Better understand why it's mandatory to append SRCPV to PV, and
> > > > maybe it's flexible in our cases to do it.
> > > >
> > >
> > > This might be the best option, at least for your recipes, but I've
> > > CC'd the list for additional feedback
> > >
> > > > * Propose a patch to fix OEOuthashBasic().
> > > >
> > > > In my humble opinion, the commit's hash shouldn't be included in
> > > > out-hash generation, it doesn't make sense. Unless I'm missing
> > > > something important - What are your thoughts?
> > > >
> > >
> > > Yes and no. It's not intentional, but a side effect of hash
> > > equivalency trying to make sure that the things it's marking as
> > > equivalent can actually be found in sstate (basically, because sstate
> > > include the commit hash, hash equivalency kinda has to include it).
> >
> > This all sounds a bit unfortunate.
> >
> > sstate only works as long as the filenames are predictable. Some
> > elements of the sstate filenames are essential to operation, e.g. the
> > package architecture since one input would result in multiple files
> > with the same hash in the filename of the output otherwise. The recipe
> > name and version are there mainly for debugging to allow someone to
> > more easily know where an sstate object came from and what it
> > represents. This is summarised by the comment in sstate.bbclass "Fields
> > 0,5,6 are mandatory, 1 is most useful, 2,3,4 are just for information"
> > in generate_sstatefn().
> >
> > When we added hash equivalence, we added the ability to equate the
> > hashes but we'd not considered that the version string mismatch may
> > stop significant artefact reuse. I suspect at the time we reasoned that
> > if the version changes, the output probably does too.
> >
> > Sadly fixing this isn't simple. Changing the hash algorithm isn't
> > enough, we need to stop the SRCREV part of PV being used in the sstate
> > filename. If we stop that happening, the output hash algorithm may well
> > "just work" at that point, I'm not sure if it directly encodes PV or
> > just the sstate filename, hopefully the latter.
>
> It's including SSTATE_PKGSPEC in the output hash calculation,
> specifically so that a change in sstate filename changes the outhash
> (that way, hash equivalence will never map two difference sstate files
> as equivalent).
>
> >
> > The hard part is how to do this generically without adding a lot of
> > complex and potentially fragile code. The datastore context in that
> > function is the core configuration, not the recipe's datastore. The
> > options I can think of offhand:
> >
> > a) Live with the issue
> >
> > b) Put a hack into generate_sstatefn() which changed the PV element if
> > it matched a pattern but we'd have to do this for each SCM as needed.
> > Ugly but lowest overhead.
> >
> > c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV to drop
> > SRCPV to a fixed string. Nicer code in some ways but horrible
> > parsing/performance overhead since every recipe, even non-SCM ones
> > would be affected.
> >
> > d) A partial version of c) where recipes can set SSTATEPV to a function
> > if they need it. Solves your specific case with overhead without
> > affecting everyone else. Would not solve the issue generally without
> > manual user intervention.
> >
> > I'm not sure which of these will end up making the most sense. These
> > assume the output hash code uses the sstate filename and not PV. If it
> > uses PV there would be more work needed.
>
> I'm not seeing where SRCPV or SRCREV are being explicitly added to
> SSTATE_PKGSPEC; PV is added, but SRCPV or SRCREV would have to be
> manually added to PV in the recipe itself for that to happen correct?
> Maybe it's as simple as not doing that in these particular recipes?
>

What are pitfalls of not adding SRCPV to PV for packages with SRCREV:${PN} = ${AUTOREV}? Is it recommended just for improving traceability and debugging?
When I remove, bitbake is not happy - for example: if I remove explicit addition of SRCPV, the bitbake throws the following exception (need to dig deeper tp figure out why it happens only for certain packages)
raise bb.fetch2.FetchError("Recipe uses a floating tag/branch without a fixed SRCREV yet doesn't call bb.fetch2.get_srcrev() (use SRCPV in PV for OE).")
BR,
Max Chichikalov

>
> Cheers,
>
> Richard


[-- Attachment #2: Type: text/html, Size: 8914 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OE-core] [OE] help with Equivalence Server
  2023-08-07 14:41       ` Maksim Chichikalov
@ 2023-08-07 14:54         ` Joshua Watt
  2023-08-07 15:30           ` Maksim Chichikalov
  0 siblings, 1 reply; 8+ messages in thread
From: Joshua Watt @ 2023-08-07 14:54 UTC (permalink / raw)
  To: Maksim Chichikalov; +Cc: Richard Purdie, OE-core

On Mon, Aug 7, 2023 at 8:41 AM Maksim Chichikalov
<max.chichikalov@gmail.com> wrote:
>
> On Mon, Aug 7, 2023 at 4:01 AM Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> >
> > On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> > > On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> > > <max.chichikalov@gmail.com> wrote:
> > > > My name is Max; nice to e-meet you. I need your help with
> > > > Equivalence Server :)
> > > > I watched your presentation on Youtube - it 100% helped me to
> > > > understand the logic better and debug the issue we are facing.
> > > > Thank you for all your input into OE development.
> > > >
> > > > I found in OE repo that you introduced "def OEOuthashBasic()", so I
> > > > decided to write to you first before opening the topic on an email
> > > > list.
> > > >
> > > > The problem:
> > > >
> > > > Input data are propagated to the output hash generation function,
> > > > which generates a different out-hash.
> > > >
> > > > Description:
> > > >
> > > > All "_git.bb" recipes have to append SRCPV to PV. As a result, PV
> > > > is different on each commit.
> > > >
> > > > The OEOuthashBasic function includes SSTATE_PKGSPEC to generate
> > > > hash, which contains PV (PV contains git hash). As a result, there
> > > > is no way to generate the same out-hash even if changes introduced
> > > > within a commit were trivial.
> > > >
> > >
> > > Right, this is sort of on purpose, because the hash equivalence is
> > > basically trying to say that an sstate object can be used in place of
> > > another one, even when the task hashes aren't the same (but the
> > > output hashes are). However, the sstate code itself will only look
> > > for sstate object with a certain name (which include PV); hash
> > > equivalency does have _some_ control over the file name sstate looks
> > > for, since it will replace the taskhash portion of the name with the
> > > unified hash, but it doesn't have complete control.
> > >
> > > >
> > > > In our codebase, our components have API part, which is managed by
> > > > an independent recipe per component. The described above problem
> > > > caused the recompilation of all components dependent on API, even
> > > > in cases when API was not changed. CI for pull requests recompiles
> > > > mostly the entire code base, I need to do something with it.
> > > > (sorry, quite hard for me to explain it in a nutshell, let me know
> > > > if you like to know slightly more details)
> > > >
> > >
> > > Ya, sounds like a typical mono-repo design?
> > >
> > > >
> > > > I see a couple of options for us:
> > > > * Add a custom implementation of out-hash generated function and
> > > > overwrite SSTATE_HASHEQUIV_METHOD.
> > > > * Better understand why it's mandatory to append SRCPV to PV, and
> > > > maybe it's flexible in our cases to do it.
> > > >
> > >
> > > This might be the best option, at least for your recipes, but I've
> > > CC'd the list for additional feedback
> > >
> > > > * Propose a patch to fix OEOuthashBasic().
> > > >
> > > > In my humble opinion, the commit's hash shouldn't be included in
> > > > out-hash generation, it doesn't make sense. Unless I'm missing
> > > > something important - What are your thoughts?
> > > >
> > >
> > > Yes and no. It's not intentional, but a side effect of hash
> > > equivalency trying to make sure that the things it's marking as
> > > equivalent can actually be found in sstate (basically, because sstate
> > > include the commit hash, hash equivalency kinda has to include it).
> >
> > This all sounds a bit unfortunate.
> >
> > sstate only works as long as the filenames are predictable. Some
> > elements of the sstate filenames are essential to operation, e.g. the
> > package architecture since one input would result in multiple files
> > with the same hash in the filename of the output otherwise. The recipe
> > name and version are there mainly for debugging to allow someone to
> > more easily know where an sstate object came from and what it
> > represents. This is summarised by the comment in sstate.bbclass "Fields
> > 0,5,6 are mandatory, 1 is most useful, 2,3,4 are just for information"
> > in generate_sstatefn().
> >
> > When we added hash equivalence, we added the ability to equate the
> > hashes but we'd not considered that the version string mismatch may
> > stop significant artefact reuse. I suspect at the time we reasoned that
> > if the version changes, the output probably does too.
> >
> > Sadly fixing this isn't simple. Changing the hash algorithm isn't
> > enough, we need to stop the SRCREV part of PV being used in the sstate
> > filename. If we stop that happening, the output hash algorithm may well
> > "just work" at that point, I'm not sure if it directly encodes PV or
> > just the sstate filename, hopefully the latter.
>
> It's including SSTATE_PKGSPEC in the output hash calculation,
> specifically so that a change in sstate filename changes the outhash
> (that way, hash equivalence will never map two difference sstate files
> as equivalent).
>
> >
> > The hard part is how to do this generically without adding a lot of
> > complex and potentially fragile code. The datastore context in that
> > function is the core configuration, not the recipe's datastore. The
> > options I can think of offhand:
> >
> > a) Live with the issue
> >
> > b) Put a hack into generate_sstatefn() which changed the PV element if
> > it matched a pattern but we'd have to do this for each SCM as needed.
> > Ugly but lowest overhead.
> >
> > c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV to drop
> > SRCPV to a fixed string. Nicer code in some ways but horrible
> > parsing/performance overhead since every recipe, even non-SCM ones
> > would be affected.
> >
> > d) A partial version of c) where recipes can set SSTATEPV to a function
> > if they need it. Solves your specific case with overhead without
> > affecting everyone else. Would not solve the issue generally without
> > manual user intervention.
> >
> > I'm not sure which of these will end up making the most sense. These
> > assume the output hash code uses the sstate filename and not PV. If it
> > uses PV there would be more work needed.
>
> I'm not seeing where SRCPV or SRCREV are being explicitly added to
> SSTATE_PKGSPEC; PV is added, but SRCPV or SRCREV would have to be
> manually added to PV in the recipe itself for that to happen correct?
> Maybe it's as simple as not doing that in these particular recipes?
>
>
> What are pitfalls of not adding SRCPV to PV for packages with SRCREV:${PN} = ${AUTOREV}? Is it recommended just for improving traceability and debugging?
>
> When I remove, bitbake is not happy - for example: if I remove explicit addition of SRCPV, the bitbake throws the following exception (need to dig deeper tp figure out why it happens only for certain packages)
>
> raise bb.fetch2.FetchError("Recipe uses a floating tag/branch without a fixed SRCREV yet doesn't call bb.fetch2.get_srcrev() (use SRCPV in PV for OE).")

Ah, I didn't realize you were using AUTOREV; I would recommend not
doing that and instead manually specifying the SRCREV you want to
build from instead; aside from giving a more reproducible history, I
think it means you can remove SRCPV from PV and hash equivalency will
work

At my $DAYJOB, we don't use AUTOREV. Instead, we have a (rather dumb)
script that makes a commit to update the SRCREV to match the current
head revision of the BRANCH specified in a recipe. We run this every
night for recipes where we want AUTOREV-like behavior. Upstream also
has the Auto-Upgrade-Helper (AUH) that does something similar if you
want to look at that too.

>
> BR,
> Max Chichikalov
>
> >
> > Cheers,
> >
> > Richard
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OE-core] [OE] help with Equivalence Server
  2023-08-07 14:54         ` Joshua Watt
@ 2023-08-07 15:30           ` Maksim Chichikalov
  2023-08-07 15:39             ` Richard Purdie
  0 siblings, 1 reply; 8+ messages in thread
From: Maksim Chichikalov @ 2023-08-07 15:30 UTC (permalink / raw)
  To: Joshua Watt; +Cc: Richard Purdie, OE-core

[-- Attachment #1: Type: text/plain, Size: 8789 bytes --]

> On Mon, Aug 7, 2023 at 8:41 AM Maksim Chichikalov
> <max.chichikalov@gmail.com> wrote:
> >
> > On Mon, Aug 7, 2023 at 4:01 AM Richard Purdie
> > <richard.purdie@linuxfoundation.org> wrote:
> > >
> > > On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> > > > On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> > > > <max.chichikalov@gmail.com> wrote:
> > > > > My name is Max; nice to e-meet you. I need your help with
> > > > > Equivalence Server :)
> > > > > I watched your presentation on Youtube - it 100% helped me to
> > > > > understand the logic better and debug the issue we are facing.
> > > > > Thank you for all your input into OE development.
> > > > >
> > > > > I found in OE repo that you introduced "def OEOuthashBasic()", so I
> > > > > decided to write to you first before opening the topic on an email
> > > > > list.
> > > > >
> > > > > The problem:
> > > > >
> > > > > Input data are propagated to the output hash generation function,
> > > > > which generates a different out-hash.
> > > > >
> > > > > Description:
> > > > >
> > > > > All "_git.bb" recipes have to append SRCPV to PV. As a result, PV
> > > > > is different on each commit.
> > > > >
> > > > > The OEOuthashBasic function includes SSTATE_PKGSPEC to generate
> > > > > hash, which contains PV (PV contains git hash). As a result, there
> > > > > is no way to generate the same out-hash even if changes introduced
> > > > > within a commit were trivial.
> > > > >
> > > >
> > > > Right, this is sort of on purpose, because the hash equivalence is
> > > > basically trying to say that an sstate object can be used in place of
> > > > another one, even when the task hashes aren't the same (but the
> > > > output hashes are). However, the sstate code itself will only look
> > > > for sstate object with a certain name (which include PV); hash
> > > > equivalency does have _some_ control over the file name sstate looks
> > > > for, since it will replace the taskhash portion of the name with the
> > > > unified hash, but it doesn't have complete control.
> > > >
> > > > >
> > > > > In our codebase, our components have API part, which is managed by
> > > > > an independent recipe per component. The described above problem
> > > > > caused the recompilation of all components dependent on API, even
> > > > > in cases when API was not changed. CI for pull requests recompiles
> > > > > mostly the entire code base, I need to do something with it.
> > > > > (sorry, quite hard for me to explain it in a nutshell, let me know
> > > > > if you like to know slightly more details)
> > > > >
> > > >
> > > > Ya, sounds like a typical mono-repo design?
> > > >
> > > > >
> > > > > I see a couple of options for us:
> > > > > * Add a custom implementation of out-hash generated function and
> > > > > overwrite SSTATE_HASHEQUIV_METHOD.
> > > > > * Better understand why it's mandatory to append SRCPV to PV, and
> > > > > maybe it's flexible in our cases to do it.
> > > > >
> > > >
> > > > This might be the best option, at least for your recipes, but I've
> > > > CC'd the list for additional feedback
> > > >
> > > > > * Propose a patch to fix OEOuthashBasic().
> > > > >
> > > > > In my humble opinion, the commit's hash shouldn't be included in
> > > > > out-hash generation, it doesn't make sense. Unless I'm missing
> > > > > something important - What are your thoughts?
> > > > >
> > > >
> > > > Yes and no. It's not intentional, but a side effect of hash
> > > > equivalency trying to make sure that the things it's marking as
> > > > equivalent can actually be found in sstate (basically, because sstate
> > > > include the commit hash, hash equivalency kinda has to include it).
> > >
> > > This all sounds a bit unfortunate.
> > >
> > > sstate only works as long as the filenames are predictable. Some
> > > elements of the sstate filenames are essential to operation, e.g. the
> > > package architecture since one input would result in multiple files
> > > with the same hash in the filename of the output otherwise. The recipe
> > > name and version are there mainly for debugging to allow someone to
> > > more easily know where an sstate object came from and what it
> > > represents. This is summarised by the comment in sstate.bbclass "Fields
> > > 0,5,6 are mandatory, 1 is most useful, 2,3,4 are just for information"
> > > in generate_sstatefn().
> > >
> > > When we added hash equivalence, we added the ability to equate the
> > > hashes but we'd not considered that the version string mismatch may
> > > stop significant artefact reuse. I suspect at the time we reasoned that
> > > if the version changes, the output probably does too.
> > >
> > > Sadly fixing this isn't simple. Changing the hash algorithm isn't
> > > enough, we need to stop the SRCREV part of PV being used in the sstate
> > > filename. If we stop that happening, the output hash algorithm may well
> > > "just work" at that point, I'm not sure if it directly encodes PV or
> > > just the sstate filename, hopefully the latter.
> >
> > It's including SSTATE_PKGSPEC in the output hash calculation,
> > specifically so that a change in sstate filename changes the outhash
> > (that way, hash equivalence will never map two difference sstate files
> > as equivalent).
> >
> > >
> > > The hard part is how to do this generically without adding a lot of
> > > complex and potentially fragile code. The datastore context in that
> > > function is the core configuration, not the recipe's datastore. The
> > > options I can think of offhand:
> > >
> > > a) Live with the issue
> > >
> > > b) Put a hack into generate_sstatefn() which changed the PV element if
> > > it matched a pattern but we'd have to do this for each SCM as needed.
> > > Ugly but lowest overhead.
> > >
> > > c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV to drop
> > > SRCPV to a fixed string. Nicer code in some ways but horrible
> > > parsing/performance overhead since every recipe, even non-SCM ones
> > > would be affected.
> > >
> > > d) A partial version of c) where recipes can set SSTATEPV to a function
> > > if they need it. Solves your specific case with overhead without
> > > affecting everyone else. Would not solve the issue generally without
> > > manual user intervention.
> > >
> > > I'm not sure which of these will end up making the most sense. These
> > > assume the output hash code uses the sstate filename and not PV. If it
> > > uses PV there would be more work needed.
> >
> > I'm not seeing where SRCPV or SRCREV are being explicitly added to
> > SSTATE_PKGSPEC; PV is added, but SRCPV or SRCREV would have to be
> > manually added to PV in the recipe itself for that to happen correct?
> > Maybe it's as simple as not doing that in these particular recipes?
> >
> >
> > What are pitfalls of not adding SRCPV to PV for packages with SRCREV:${PN} = ${AUTOREV}? Is it recommended just for improving traceability and debugging?
> >
> > When I remove, bitbake is not happy - for example: if I remove explicit addition of SRCPV, the bitbake throws the following exception (need to dig deeper tp figure out why it happens only for certain packages)
> >
> > raise bb.fetch2.FetchError("Recipe uses a floating tag/branch without a fixed SRCREV yet doesn't call bb.fetch2.get_srcrev() (use SRCPV in PV for OE).")
>
> Ah, I didn't realize you were using AUTOREV; I would recommend not
> doing that and instead manually specifying the SRCREV you want to
> build from instead; aside from giving a more reproducible history, I
> think it means you can remove SRCPV from PV and hash equivalency will
> work
>
> At my $DAYJOB, we don't use AUTOREV. Instead, we have a (rather dumb)
> script that makes a commit to update the SRCREV to match the current
> head revision of the BRANCH specified in a recipe. We run this every
> night for recipes where we want AUTOREV-like behavior. Upstream also
> has the Auto-Upgrade-Helper (AUH) that does something similar if you
> want to look at that too.
>

TY Joshua!
AUTOREV is in use for pull request builds, when a package(s -> multi PR builds) should be built from the latest commit of the specific branch. I don't think we can stop using it, unless we add some mechanism of setting SRCREV to hash of the latest commit on each build invocation.

Richard, what are your thoughts about:
What are pitfalls of not adding SRCPV to PV for packages with SRCREV:${PN} = ${AUTOREV}?

Is it recommended just for improving traceability and debugging, or it has some bitabke/OE business logic implication?

BR,
Max Chichikalov

>
> BR,
> Max Chichikalov
>
> >
> > Cheers,
> >
> > Richard
>


[-- Attachment #2: Type: text/html, Size: 11294 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OE-core] [OE] help with Equivalence Server
  2023-08-07 15:30           ` Maksim Chichikalov
@ 2023-08-07 15:39             ` Richard Purdie
  2023-08-08 13:16               ` Maksim Chichikalov
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Purdie @ 2023-08-07 15:39 UTC (permalink / raw)
  To: Maksim Chichikalov, Joshua Watt; +Cc: OE-core

On Mon, 2023-08-07 at 11:30 -0400, Maksim Chichikalov wrote:
> > On Mon, Aug 7, 2023 at 8:41 AM Maksim Chichikalov
> > <max.chichikalov@gmail.com> wrote:
> > > 
> > > On Mon, Aug 7, 2023 at 4:01 AM Richard Purdie
> > > <richard.purdie@linuxfoundation.org> wrote:
> > > > 
> > > > On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> > > > > On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> > > > > <max.chichikalov@gmail.com> wrote:
> > > > > > My name is Max; nice to e-meet you. I need your help with
> > > > > > Equivalence Server :)
> > > > > > I watched your presentation on Youtube - it 100% helped me
> > > > > > to
> > > > > > understand the logic better and debug the issue we are
> > > > > > facing.
> > > > > > Thank you for all your input into OE development.
> > > > > > 
> > > > > > I found in OE repo that you introduced "def
> > > > > > OEOuthashBasic()", so I
> > > > > > decided to write to you first before opening the topic on
> > > > > > an email
> > > > > > list.
> > > > > > 
> > > > > > The problem:
> > > > > > 
> > > > > > Input data are propagated to the output hash generation
> > > > > > function,
> > > > > > which generates a different out-hash.
> > > > > > 
> > > > > > Description:
> > > > > > 
> > > > > > All "_git.bb" recipes have to append SRCPV to PV. As a
> > > > > > result, PV
> > > > > > is different on each commit.
> > > > > > 
> > > > > > The OEOuthashBasic function includes SSTATE_PKGSPEC to
> > > > > > generate
> > > > > > hash, which contains PV (PV contains git hash). As a
> > > > > > result, there
> > > > > > is no way to generate the same out-hash even if changes
> > > > > > introduced
> > > > > > within a commit were trivial.
> > > > > > 
> > > > > 
> > > > > Right, this is sort of on purpose, because the hash
> > > > > equivalence is
> > > > > basically trying to say that an sstate object can be used in
> > > > > place of
> > > > > another one, even when the task hashes aren't the same (but
> > > > > the
> > > > > output hashes are). However, the sstate code itself will only
> > > > > look
> > > > > for sstate object with a certain name (which include PV);
> > > > > hash
> > > > > equivalency does have _some_ control over the file name
> > > > > sstate looks
> > > > > for, since it will replace the taskhash portion of the name
> > > > > with the
> > > > > unified hash, but it doesn't have complete control.
> > > > > 
> > > > > > 
> > > > > > In our codebase, our components have API part, which is
> > > > > > managed by
> > > > > > an independent recipe per component. The described above
> > > > > > problem
> > > > > > caused the recompilation of all components dependent on
> > > > > > API, even
> > > > > > in cases when API was not changed. CI for pull requests
> > > > > > recompiles
> > > > > > mostly the entire code base, I need to do something with
> > > > > > it.
> > > > > > (sorry, quite hard for me to explain it in a nutshell, let
> > > > > > me know
> > > > > > if you like to know slightly more details)
> > > > > > 
> > > > > 
> > > > > Ya, sounds like a typical mono-repo design?
> > > > > 
> > > > > > 
> > > > > > I see a couple of options for us:
> > > > > > * Add a custom implementation of out-hash generated
> > > > > > function and
> > > > > > overwrite SSTATE_HASHEQUIV_METHOD.
> > > > > > * Better understand why it's mandatory to append SRCPV to
> > > > > > PV, and
> > > > > > maybe it's flexible in our cases to do it.
> > > > > > 
> > > > > 
> > > > > This might be the best option, at least for your recipes, but
> > > > > I've
> > > > > CC'd the list for additional feedback
> > > > > 
> > > > > > * Propose a patch to fix OEOuthashBasic().
> > > > > > 
> > > > > > In my humble opinion, the commit's hash shouldn't be
> > > > > > included in
> > > > > > out-hash generation, it doesn't make sense. Unless I'm
> > > > > > missing
> > > > > > something important - What are your thoughts?
> > > > > > 
> > > > > 
> > > > > Yes and no. It's not intentional, but a side effect of hash
> > > > > equivalency trying to make sure that the things it's marking
> > > > > as
> > > > > equivalent can actually be found in sstate (basically,
> > > > > because sstate
> > > > > include the commit hash, hash equivalency kinda has to
> > > > > include it).
> > > > 
> > > > This all sounds a bit unfortunate.
> > > > 
> > > > sstate only works as long as the filenames are predictable.
> > > > Some
> > > > elements of the sstate filenames are essential to operation,
> > > > e.g. the
> > > > package architecture since one input would result in multiple
> > > > files
> > > > with the same hash in the filename of the output otherwise. The
> > > > recipe
> > > > name and version are there mainly for debugging to allow
> > > > someone to
> > > > more easily know where an sstate object came from and what it
> > > > represents. This is summarised by the comment in sstate.bbclass
> > > > "Fields
> > > > 0,5,6 are mandatory, 1 is most useful, 2,3,4 are just for
> > > > information"
> > > > in generate_sstatefn().
> > > > 
> > > > When we added hash equivalence, we added the ability to equate
> > > > the
> > > > hashes but we'd not considered that the version string mismatch
> > > > may
> > > > stop significant artefact reuse. I suspect at the time we
> > > > reasoned that
> > > > if the version changes, the output probably does too.
> > > > 
> > > > Sadly fixing this isn't simple. Changing the hash algorithm
> > > > isn't
> > > > enough, we need to stop the SRCREV part of PV being used in the
> > > > sstate
> > > > filename. If we stop that happening, the output hash algorithm
> > > > may well
> > > > "just work" at that point, I'm not sure if it directly encodes
> > > > PV or
> > > > just the sstate filename, hopefully the latter.
> > > 
> > > It's including SSTATE_PKGSPEC in the output hash calculation,
> > > specifically so that a change in sstate filename changes the
> > > outhash
> > > (that way, hash equivalence will never map two difference sstate
> > > files
> > > as equivalent).
> > > 
> > > > 
> > > > The hard part is how to do this generically without adding a
> > > > lot of
> > > > complex and potentially fragile code. The datastore context in
> > > > that
> > > > function is the core configuration, not the recipe's datastore.
> > > > The
> > > > options I can think of offhand:
> > > > 
> > > > a) Live with the issue
> > > > 
> > > > b) Put a hack into generate_sstatefn() which changed the PV
> > > > element if
> > > > it matched a pattern but we'd have to do this for each SCM as
> > > > needed.
> > > > Ugly but lowest overhead.
> > > > 
> > > > c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV
> > > > to drop
> > > > SRCPV to a fixed string. Nicer code in some ways but horrible
> > > > parsing/performance overhead since every recipe, even non-SCM
> > > > ones
> > > > would be affected.
> > > > 
> > > > d) A partial version of c) where recipes can set SSTATEPV to a
> > > > function
> > > > if they need it. Solves your specific case with overhead
> > > > without
> > > > affecting everyone else. Would not solve the issue generally
> > > > without
> > > > manual user intervention.
> > > > 
> > > > I'm not sure which of these will end up making the most sense.
> > > > These
> > > > assume the output hash code uses the sstate filename and not
> > > > PV. If it
> > > > uses PV there would be more work needed.
> > > 
> > > I'm not seeing where SRCPV or SRCREV are being explicitly added
> > > to
> > > SSTATE_PKGSPEC; PV is added, but SRCPV or SRCREV would have to be
> > > manually added to PV in the recipe itself for that to happen
> > > correct?
> > > Maybe it's as simple as not doing that in these particular
> > > recipes?
> > > 
> > > 
> > > What are pitfalls of not adding SRCPV to PV for packages with
> > > SRCREV:${PN} = ${AUTOREV}? Is it recommended just for improving
> > > traceability and debugging?
> > > 
> > > When I remove, bitbake is not happy - for example: if I remove
> > > explicit addition of SRCPV, the bitbake throws the following
> > > exception (need to dig deeper tp figure out why it happens only
> > > for certain packages)
> > > 
> > > raise bb.fetch2.FetchError("Recipe uses a floating tag/branch
> > > without a fixed SRCREV yet doesn't call bb.fetch2.get_srcrev()
> > > (use SRCPV in PV for OE).")
> > 
> > Ah, I didn't realize you were using AUTOREV; I would recommend not
> > doing that and instead manually specifying the SRCREV you want to
> > build from instead; aside from giving a more reproducible history,
> > I
> > think it means you can remove SRCPV from PV and hash equivalency
> > will
> > work
> > 
> > At my $DAYJOB, we don't use AUTOREV. Instead, we have a (rather
> > dumb)
> > script that makes a commit to update the SRCREV to match the
> > current
> > head revision of the BRANCH specified in a recipe. We run this
> > every
> > night for recipes where we want AUTOREV-like behavior. Upstream
> > also
> > has the Auto-Upgrade-Helper (AUH) that does something similar if
> > you
> > want to look at that too.
> 
> TY Joshua!
> AUTOREV is in use for pull request builds, when a package(s -> multi
> PR builds) should be built from the latest commit of the specific
> branch. I don't think we can stop using it, unless we add some
> mechanism of setting SRCREV to hash of the latest commit on each
> build invocation.
> 
> Richard, what are your thoughts about:
>  * What are pitfalls of not adding SRCPV to PV for packages with
> SRCREV:${PN} = ${AUTOREV}?

If you don't add SRCREV (via SRCPV) to PV, PV doesn't change when the
source control revision changes.

If the version doesn't change, how does bitbake know to rebuild? How do
packages get the right version in them so upgrades work?

I guess in modern times with hashes in task stamps, SRCREV can be
included directly into the task hashes so that part of this
functionality is partly obsolete. Reworking things to adapt to that
would be interesting but a fair amount of change/work.

>  * Is it recommended just for improving traceability and debugging,
> or it has some bitabke/OE business logic implication?

As it stands today, it is relied upon to trigger rebuilds with a number
of follow on expectations.

I'm not against changing this as when I step back and think about it,
the reasoning for some of what we do is obsolete and we have better
mechanisms for it now. It will be a lot of work to change though.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OE-core] [OE] help with Equivalence Server
  2023-08-07 15:39             ` Richard Purdie
@ 2023-08-08 13:16               ` Maksim Chichikalov
  0 siblings, 0 replies; 8+ messages in thread
From: Maksim Chichikalov @ 2023-08-08 13:16 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Joshua Watt, OE-core

[-- Attachment #1: Type: text/plain, Size: 11939 bytes --]

> On Mon, 2023-08-07 at 11:30 -0400, Maksim Chichikalov wrote:
> > > On Mon, Aug 7, 2023 at 8:41 AM Maksim Chichikalov
> > > <max.chichikalov@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 7, 2023 at 4:01 AM Richard Purdie
> > > > <richard.purdie@linuxfoundation.org> wrote:
> > > > >
> > > > > On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> > > > > > On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> > > > > > <max.chichikalov@gmail.com> wrote:
> > > > > > > My name is Max; nice to e-meet you. I need your help with
> > > > > > > Equivalence Server :)
> > > > > > > I watched your presentation on Youtube - it 100% helped me
> > > > > > > to
> > > > > > > understand the logic better and debug the issue we are
> > > > > > > facing.
> > > > > > > Thank you for all your input into OE development.
> > > > > > >
> > > > > > > I found in OE repo that you introduced "def
> > > > > > > OEOuthashBasic()", so I
> > > > > > > decided to write to you first before opening the topic on
> > > > > > > an email
> > > > > > > list.
> > > > > > >
> > > > > > > The problem:
> > > > > > >
> > > > > > > Input data are propagated to the output hash generation
> > > > > > > function,
> > > > > > > which generates a different out-hash.
> > > > > > >
> > > > > > > Description:
> > > > > > >
> > > > > > > All "_git.bb" recipes have to append SRCPV to PV. As a
> > > > > > > result, PV
> > > > > > > is different on each commit.
> > > > > > >
> > > > > > > The OEOuthashBasic function includes SSTATE_PKGSPEC to
> > > > > > > generate
> > > > > > > hash, which contains PV (PV contains git hash). As a
> > > > > > > result, there
> > > > > > > is no way to generate the same out-hash even if changes
> > > > > > > introduced
> > > > > > > within a commit were trivial.
> > > > > > >
> > > > > >
> > > > > > Right, this is sort of on purpose, because the hash
> > > > > > equivalence is
> > > > > > basically trying to say that an sstate object can be used in
> > > > > > place of
> > > > > > another one, even when the task hashes aren't the same (but
> > > > > > the
> > > > > > output hashes are). However, the sstate code itself will only
> > > > > > look
> > > > > > for sstate object with a certain name (which include PV);
> > > > > > hash
> > > > > > equivalency does have _some_ control over the file name
> > > > > > sstate looks
> > > > > > for, since it will replace the taskhash portion of the name
> > > > > > with the
> > > > > > unified hash, but it doesn't have complete control.
> > > > > >
> > > > > > >
> > > > > > > In our codebase, our components have API part, which is
> > > > > > > managed by
> > > > > > > an independent recipe per component. The described above
> > > > > > > problem
> > > > > > > caused the recompilation of all components dependent on
> > > > > > > API, even
> > > > > > > in cases when API was not changed. CI for pull requests
> > > > > > > recompiles
> > > > > > > mostly the entire code base, I need to do something with
> > > > > > > it.
> > > > > > > (sorry, quite hard for me to explain it in a nutshell, let
> > > > > > > me know
> > > > > > > if you like to know slightly more details)
> > > > > > >
> > > > > >
> > > > > > Ya, sounds like a typical mono-repo design?
> > > > > >
> > > > > > >
> > > > > > > I see a couple of options for us:
> > > > > > > * Add a custom implementation of out-hash generated
> > > > > > > function and
> > > > > > > overwrite SSTATE_HASHEQUIV_METHOD.
> > > > > > > * Better understand why it's mandatory to append SRCPV to
> > > > > > > PV, and
> > > > > > > maybe it's flexible in our cases to do it.
> > > > > > >
> > > > > >
> > > > > > This might be the best option, at least for your recipes, but
> > > > > > I've
> > > > > > CC'd the list for additional feedback
> > > > > >
> > > > > > > * Propose a patch to fix OEOuthashBasic().
> > > > > > >
> > > > > > > In my humble opinion, the commit's hash shouldn't be
> > > > > > > included in
> > > > > > > out-hash generation, it doesn't make sense. Unless I'm
> > > > > > > missing
> > > > > > > something important - What are your thoughts?
> > > > > > >
> > > > > >
> > > > > > Yes and no. It's not intentional, but a side effect of hash
> > > > > > equivalency trying to make sure that the things it's marking
> > > > > > as
> > > > > > equivalent can actually be found in sstate (basically,
> > > > > > because sstate
> > > > > > include the commit hash, hash equivalency kinda has to
> > > > > > include it).
> > > > >
> > > > > This all sounds a bit unfortunate.
> > > > >
> > > > > sstate only works as long as the filenames are predictable.
> > > > > Some
> > > > > elements of the sstate filenames are essential to operation,
> > > > > e.g. the
> > > > > package architecture since one input would result in multiple
> > > > > files
> > > > > with the same hash in the filename of the output otherwise. The
> > > > > recipe
> > > > > name and version are there mainly for debugging to allow
> > > > > someone to
> > > > > more easily know where an sstate object came from and what it
> > > > > represents. This is summarised by the comment in sstate.bbclass
> > > > > "Fields
> > > > > 0,5,6 are mandatory, 1 is most useful, 2,3,4 are just for
> > > > > information"
> > > > > in generate_sstatefn().
> > > > >
> > > > > When we added hash equivalence, we added the ability to equate
> > > > > the
> > > > > hashes but we'd not considered that the version string mismatch
> > > > > may
> > > > > stop significant artefact reuse. I suspect at the time we
> > > > > reasoned that
> > > > > if the version changes, the output probably does too.
> > > > >
> > > > > Sadly fixing this isn't simple. Changing the hash algorithm
> > > > > isn't
> > > > > enough, we need to stop the SRCREV part of PV being used in the
> > > > > sstate
> > > > > filename. If we stop that happening, the output hash algorithm
> > > > > may well
> > > > > "just work" at that point, I'm not sure if it directly encodes
> > > > > PV or
> > > > > just the sstate filename, hopefully the latter.
> > > >
> > > > It's including SSTATE_PKGSPEC in the output hash calculation,
> > > > specifically so that a change in sstate filename changes the
> > > > outhash
> > > > (that way, hash equivalence will never map two difference sstate
> > > > files
> > > > as equivalent).
> > > >
> > > > >
> > > > > The hard part is how to do this generically without adding a
> > > > > lot of
> > > > > complex and potentially fragile code. The datastore context in
> > > > > that
> > > > > function is the core configuration, not the recipe's datastore.
> > > > > The
> > > > > options I can think of offhand:
> > > > >
> > > > > a) Live with the issue
> > > > >
> > > > > b) Put a hack into generate_sstatefn() which changed the PV
> > > > > element if
> > > > > it matched a pattern but we'd have to do this for each SCM as
> > > > > needed.
> > > > > Ugly but lowest overhead.
> > > > >
> > > > > c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV
> > > > > to drop
> > > > > SRCPV to a fixed string. Nicer code in some ways but horrible
> > > > > parsing/performance overhead since every recipe, even non-SCM
> > > > > ones
> > > > > would be affected.
> > > > >
> > > > > d) A partial version of c) where recipes can set SSTATEPV to a
> > > > > function
> > > > > if they need it. Solves your specific case with overhead
> > > > > without
> > > > > affecting everyone else. Would not solve the issue generally
> > > > > without
> > > > > manual user intervention.
> > > > >
> > > > > I'm not sure which of these will end up making the most sense.
> > > > > These
> > > > > assume the output hash code uses the sstate filename and not
> > > > > PV. If it
> > > > > uses PV there would be more work needed.
> > > >
> > > > I'm not seeing where SRCPV or SRCREV are being explicitly added
> > > > to
> > > > SSTATE_PKGSPEC; PV is added, but SRCPV or SRCREV would have to be
> > > > manually added to PV in the recipe itself for that to happen
> > > > correct?
> > > > Maybe it's as simple as not doing that in these particular
> > > > recipes?
> > > >
> > > >
> > > > What are pitfalls of not adding SRCPV to PV for packages with
> > > > SRCREV:${PN} = ${AUTOREV}? Is it recommended just for improving
> > > > traceability and debugging?
> > > >
> > > > When I remove, bitbake is not happy - for example: if I remove
> > > > explicit addition of SRCPV, the bitbake throws the following
> > > > exception (need to dig deeper tp figure out why it happens only
> > > > for certain packages)
> > > >
> > > > raise bb.fetch2.FetchError("Recipe uses a floating tag/branch
> > > > without a fixed SRCREV yet doesn't call bb.fetch2.get_srcrev()
> > > > (use SRCPV in PV for OE).")
> > >
> > > Ah, I didn't realize you were using AUTOREV; I would recommend not
> > > doing that and instead manually specifying the SRCREV you want to
> > > build from instead; aside from giving a more reproducible history,
> > > I
> > > think it means you can remove SRCPV from PV and hash equivalency
> > > will
> > > work
> > >
> > > At my $DAYJOB, we don't use AUTOREV. Instead, we have a (rather
> > > dumb)
> > > script that makes a commit to update the SRCREV to match the
> > > current
> > > head revision of the BRANCH specified in a recipe. We run this
> > > every
> > > night for recipes where we want AUTOREV-like behavior. Upstream
> > > also
> > > has the Auto-Upgrade-Helper (AUH) that does something similar if
> > > you
> > > want to look at that too.
> >
> > TY Joshua!
> > AUTOREV is in use for pull request builds, when a package(s -> multi
> > PR builds) should be built from the latest commit of the specific
> > branch. I don't think we can stop using it, unless we add some
> > mechanism of setting SRCREV to hash of the latest commit on each
> > build invocation.
> >
> > Richard, what are your thoughts about:
> > * What are pitfalls of not adding SRCPV to PV for packages with
> > SRCREV:${PN} = ${AUTOREV}?
>
> If you don't add SRCREV (via SRCPV) to PV, PV doesn't change when the
> source control revision changes.
>
> If the version doesn't change, how does bitbake know to rebuild? How do
> packages get the right version in them so upgrades work?
>
> I guess in modern times with hashes in task stamps, SRCREV can be
> included directly into the task hashes so that part of this
> functionality is partly obsolete. Reworking things to adapt to that
> would be interesting but a fair amount of change/work.
>
> > * Is it recommended just for improving traceability and debugging,
> > or it has some bitabke/OE business logic implication?
>
> As it stands today, it is relied upon to trigger rebuilds with a number
> of follow on expectations.
>
> I'm not against changing this as when I step back and think about it,
> the reasoning for some of what we do is obsolete and we have better
> mechanisms for it now. It will be a lot of work to change though.
>
> Cheers,
> Richard

Richard, Joshua, thank you so much for being responsive and helping to understand the situation better. I need time to grasp it, consult with my colleagues, and decide how to overcome it.
JFYI, I observed it, and was under the impression that bitbake rebuilds even if SRCPV is not part of PV for packages using AUTORV. Because of it, I decided to try it out, and yes, bitbake fetches updated code (new commits) and recompiling package - not sure how it works and what information it's using :) Based on details you shared above, it shouldn't. Maybe it's some sort of side effect of something else in my environment.
Decided to share it with you.

I have a question about the discussion above - Do you guys want me to create a bug in the Bugzila?
BR,
Max


[-- Attachment #2: Type: text/html, Size: 17312 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-08-08 13:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <88B62901-053C-4E95-9896-868EE16A4467@getmailspring.com>
2023-08-06 19:15 ` [OE] help with Equivalence Server Joshua Watt
2023-08-07 10:01   ` [OE-core] " Richard Purdie
2023-08-07 14:15     ` Joshua Watt
2023-08-07 14:41       ` Maksim Chichikalov
2023-08-07 14:54         ` Joshua Watt
2023-08-07 15:30           ` Maksim Chichikalov
2023-08-07 15:39             ` Richard Purdie
2023-08-08 13:16               ` Maksim Chichikalov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.