* [RFC 0/1] initial documentation for hash equivalence
@ 2021-06-17 19:55 Michael Opdenacker
  2021-06-17 19:55 ` [RFC 1/1] overview-manual: " Michael Opdenacker
  2021-06-17 20:47 ` [docs] [RFC 0/1] " Richard Purdie
  0 siblings, 2 replies; 6+ messages in thread
From: Michael Opdenacker @ 2021-06-17 19:55 UTC (permalink / raw)
  To: JPEWhacker, docs; +Cc: Michael Opdenacker

Greetings

This is a first documentation attempt for hash equivalence.
I need to know if I'm headed in the right direction before expanding
the text if needed.

So, I would be interested in your opinion about:

- Is this the right level of detail?
  Don't forget that this is turned on by default in Poky
  and therefore easy to use.

  Would I need to add more details about the implementation,
  such as explaining unihashes and adding diagrams such
  as the ones in Joshua Watt's presentation on the topic?

- Is this the right place to add such documentation?

- Doesn't this need a more thorough update of the whole
  documentation about shared state cache to give the new
  full picture right away, rather than adding this new
  feature later in the manual?

- Anything else that I should mention?

- Anything wrong or that I misunderstood?

Michael Opdenacker (1):
  overview-manual: initial documentation for hash equivalence

 documentation/overview-manual/concepts.rst | 46 ++++++++++++++++++++++
 1 file changed, 46 insertions(+)

-- 
2.25.1



* [RFC 1/1] overview-manual: initial documentation for hash equivalence
  2021-06-17 19:55 [RFC 0/1] initial documentation for hash equivalence Michael Opdenacker
@ 2021-06-17 19:55 ` Michael Opdenacker
  2021-06-17 20:40   ` [docs] " Richard Purdie
  2021-06-17 20:47 ` [docs] [RFC 0/1] " Richard Purdie
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Opdenacker @ 2021-06-17 19:55 UTC (permalink / raw)
  To: JPEWhacker, docs; +Cc: Michael Opdenacker

Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
---
 documentation/overview-manual/concepts.rst | 63 ++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
index ab882ff778..bb8318ea46 100644
--- a/documentation/overview-manual/concepts.rst
+++ b/documentation/overview-manual/concepts.rst
@@ -1950,6 +1950,69 @@ another reason why a task-based approach is preferred over a
 recipe-based approach, which would have to install the output from every
 task.
 
+Hash Equivalence
+----------------
+
+The above section explained how BitBake skips the execution of tasks
+whose output can already be found in the Shared State Cache.
+
+BitBake can go beyond this and also skip the execution of tasks that,
+even though the hash for their dependencies has changed, still generate
+identical output.
+
+So, even if two executions of a given task can have a different hash,
+because of potentially different sources, different environment variables
+or different output hashes for the tasks they depend on, they can be considered
+as "equivalent" as long as they generate the same output hash.
+
+Thanks to this equivalence, a change in one of the tasks in BitBake's run queue
+doesn't have to propagate to all the downstream tasks that depend on the output
+of this task, causing a full rebuild of such tasks, and so on with the next
+depending tasks. Instead, BitBake can safely retrieve all the downstream
+task output from the Shared State Cache.
+
+This applies to multiple scenarios:
+
+-  A "trivial" change to a recipe that doesn't impact its generated output,
+   such as whitespace changes, modifications to unused code paths or
+   in the ordering of variables.
+
+-  Shared library updates, for example to fix a security vulnerability.
+   For sure, the programs using such a library should be rebuilt, but
+   their new binaries should remain identical. The corresponding tasks should
+   have a different task hash because of the change in the hash of their
+   library dependency, but thanks to their output being identical, hash
+   equivalence will stop the propagation down the dependency chain.
+
+-  Native tool updates. Though the depending tasks should be rebuilt,
+   it's likely that they will generate the same output and be marked
+   as equivalent.
+
+This mechanism is enabled by default in Poky, and is controlled by two
+variables:
+
+-  :term:`bitbake:BB_HASHSERVE`, specifying a local or remote hash
+   equivalence server to use.
+
+-  :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``.
+
+Therefore, the default configuration in Poky corresponds to the
+below settings::
+
+   BB_HASHSERVE = "auto"
+   BB_SIGNATURE_HANDLER = "OEEquivHash"
+
+Another possibility is to share a hash equivalence server on a network,
+by setting::
+
+   BB_HASHSERVE = "<HOSTNAME>:<PORT>"
+
+.. note::
+
+   The hash equivalence server needs to be maintained together with the
+   shared state cache. Otherwise, the server could report shared state hashes
+   that do not exist.
+
 Automatically Added Runtime Dependencies
 ========================================
 
-- 
2.25.1



* Re: [docs] [RFC 1/1] overview-manual: initial documentation for hash equivalence
  2021-06-17 19:55 ` [RFC 1/1] overview-manual: " Michael Opdenacker
@ 2021-06-17 20:40   ` Richard Purdie
  2021-06-18 16:55     ` Michael Opdenacker
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Purdie @ 2021-06-17 20:40 UTC (permalink / raw)
  To: Michael Opdenacker, JPEWhacker, docs

Hi Michael,

Thanks for starting this. I don't think the explanation is as clear as it 
needs to be. I've added some suggestions below on how to improve that. I'd 
also comment that when describing this, you need to be specific about which 
hash you refer to in all cases. The task's hash for sstate purposes or the
hash of its output, for example. There are too many hashes!

On Thu, 2021-06-17 at 21:55 +0200, Michael Opdenacker wrote:
> Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
> ---
>  documentation/overview-manual/concepts.rst | 63 ++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
> 
> diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
> index ab882ff778..bb8318ea46 100644
> --- a/documentation/overview-manual/concepts.rst
> +++ b/documentation/overview-manual/concepts.rst
> @@ -1950,6 +1950,69 @@ another reason why a task-based approach is preferred over a
>  recipe-based approach, which would have to install the output from every
>  task.
>  
> 
> +Hash Equivalence
> +----------------
> +
> +The above section explained how BitBake skips the execution of tasks
> +whose output can already be found in the Shared State Cache.

I think here it may make sense to next define "equivalence", something like:

During a build, it may often be the case that the output/result of a task might 
be unchanged despite changes in the task's input values. An example might be 
whitespace changes in some input C code. In project terms, this is what we define
as "equivalence". We can create a hash/checksum which represents a task and two 
input task hashes are said to be equivalent if the hash of the generated output 
(as stored/restored by sstate) is the same.
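
As a rough illustration of that definition (not BitBake's actual
implementation), a toy Python sketch might look like the following. The
EquivalenceTable class, its report() and lookup() methods and the "demo"
method name are all hypothetical, invented for this example. "unihash" is
used here to mean the task hash that stands in for every equivalent task
hash.

```python
import hashlib


def sha256(text):
    """Return the hex SHA-256 digest of a string."""
    return hashlib.sha256(text.encode()).hexdigest()


class EquivalenceTable:
    """Toy in-memory stand-in for a hash equivalence server (hypothetical)."""

    def __init__(self):
        self._by_output = {}  # (method, output hash) -> unified hash
        self._by_task = {}    # (method, task hash) -> unified hash

    def report(self, method, taskhash, outhash):
        # The first task hash reported for a given output hash becomes the
        # unified hash; later task hashes with the same output map onto it.
        unihash = self._by_output.setdefault((method, outhash), taskhash)
        self._by_task[(method, taskhash)] = unihash
        return unihash

    def lookup(self, method, taskhash):
        # Unknown task hashes are their own unified hash.
        return self._by_task.get((method, taskhash), taskhash)


table = EquivalenceTable()

# Two input task hashes that differ only because of whitespace in the source.
old_task = sha256("do_compile: int main(){return 0;}")
new_task = sha256("do_compile: int main() { return 0; }")

# Both runs are assumed to produce byte-identical output.
output = sha256("identical object code")

unihash_old = table.report("demo", old_task, output)
unihash_new = table.report("demo", new_task, output)
```

The two task hashes differ, but because both report the same output hash
they resolve to one unified hash, which is what "equivalent" means above.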

You can then go on to say that this means:

> +BitBake can go beyond this and also skip the execution of tasks that,
> +even though the hash for their dependencies has changed, still generate
> +identical output.
> +
> +So, even if two executions of a given task can have a different hash,
> +because of potentially different sources, different environment variables
> +or different output hashes for the tasks they depend on, they can be considered
> +as "equivalent" as long as they generate the same output hash.

Once bitbake knows that two input hashes for a task have equivalent output, 
this has important and useful implications for all tasks depending on this task.
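
To make that implication concrete, here is a minimal Python sketch, under
the assumption that a downstream task's hash is derived from its own inputs
plus the (unified) hashes of the tasks it depends on. The task_hash() and
unified() helpers and the sample strings are hypothetical and only meant to
show why the change stops propagating.

```python
import hashlib


def task_hash(own_inputs, dep_hashes):
    """Hash a task's own inputs together with the hashes of its dependencies."""
    h = hashlib.sha256(own_inputs.encode())
    for dep in sorted(dep_hashes):
        h.update(dep.encode())
    return h.hexdigest()


# A library task whose input changed (for example, reformatted source)
# while its generated output stayed the same.
lib_old = task_hash("lib source v1", [])
lib_new = task_hash("lib source v1 reformatted", [])

# Assumption: the equivalence server recorded that lib_new produced the
# same output as lib_old, so lib_new maps to lib_old.
equivalent = {lib_new: lib_old}


def unified(taskhash):
    """Return the unified hash for a task hash (itself if none is known)."""
    return equivalent.get(taskhash, taskhash)


# Hash of a dependent task before the library change.
app_before = task_hash("app source", [unified(lib_old)])

# Without equivalence, the dependent task's hash changes and it is rebuilt.
app_without_equivalence = task_hash("app source", [lib_new])

# With equivalence, the dependent task's hash is unchanged, so its output
# can be restored from sstate instead.
app_with_equivalence = task_hash("app source", [unified(lib_new)])
```

In this sketch only the last case keeps the dependent task's hash stable,
which is why everything further down the dependency chain can be reused.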


> +Thanks to this equivalence, a change in one of the tasks in BitBake's run queue
> +doesn't have to propagate to all the downstream tasks that depend on the output
> +of this task, causing a full rebuild of such tasks, and so on with the next
> +depending tasks. Instead, BitBake can safely retrieve all the downstream
> +task output from the Shared State Cache.
> +
> +This applies to multiple scenarios:
> +
> +-  A "trivial" change to a recipe that doesn't impact its generated output,
> +   such as whitespace changes, modifications to unused code paths or
> +   in the ordering of variables.
> +
> +-  Shared library updates, for example to fix a security vulnerability.
> +   For sure, the programs using such a library should be rebuilt, but
> +   their new binaries should remain identical. The corresponding tasks should
> +   have a different task hash because of the change in the hash of their
> +   library dependency, but thanks to their output being identical, hash
> +   equivalence will stop the propagation down the dependency chain.
> +
> +-  Native tool updates. Though the depending tasks should be rebuilt,
> +   it's likely that they will generate the same output and be marked
> +   as equivalent.
> +
> +This mechanism is enabled by default in Poky, and is controlled by two
> +variables:
> +
> +-  :term:`bitbake:BB_HASHSERVE`, specifying a local or remote hash
> +   equivalence server to use.
> +
> +-  :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``.
> +
> +Therefore, the default configuration in Poky corresponds to the
> +below settings::
> +
> +   BB_HASHSERVE = "auto"
> +   BB_SIGNATURE_HANDLER = "OEEquivHash"
> +
> +Another possibility is to share a hash equivalence server on a network,
> +by setting::
> +
> +   BB_HASHSERVE = "<HOSTNAME>:<PORT>"
> +
> +.. note::
> +
> +   The hash equivalence server needs to be maintained together with the
> +   shared state cache. Otherwise, the server could report shared state hashes
> +   that do not exist.

We therefore recommend that one hash equivalence server be set up to correspond
with a given sstate cache.


Cheers,

Richard



* Re: [docs] [RFC 0/1] initial documentation for hash equivalence
  2021-06-17 19:55 [RFC 0/1] initial documentation for hash equivalence Michael Opdenacker
  2021-06-17 19:55 ` [RFC 1/1] overview-manual: " Michael Opdenacker
@ 2021-06-17 20:47 ` Richard Purdie
  2021-06-18 16:59   ` Michael Opdenacker
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Purdie @ 2021-06-17 20:47 UTC (permalink / raw)
  To: Michael Opdenacker, JPEWhacker, docs

On Thu, 2021-06-17 at 21:55 +0200, Michael Opdenacker wrote:
> Greetings
> 
> This is a first documentation attempt for hash equivalence.
> I need to know if I'm headed in the right direction before expanding
> the text if needed.
> 
> So, I would be interested in your opinion about:
> 
> - Is this the right level of detail?
>   Don't forget that this is turned on by default in Poky
>   and therefore easy to use.

We definitely need a high level overview so people can understand what it 
is doing. I suspect we'll need this and some more in depth docs too.

>   Would I need to add more details about the implementation,
>   such as explaining unihashes and adding diagrams such
>   as the ones in Joshua Watt's presentation on the topic?

People will need to know more about it so I do think further manual 
sections on topics like that will be needed, yes.

> - Is this the right place to add such documentation?

I think this piece is fine at the level and place it is at.

> - Doesn't this need a more thorough update of the whole
>   documentation about shared state cache to give the new
>   full picture right away, rather than adding this new
>   feature later in the manual?

We do need to update other sections and tie this in, yes.

> - Anything else that I should mention?

There is a read only mode and pass through for the server which will be 
needed. We need to document how to launch a standalone server instance?


> - Anything wrong or that I misunderstood?

Nothing wrong as such but it wasn't as clear as I think it needs to be.
I replied to the email directly about that.

Cheers,

Richard





* Re: [docs] [RFC 1/1] overview-manual: initial documentation for hash equivalence
  2021-06-17 20:40   ` [docs] " Richard Purdie
@ 2021-06-18 16:55     ` Michael Opdenacker
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Opdenacker @ 2021-06-18 16:55 UTC (permalink / raw)
  To: Richard Purdie, JPEWhacker, docs

Hi Richard,

Many thanks for your review and for your guidance.

On 6/17/21 10:40 PM, Richard Purdie wrote:
> Hi Michael,
>
> Thanks for starting this. I don't think the explanation is as clear as it 
> needs to be. I've added some suggestions below on how to improve that. I'd 
> also comment that when describing this, you need to be specific about which 
> hash you refer to in all cases. The task's hash for sstate purposes or the
> hash of its output, for example. There are too many hashes!
>
> On Thu, 2021-06-17 at 21:55 +0200, Michael Opdenacker wrote:
>> Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
>> ---
>>  documentation/overview-manual/concepts.rst | 63 ++++++++++++++++++++++
>>  1 file changed, 63 insertions(+)
>>
>> diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
>> index ab882ff778..bb8318ea46 100644
>> --- a/documentation/overview-manual/concepts.rst
>> +++ b/documentation/overview-manual/concepts.rst
>> @@ -1950,6 +1950,69 @@ another reason why a task-based approach is preferred over a
>>  recipe-based approach, which would have to install the output from every
>>  task.
>>  
>>
>> +Hash Equivalence
>> +----------------
>> +
>> +The above section explained how BitBake skips the execution of tasks
>> +whose output can already be found in the Shared State Cache.
> I think here it may make sense to next define "equivalence", something like:
>
> During a build, it may often be the case that the output/result of a task might 
> be unchanged despite changes in the task's input values. An example might be 
> whitespace changes in some input C code. In project terms, this is what we define
> as "equivalence". We can create a hash/checksum which represents a task and two 
> input task hashes are said to be equivalent if the hash of the generated output 
> (as stored/restored by sstate) is the same.
>
> You can then go on to say that this means:
>
>> +BitBake can go beyond this and also skip the execution of tasks that,
>> +even though the hash for their dependencies has changed, still generate
>> +identical output.
>> +
>> +So, even if two executions of a given task can have a different hash,
>> +because of potentially different sources, different environment variables
>> +or different output hashes for the tasks they depend on, they can be considered
>> +as "equivalent" as long as they generate the same output hash.
> Once bitbake knows that two input hashes for a task have equivalent output, 
> this has important and useful implications for all tasks depending on this task.
>
>
>> +Thanks to this equivalence, a change in one of the tasks in BitBake's run queue
>> +doesn't have to propagate to all the downstream tasks that depend on the output
>> +of this task, causing a full rebuild of such tasks, and so on with the next
>> +depending tasks. Instead, BitBake can safely retrieve all the downstream
>> +task output from the Shared State Cache.
>> +
>> +This applies to multiple scenarios:
>> +
>> +-  A "trivial" change to a recipe that doesn't impact its generated output,
>> +   such as whitespace changes, modifications to unused code paths or
>> +   in the ordering of variables.
>> +
>> +-  Shared library updates, for example to fix a security vulnerability.
>> +   For sure, the programs using such a library should be rebuilt, but
>> +   their new binaries should remain identical. The corresponding tasks should
>> +   have a different task hash because of the change in the hash of their
>> +   library dependency, but thanks to their output being identical, hash
>> +   equivalence will stop the propagation down the dependency chain.
>> +
>> +-  Native tool updates. Though the depending tasks should be rebuilt,
>> +   it's likely that they will generate the same output and be marked
>> +   as equivalent.
>> +
>> +This mechanism is enabled by default in Poky, and is controlled by two
>> +variables:
>> +
>> +-  :term:`bitbake:BB_HASHSERVE`, specifying a local or remote hash
>> +   equivalence server to use.
>> +
>> +-  :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``.
>> +
>> +Therefore, the default configuration in Poky corresponds to the
>> +below settings::
>> +
>> +   BB_HASHSERVE = "auto"
>> +   BB_SIGNATURE_HANDLER = "OEEquivHash"
>> +
>> +Another possibility is to share a hash equivalence server on a network,
>> +by setting::
>> +
>> +   BB_HASHSERVE = "<HOSTNAME>:<PORT>"
>> +
>> +.. note::
>> +
>> +   The hash equivalence server needs to be maintained together with the
>> +   shared state cache. Otherwise, the server could report shared state hashes
>> +   that do not exist.
> We therefore recommend that one hash equivalence server be set up to correspond
> with a given sstate cache.


Thanks, I took these suggestions in my branch, removing some of the
replaced lines.

I will also take care of properly explaining the different hashes.

Cheers,

Michael.

-- 
Michael Opdenacker, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [docs] [RFC 0/1] initial documentation for hash equivalence
  2021-06-17 20:47 ` [docs] [RFC 0/1] " Richard Purdie
@ 2021-06-18 16:59   ` Michael Opdenacker
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Opdenacker @ 2021-06-18 16:59 UTC (permalink / raw)
  To: Richard Purdie, JPEWhacker, docs

Hi Richard,

Many thanks for your answers to my questions!

On 6/17/21 10:47 PM, Richard Purdie wrote:
> We definitely need a high level overview so people can understand what it 
> is doing. I suspect we'll need this and some more in depth docs too.

Right, this makes sense.

>
>>   Would I need to add more details about the implementation,
>>   such as explaining unihashes and adding diagrams such
>>   as the ones in Joshua Watt's presentation on the topic?
> People will need to know more about it so I do think further manual 
> sections on topics like that will be needed, yes.


Ok, will do :)

>> - Doesn't this need a more thorough update of the whole
>>   documentation about shared state cache to give the new
>>   full picture right away, rather than adding this new
>>   feature later in the manual?
> We do need to update other sections and tie this in, yes.


Yes, I suspected that. Thanks for confirming. I'll see what I can do.

>
>> - Anything else that I should mention?
> There is a read only mode and pass through for the server which will be 
> needed. We need to document how to launch a standalone server instance?


Good to know. I'll investigate it.

Thanks again!

Michael.

-- 
Michael Opdenacker, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

