All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] Sequence number out of range fix
@ 2020-01-15 17:08 Jakub Witowski
  2020-01-15 17:08 ` [PATCH 1/1] mesh: sequence " Jakub Witowski
  2020-01-15 18:35 ` [PATCH 0/1] Sequence " Gix, Brian
  0 siblings, 2 replies; 8+ messages in thread
From: Jakub Witowski @ 2020-01-15 17:08 UTC (permalink / raw)
  To: linux-bluetooth

During sending messages with high speed, the 'cached' value may have exceeded the 24-bit range
and this value will save into storage.

The current implementation protects us against exceeding max sequence number (0xFFFFFF).

Jakub Witowski (1):
  mesh: sequence number out of range fix

 mesh/mesh-config-json.c | 4 ++++
 mesh/net.c              | 3 +++
 2 files changed, 7 insertions(+)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] mesh: sequence number out of range fix
  2020-01-15 17:08 [PATCH 0/1] Sequence number out of range fix Jakub Witowski
@ 2020-01-15 17:08 ` Jakub Witowski
  2020-01-15 18:35 ` [PATCH 0/1] Sequence " Gix, Brian
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Witowski @ 2020-01-15 17:08 UTC (permalink / raw)
  To: linux-bluetooth

---
 mesh/mesh-config-json.c | 4 ++++
 mesh/net.c              | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/mesh/mesh-config-json.c b/mesh/mesh-config-json.c
index 755caab0e..3ee3317d9 100644
--- a/mesh/mesh-config-json.c
+++ b/mesh/mesh-config-json.c
@@ -40,6 +40,7 @@
 #include "mesh/mesh-defs.h"
 #include "mesh/util.h"
 #include "mesh/mesh-config.h"
+#include "mesh/net.h"
 
 /* To prevent local node JSON cache thrashing, minimum update times */
 #define MIN_SEQ_CACHE_TRIGGER	32
@@ -2019,6 +2020,9 @@ bool mesh_config_write_seq_number(struct mesh_config *cfg, uint32_t seq,
 		if (cached < seq + MIN_SEQ_CACHE_VALUE)
 			cached = seq + MIN_SEQ_CACHE_VALUE;
 
+		if (cached >= SEQ_MASK)
+			cached = SEQ_MASK;
+
 		l_debug("Seq Cache: %d -> %d", seq, cached);
 
 		cfg->write_seq = seq;
diff --git a/mesh/net.c b/mesh/net.c
index f0f0dbdbd..10dfd5dd3 100644
--- a/mesh/net.c
+++ b/mesh/net.c
@@ -511,6 +511,9 @@ uint32_t mesh_net_next_seq_num(struct mesh_net *net)
 {
 	uint32_t seq = net->seq_num++;
 
+	if (net->seq_num > SEQ_MASK)
+		net->seq_num = SEQ_MASK;
+
 	node_set_sequence_number(net->node, net->seq_num);
 	return seq;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Sequence number out of range fix
  2020-01-15 17:08 [PATCH 0/1] Sequence number out of range fix Jakub Witowski
  2020-01-15 17:08 ` [PATCH 1/1] mesh: sequence " Jakub Witowski
@ 2020-01-15 18:35 ` Gix, Brian
  2020-01-15 19:09   ` michal.lowas-rzechonek
  1 sibling, 1 reply; 8+ messages in thread
From: Gix, Brian @ 2020-01-15 18:35 UTC (permalink / raw)
  To: jakub.witowski, linux-bluetooth; +Cc: istotlan, michal.lowas-rzechonek

Hi Jakub,

On Wed, 2020-01-15 at 18:08 +0100, Jakub Witowski wrote:
> During sending messages with high speed, the 'cached' value may have exceeded the 24-bit range
> and this value will save into storage.
> 
> The current implementation protects us against exceeding max sequence number (0xFFFFFF).
> 
> Jakub Witowski (1):
>   mesh: sequence number out of range fix
> 
>  mesh/mesh-config-json.c | 4 ++++
>  mesh/net.c              | 3 +++
>  2 files changed, 7 insertions(+)
> 

This is a very dangerous course of action.  The suggested patch might potentially cause a node to re-use a
sequence number more than once, which would cause a "dirty nonce" condition, and allow unauthorized entities to
derive the encrypted data without the keys.

The Mesh working group put a lot of work into determining what should be range of Sequence Numbers and IV
Indexes, and came up with the 24 bits of sequence numbers, which can be reset to 0 with an IV Index update
every 192 hours.  This will allow a node to send 24 packets (every 42ms) a second constantly without running
out of sequence numbers.  While this is technically possible, especially with a daemon that might be looping
back messages internally without ever using the OTA interface, it is not really possible in an actual BT driven
real life system.


So here is the solution I would suggest:

Beacuse we store sequence numbers internally with a u32 sized data type, we should *let* the value go over the
max legal sequence nunmbver of 0xFFFFFF (perhaps capping it at 0x01000000 to prevent "super overflows").  Then
we *reject* all send requests with a sequence number > 0xFFFFFF.

Assuming the IV Index update logic is working correctly (triggered internally when the Sequence number goes
over 0x00800000... half the max) the node will work just fine as long as it never tries to send more than one
packet every 42ms for 192 hours without rest...  Just as the specification intended.  If it does, then the
specification and code will disallow it. 

BR,
Brian


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Sequence number out of range fix
  2020-01-15 18:35 ` [PATCH 0/1] Sequence " Gix, Brian
@ 2020-01-15 19:09   ` michal.lowas-rzechonek
  2020-01-15 19:53     ` Gix, Brian
  0 siblings, 1 reply; 8+ messages in thread
From: michal.lowas-rzechonek @ 2020-01-15 19:09 UTC (permalink / raw)
  To: Gix, Brian; +Cc: jakub.witowski, linux-bluetooth, istotlan

Brian, Jakub,

On 01/15, Gix, Brian wrote:
> This is a very dangerous course of action.  The suggested patch might
> potentially cause a node to re-use a sequence number more than once,
> which would cause a "dirty nonce" condition, and allow unauthorized
> entities to derive the encrypted data without the keys.

Good point. Note that his is also possible in the current
implementation: since seq_num is applied to nonce with a 24bit mask,
it's going to wrap around.

> While this is technically possible, especially with a daemon that
> might be looping back messages internally without ever using the OTA
> interface, it is not really possible in an actual BT driven real life
> system.

There is another option to exhaust the 24 bit range: we have the
overcommit mechanism in the storage. Let's say the daemon starts, sends
a few messages (and overcommits the sequence by a certain value), then
either the daemon, or the system crashes.

Do that a few times, and you end up with storage containing sequence
number exceeding 24 bits.

> Beacuse we store sequence numbers internally with a u32 sized data
> type, we should *let* the value go over the max legal sequence
> nunmbver of 0xFFFFFF (perhaps capping it at 0x01000000 to prevent
> "super overflows").  Then we *reject* all send requests with a
> sequence number > 0xFFFFFF.

Sounds good.

-- 
Michał Lowas-Rzechonek <michal.lowas-rzechonek@silvair.com>
Silvair http://silvair.com
Jasnogórska 44, 31-358 Krakow, POLAND

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Sequence number out of range fix
  2020-01-15 19:09   ` michal.lowas-rzechonek
@ 2020-01-15 19:53     ` Gix, Brian
  2020-01-15 21:04       ` michal.lowas-rzechonek
  0 siblings, 1 reply; 8+ messages in thread
From: Gix, Brian @ 2020-01-15 19:53 UTC (permalink / raw)
  To: michal.lowas-rzechonek; +Cc: jakub.witowski, linux-bluetooth, istotlan

Hi Michał 

> On Jan 15, 2020, at 11:09 AM, "michal.lowas-rzechonek@silvair.com" <michal.lowas-rzechonek@silvair.com> wrote:
> 
> Brian, Jakub,
> 
>> On 01/15, Gix, Brian wrote:
>> This is a very dangerous course of action.  The suggested patch might
>> potentially cause a node to re-use a sequence number more than once,
>> which would cause a "dirty nonce" condition, and allow unauthorized
>> entities to derive the encrypted data without the keys.
> 
> Good point. Note that his is also possible in the current
> implementation: since seq_num is applied to nonce with a 24bit mask,
> it's going to wrap around.

The full IV Index is in the nonce, and at 192 hours per IV index, will be unique for something like 1.4 million years.

> 
>> While this is technically possible, especially with a daemon that
>> might be looping back messages internally without ever using the OTA
>> interface, it is not really possible in an actual BT driven real life
>> system.
> 
> There is another option to exhaust the 24 bit range: we have the
> overcommit mechanism in the storage. Let's say the daemon starts, sends
> a few messages (and overcommits the sequence by a certain value), then
> either the daemon, or the system crashes.
> 
> Do that a few times, and you end up with storage containing sequence
> number exceeding 24 bits.


The over commit is calculated based on the usage rate, and the daemon would need to unexpectedly abort (not just ctrl-c or exit) for us to use the over-commit value, as on deliberate exit, the actual sequence used is saved. If an unexpected abort occurs, I think the default daemon restart is 5 seconds?
> 
>> Beacuse we store sequence numbers internally with a u32 sized data
>> type, we should *let* the value go over the max legal sequence
>> nunmbver of 0xFFFFFF (perhaps capping it at 0x01000000 to prevent
>> "super overflows").  Then we *reject* all send requests with a
>> sequence number > 0xFFFFFF.
> 
> Sounds good.
> 
> -- 
> Michał Lowas-Rzechonek <michal.lowas-rzechonek@silvair.com>
> Silvair http://silvair.com
> Jasnogórska 44, 31-358 Krakow, POLAND

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Sequence number out of range fix
  2020-01-15 19:53     ` Gix, Brian
@ 2020-01-15 21:04       ` michal.lowas-rzechonek
  2020-01-15 21:21         ` Gix, Brian
  0 siblings, 1 reply; 8+ messages in thread
From: michal.lowas-rzechonek @ 2020-01-15 21:04 UTC (permalink / raw)
  To: Gix, Brian; +Cc: jakub.witowski, linux-bluetooth, istotlan

Hi Brian,

On 01/15, Gix, Brian wrote:
> > Good point. Note that his is also possible in the current
> > implementation: since seq_num is applied to nonce with a 24bit mask,
> > it's going to wrap around.
> 
> The full IV Index is in the nonce, and at 192 hours per IV index, will
> be unique for something like 1.4 million years.

Yes it is, but that's not the point.

At the moment, net->seq_num is a 32 bit value that *can* exceed 24bit
range, because mesh_net_next_seq_num() doesn't check ranges. So the
raw value can reach 0x1000000 and above.

Now, this raw value is used in send_seg, passed to
mesh_crypto_packet_build, which effectively applies a 24bit mask in line
640:

	l_put_be32(seq, packet + 1);
	packet[1] = (ctl ? CTL : 0) | (ttl & TTL_MASK);

So this means that when:

 - the network is already in iv update (so that you can't increase the
   iv_index, maybe you even started the procedure yourself because your
   seq_num is above the threshold)

 - your sequence number is sufficiently large (because of the "repeated
   crash" scenario described below)

Then the actual value used in the nonce will be net->seq_num & 0xffffff,
which is something you have *already* used before. All of that happens
with the same IV index.

> The over commit is calculated based on the usage rate, and the daemon
> would need to unexpectedly abort (not just ctrl-c or exit) for us to
> use the over-commit value

Indeed, that's precisely what I'm talking about - repeated, unhandled
process terminations. We're trying to come up with the patch simply
because this situation has *already happened* on one of our instances.

-- 
Michał Lowas-Rzechonek <michal.lowas-rzechonek@silvair.com>
Silvair http://silvair.com
Jasnogórska 44, 31-358 Krakow, POLAND

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Sequence number out of range fix
  2020-01-15 21:04       ` michal.lowas-rzechonek
@ 2020-01-15 21:21         ` Gix, Brian
  2020-01-15 22:02           ` michal.lowas-rzechonek
  0 siblings, 1 reply; 8+ messages in thread
From: Gix, Brian @ 2020-01-15 21:21 UTC (permalink / raw)
  To: michal.lowas-rzechonek; +Cc: istotlan, jakub.witowski, linux-bluetooth

On Wed, 2020-01-15 at 22:04 +0100, michal.lowas-rzechonek@silvair.com wrote:
> Hi Brian,
> 
> On 01/15, Gix, Brian wrote:
> > > Good point. Note that his is also possible in the current
> > > implementation: since seq_num is applied to nonce with a 24bit mask,
> > > it's going to wrap around.
> > 
> > The full IV Index is in the nonce, and at 192 hours per IV index, will
> > be unique for something like 1.4 million years.
> 
> Yes it is, but that's not the point.
> 
> At the moment, net->seq_num is a 32 bit value that *can* exceed 24bit
> range, because mesh_net_next_seq_num() doesn't check ranges. So the
> raw value can reach 0x1000000 and above.
> 
> Now, this raw value is used in send_seg, passed to
> mesh_crypto_packet_build, which effectively applies a 24bit mask in line
> 640:

Yes, we should definitely be sanity checking this, and not sending if SeqNum out of range.

> 
> 	l_put_be32(seq, packet + 1);
> 	packet[1] = (ctl ? CTL : 0) | (ttl & TTL_MASK);
> 
> So this means that when:
> 
>  - the network is already in iv update (so that you can't increase the
>    iv_index, maybe you even started the procedure yourself because your
>    seq_num is above the threshold)
> 
>  - your sequence number is sufficiently large (because of the "repeated
>    crash" scenario described below)

I think if we are repeatedly crashing, and it is causing a runaway sequence number increase, that being
forbidden to send more packets is a natural consequence, and people should fix the code that was causing the
crash in the first place.

> 
> Then the actual value used in the nonce will be net->seq_num & 0xffffff,
> which is something you have *already* used before. All of that happens
> with the same IV index.
> 
> > The over commit is calculated based on the usage rate, and the daemon
> > would need to unexpectedly abort (not just ctrl-c or exit) for us to
> > use the over-commit value
> 
> Indeed, that's precisely what I'm talking about - repeated, unhandled
> process terminations. We're trying to come up with the patch simply
> because this situation has *already happened* on one of our instances.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Sequence number out of range fix
  2020-01-15 21:21         ` Gix, Brian
@ 2020-01-15 22:02           ` michal.lowas-rzechonek
  0 siblings, 0 replies; 8+ messages in thread
From: michal.lowas-rzechonek @ 2020-01-15 22:02 UTC (permalink / raw)
  To: Gix, Brian; +Cc: istotlan, jakub.witowski, linux-bluetooth

On 01/15, Gix, Brian wrote:
> > Now, this raw value is used in send_seg, passed to
> > mesh_crypto_packet_build, which effectively applies a 24bit mask in line
> > 640:
> 
> Yes, we should definitely be sanity checking this, and not sending if
> SeqNum out of range.

Ok, we'll do that then.

> I think if we are repeatedly crashing, and it is causing a runaway
> sequence number increase, that being forbidden to send more packets is
> a natural consequence, and people should fix the code that was causing
> the crash in the first place.

You don't say. But it's not a given that the code is in the daemon
itself. Two words: OOM killer.

-- 
Michał Lowas-Rzechonek <michal.lowas-rzechonek@silvair.com>
Silvair http://silvair.com
Jasnogórska 44, 31-358 Krakow, POLAND

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-01-15 22:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15 17:08 [PATCH 0/1] Sequence number out of range fix Jakub Witowski
2020-01-15 17:08 ` [PATCH 1/1] mesh: sequence " Jakub Witowski
2020-01-15 18:35 ` [PATCH 0/1] Sequence " Gix, Brian
2020-01-15 19:09   ` michal.lowas-rzechonek
2020-01-15 19:53     ` Gix, Brian
2020-01-15 21:04       ` michal.lowas-rzechonek
2020-01-15 21:21         ` Gix, Brian
2020-01-15 22:02           ` michal.lowas-rzechonek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.