* [PATCH v2] um: time-travel: fix time corruption
@ 2023-10-25 20:45 Johannes Berg
  2023-10-26  7:23 ` Vincent Whitchurch
  0 siblings, 1 reply; 5+ messages in thread
From: Johannes Berg @ 2023-10-25 20:45 UTC
  To: linux-um; +Cc: Johannes Berg, Vincent Whitchurch

From: Johannes Berg <johannes.berg@intel.com>

In 'basic' time-travel mode (without =inf-cpu or =ext), we
still get timer interrupts. These can happen at arbitrary
points in time, e.g. while in timer_read(), which pushes
time forward just a little bit. Then, if we happen to get
the interrupt after calculating the new time to push to,
but before actually finishing that, the interrupt will set
the time to a value that's incompatible with the forward,
and we'll crash because time goes backwards when we do the
forwarding.

Fix this by reading the time_travel_time, calculating the
adjustment, and doing the adjustment all with interrupts
disabled.

Reported-by: Vincent Whitchurch <Vincent.Whitchurch@axis.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
v2: remove stray debug code
---
 arch/um/kernel/time.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 8ff46bc86d09..0c01674e14d5 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -551,9 +551,29 @@ static void time_travel_update_time(unsigned long long next, bool idle)
 	time_travel_del_event(&ne);
 }
 
+static void time_travel_update_time_rel(unsigned long long offs)
+{
+	unsigned long flags;
+
+	/*
+	 * Disable interrupts before calculating the new time so
+	 * that a real timer interrupt (signal) can't happen at
+	 * a bad time e.g. after we read time_travel_time but
+	 * before we've completed updating the time.
+	 */
+	local_irq_save(flags);
+	time_travel_update_time(time_travel_time + offs, false);
+	local_irq_restore(flags);
+}
+
 void time_travel_ndelay(unsigned long nsec)
 {
-	time_travel_update_time(time_travel_time + nsec, false);
+	/*
+	 * Not strictly needed to use _rel() version since this is
+	 * only used in INFCPU/EXT modes, but it doesn't hurt and
+	 * is more readable too.
+	 */
+	time_travel_update_time_rel(nsec);
 }
 EXPORT_SYMBOL(time_travel_ndelay);
 
@@ -687,7 +707,11 @@ static void time_travel_set_start(void)
 #define time_travel_time 0
 #define time_travel_ext_waiting 0
 
-static inline void time_travel_update_time(unsigned long long ns, bool retearly)
+static inline void time_travel_update_time(unsigned long long ns, bool idle)
+{
+}
+
+static inline void time_travel_update_time_rel(unsigned long long offs)
 {
 }
 
@@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
 		 */
 		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
 		    !time_travel_ext_waiting)
-			time_travel_update_time(time_travel_time +
-						TIMER_MULTIPLIER,
-						false);
+			time_travel_update_time_rel(TIMER_MULTIPLIER);
 		return time_travel_time / TIMER_MULTIPLIER;
 	}
 
-- 
2.41.0
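
The race described in the commit message can be illustrated with a small user-space sketch. This is not the UML code: every `sim_*` name is a hypothetical stand-in, the "interrupt" is a plain function call placed at the bad spot, and a counter takes the place of the kernel's "time goes backwards" panic. It only assumes what the patch states, namely that the target time is computed from time_travel_time and applied later, and that the fix does both steps with interrupts disabled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is the real UML code. */
unsigned long long sim_time;	/* plays the role of time_travel_time */
static bool sim_irqs_off;	/* plays the role of local_irq_save() state */
static bool sim_irq_pending;	/* a timer signal arrived while "disabled" */
static int sim_backwards;	/* counts steps where the kernel would panic */

/* Time may only move forward; a smaller target is the corruption case. */
static void sim_set_time(unsigned long long ns)
{
	if (ns < sim_time) {
		sim_backwards++;	/* the kernel panics here instead */
		return;
	}
	sim_time = ns;
}

/* The "real timer interrupt": pushes time forward by a large step. */
static void sim_timer_irq(void)
{
	sim_set_time(sim_time + 1000);
}

/* A point where the signal arrives; it is deferred while irqs are "off". */
static void sim_signal_arrives(void)
{
	if (sim_irqs_off)
		sim_irq_pending = true;
	else
		sim_timer_irq();
}

/* Re-enable "interrupts" and deliver anything that was deferred. */
static void sim_irq_restore(void)
{
	sim_irqs_off = false;
	if (sim_irq_pending) {
		sim_irq_pending = false;
		sim_timer_irq();
	}
}

/* Old pattern: read and compute, get interrupted, then apply a stale target. */
static void sim_update_racy(unsigned long long offs)
{
	unsigned long long next = sim_time + offs;

	sim_signal_arrives();	/* interrupt lands between compute and apply */
	sim_set_time(next);	/* next is now older than sim_time */
}

/* Patched pattern: read, compute and apply with "interrupts" disabled. */
static void sim_update_fixed(unsigned long long offs)
{
	sim_irqs_off = true;
	unsigned long long next = sim_time + offs;

	sim_signal_arrives();	/* deferred until sim_irq_restore() */
	sim_set_time(next);
	sim_irq_restore();
}

/* Run one scenario from time 0; returns how often time tried to go backwards. */
int sim_run(bool fixed)
{
	sim_time = 0;
	sim_backwards = 0;
	if (fixed)
		sim_update_fixed(10);
	else
		sim_update_racy(10);
	return sim_backwards;
}
```

Calling `sim_run(false)` from a `main()` reports one backwards step, because the stale target 10 is applied after the simulated interrupt has already moved time to 1000. `sim_run(true)` reports none: the update to 10 completes first and the deferred interrupt then moves time to 1010.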


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


* Re: [PATCH v2] um: time-travel: fix time corruption
  2023-10-25 20:45 [PATCH v2] um: time-travel: fix time corruption Johannes Berg
@ 2023-10-26  7:23 ` Vincent Whitchurch
  2023-10-26  7:38   ` Johannes Berg
  0 siblings, 1 reply; 5+ messages in thread
From: Vincent Whitchurch @ 2023-10-26  7:23 UTC
  To: johannes, linux-um; +Cc: johannes.berg, Vincent Whitchurch

On Wed, 2023-10-25 at 22:45 +0200, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> In 'basic' time-travel mode (without =inf-cpu or =ext), we
> still get timer interrupts. These can happen at arbitrary
> points in time, e.g. while in timer_read(), which pushes
> time forward just a little bit. Then, if we happen to get
> the interrupt after calculating the new time to push to,
> but before actually finishing that, the interrupt will set
> the time to a value that's incompatible with the forward,
> and we'll crash because time goes backwards when we do the
> forwarding.
> 
> Fix this by reading the time_travel_time, calculating the
> adjustment, and doing the adjustment all with interrupts
> disabled.
> 
> Reported-by: Vincent Whitchurch <Vincent.Whitchurch@axis.com>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Thanks, this works for me too.  However, one question below.

> ---
> v2: remove stray debug code
> ---
>  arch/um/kernel/time.c | 32 +++++++++++++++++++++++++++-----
>  1 file changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
> index 8ff46bc86d09..0c01674e14d5 100644
> --- a/arch/um/kernel/time.c
> +++ b/arch/um/kernel/time.c
> @@ -551,9 +551,29 @@ static void time_travel_update_time(unsigned long long next, bool idle)
>  	time_travel_del_event(&ne);
>  }
>  
> 
> +static void time_travel_update_time_rel(unsigned long long offs)
> +{
> +	unsigned long flags;
> +
> +	/*
> +	 * Disable interrupts before calculating the new time so
> +	 * that a real timer interrupt (signal) can't happen at
> +	 * a bad time e.g. after we read time_travel_time but
> +	 * before we've completed updating the time.
> +	 */
> +	local_irq_save(flags);
> +	time_travel_update_time(time_travel_time + offs, false);
> +	local_irq_restore(flags);
> +}
> +
>  void time_travel_ndelay(unsigned long nsec)
>  {
> -	time_travel_update_time(time_travel_time + nsec, false);
> +	/*
> +	 * Not strictly needed to use _rel() version since this is
> +	 * only used in INFCPU/EXT modes, but it doesn't hurt and
> +	 * is more readable too.
> +	 */
> +	time_travel_update_time_rel(nsec);
>  }
>  EXPORT_SYMBOL(time_travel_ndelay);
>  
> 
> @@ -687,7 +707,11 @@ static void time_travel_set_start(void)
>  #define time_travel_time 0
>  #define time_travel_ext_waiting 0
>  
> 
> -static inline void time_travel_update_time(unsigned long long ns, bool retearly)
> +static inline void time_travel_update_time(unsigned long long ns, bool idle)
> +{
> +}
> +
> +static inline void time_travel_update_time_rel(unsigned long long offs)
>  {
>  }
>  
> 
> @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
>  		 */
>  		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
>  		    !time_travel_ext_waiting)
> -			time_travel_update_time(time_travel_time +
> -						TIMER_MULTIPLIER,
> -						false);
> +			time_travel_update_time_rel(TIMER_MULTIPLIER);
>  		return time_travel_time / TIMER_MULTIPLIER;
>  	}

The reason I hesitated with putting the whole of
time_travel_update_time() under local_irq_save() in my attempt was
because I didn't quite understand the reason for the !irqs_disabled()
condition here and the comment just above it about recursion and things
getting messed up.  If it's OK to disable interrupts as this patch does,
is the !irqs_disabled() condition valid?

* Re: [PATCH v2] um: time-travel: fix time corruption
  2023-10-26  7:23 ` Vincent Whitchurch
@ 2023-10-26  7:38   ` Johannes Berg
  2023-10-26  8:49     ` Vincent Whitchurch
  0 siblings, 1 reply; 5+ messages in thread
From: Johannes Berg @ 2023-10-26  7:38 UTC
  To: Vincent Whitchurch, linux-um

On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote:
> > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> >  		 */
> >  		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> >  		    !time_travel_ext_waiting)
> > -			time_travel_update_time(time_travel_time +
> > -						TIMER_MULTIPLIER,
> > -						false);
> > +			time_travel_update_time_rel(TIMER_MULTIPLIER);
> >  		return time_travel_time / TIMER_MULTIPLIER;
> >  	}
> 
> The reason I hesitated with putting the whole of
> time_travel_update_time() under local_irq_save() in my attempt was
> because I didn't quite understand the reason for the !irqs_disabled()
> condition here and the comment just above it about recursion and things
> getting messed up.  If it's OK to disable interrupts as this patch does,
> is the !irqs_disabled() condition valid?

Hmm. I was going to say that's different, because it wants to only
prevent us from doing this while we're *already* in IRQ context, and the
bug you found is calling timer_read() not in IRQ context, but getting an
event queued by the signal.

But ... now that I think about it, I have a feeling that this was a
workaround for the exact same problem, and I just didn't understand it
at the time? I mean, recursing into our own processing is now impossible
here after this patch - either we're running normally, or the interrupt
cannot hit timer_read() in the middle, same as it cannot hit
time_travel_handle_real_alarm() in the middle now.

Removing that still seems to work with your test, but it's also not a
good test for this, since there are no devices etc. that could have
interrupts, not sure how to test it right now?

Maybe I'll add a comment there saying this might no longer be needed?

johannes


* Re: [PATCH v2] um: time-travel: fix time corruption
  2023-10-26  7:38   ` Johannes Berg
@ 2023-10-26  8:49     ` Vincent Whitchurch
  2023-10-26  9:10       ` Johannes Berg
  0 siblings, 1 reply; 5+ messages in thread
From: Vincent Whitchurch @ 2023-10-26  8:49 UTC
  To: johannes, Vincent Whitchurch, linux-um

On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote:
> On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote:
> > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> > >  		 */
> > >  		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> > >  		    !time_travel_ext_waiting)
> > > -			time_travel_update_time(time_travel_time +
> > > -						TIMER_MULTIPLIER,
> > > -						false);
> > > +			time_travel_update_time_rel(TIMER_MULTIPLIER);
> > >  		return time_travel_time / TIMER_MULTIPLIER;
> > >  	}
> > 
> > The reason I hesitated with putting the whole of
> > time_travel_update_time() under local_irq_save() in my attempt was
> > because I didn't quite understand the reason for the !irqs_disabled()
> > condition here and the comment just above it about recursion and things
> > getting messed up.  If it's OK to disable interrupts as this patch does,
> > is the !irqs_disabled() condition valid?
> 
> Hmm. I was going to say that's different, because it wants to only
> prevent us from doing this while we're *already* in IRQ context, and the
> bug you found is calling timer_read() not in IRQ context, but getting an
> event queued by the signal.
> 
> But ... now that I think about it, I have a feeling that this was a
> workaround for the exact same problem, and I just didn't understand it
> at the time? I mean, recursing into our own processing is now impossible
> here after this patch - either we're running normally, or the interrupt
> cannot hit timer_read() in the middle, same as it cannot hit
> time_travel_handle_real_alarm() in the middle now.
> 
> Removing that still seems to work with your test, but it's also not a
> good test for this, since there are no devices etc. that could have
> interrupts, not sure how to test it right now?
> 
> Maybe I'll add a comment there saying this might no longer be needed?

I tried removing the !irqs_disabled() check and that blew up pretty
quickly (below) when running the full roadtest suite.  It works fine
with your unmodified patch so no need to change the comment.

 Kernel panic - not syncing: time-travel: time goes backwards 26374790000864 -> 26374790000853
 show_stack.cold (arch/um/kernel/sysrq.c:56) 
 dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
 dump_stack (lib/dump_stack.c:114) 
 panic (kernel/panic.c:262 kernel/panic.c:361) 
 timer_handler.cold (arch/um/kernel/time.c:51 arch/um/kernel/time.c:510 arch/um/kernel/time.c:634) 
 timer_real_alarm_handler (arch/um/os-Linux/signal.c:109) 
 unblock_signals (arch/um/os-Linux/signal.c:338) 
 tick_nohz_idle_exit (kernel/time/tick-sched.c:1364) 
 do_idle (kernel/sched/idle.c:310) 
 cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1)) 
 kernel_init (init/main.c:1435) 
 0x60001ce6 
 0x6000220e 
 0x60004961 
 new_thread_handler (arch/um/include/asm/thread_info.h:46 arch/um/kernel/process.c:136) 
 uml_finishsetup (arch/um/kernel/um_arch.c:268) 


* Re: [PATCH v2] um: time-travel: fix time corruption
  2023-10-26  8:49     ` Vincent Whitchurch
@ 2023-10-26  9:10       ` Johannes Berg
  0 siblings, 0 replies; 5+ messages in thread
From: Johannes Berg @ 2023-10-26  9:10 UTC
  To: Vincent Whitchurch, linux-um

On Thu, 2023-10-26 at 08:49 +0000, Vincent Whitchurch wrote:
> On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote:
> > On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote:
> > > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> > > >  		 */
> > > >  		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> > > >  		    !time_travel_ext_waiting)
> > > > -			time_travel_update_time(time_travel_time +
> > > > -						TIMER_MULTIPLIER,
> > > > -						false);
> > > > +			time_travel_update_time_rel(TIMER_MULTIPLIER);
> > > >  		return time_travel_time / TIMER_MULTIPLIER;
> > > >  	}
> > > 
> > > The reason I hesitated with putting the whole of
> > > time_travel_update_time() under local_irq_save() in my attempt was
> > > because I didn't quite understand the reason for the !irqs_disabled()
> > > condition here and the comment just above it about recursion and things
> > > getting messed up.  If it's OK to disable interrupts as this patch does,
> > > is the !irqs_disabled() condition valid?
> > 
> > Hmm. I was going to say that's different, because it wants to only
> > prevent us from doing this while we're *already* in IRQ context, and the
> > bug you found is calling timer_read() not in IRQ context, but getting an
> > event queued by the signal.
> > 
> > But ... now that I think about it, I have a feeling that this was a
> > workaround for the exact same problem, and I just didn't understand it
> > at the time? I mean, recursing into our own processing is now impossible
> > here after this patch - either we're running normally, or the interrupt
> > cannot hit timer_read() in the middle, same as it cannot hit
> > time_travel_handle_real_alarm() in the middle now.
> > 
> > Removing that still seems to work with your test, but it's also not a
> > good test for this, since there are no devices etc. that could have
> > interrupts, not sure how to test it right now?
> > 
> > Maybe I'll add a comment there saying this might no longer be needed?
> 
> I tried removing the !irqs_disabled() check and that blew up pretty
> quickly (below) when running the full roadtest suite.  It works fine
> with your unmodified patch so no need to change the comment.
> 

Hah, OK. So maybe when/if I remember what happens there or can figure it
out again, I can update the comment :)

johannes
