* [PATCH v2] um: time-travel: fix time corruption
From: Johannes Berg @ 2023-10-25 20:45 UTC (permalink / raw)
To: linux-um; +Cc: Johannes Berg, Vincent Whitchurch
From: Johannes Berg <johannes.berg@intel.com>
In 'basic' time-travel mode (without =inf-cpu or =ext), we
still get timer interrupts. These can happen at arbitrary
points in time, e.g. while in timer_read(), which pushes
time forward just a little bit. Then, if we happen to get
the interrupt after calculating the new time to push to,
but before actually finishing that, the interrupt will set
the time to a value later than the one we calculated, and
we'll crash because time goes backwards when we do the
forwarding.

Fix this by reading the time_travel_time, calculating the
adjustment, and doing the adjustment all with interrupts
disabled.

Reported-by: Vincent Whitchurch <Vincent.Whitchurch@axis.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
v2: remove stray debug code
---
arch/um/kernel/time.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 8ff46bc86d09..0c01674e14d5 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -551,9 +551,29 @@ static void time_travel_update_time(unsigned long long next, bool idle)
 	time_travel_del_event(&ne);
 }
 
+static void time_travel_update_time_rel(unsigned long long offs)
+{
+	unsigned long flags;
+
+	/*
+	 * Disable interrupts before calculating the new time so
+	 * that a real timer interrupt (signal) can't happen at
+	 * a bad time e.g. after we read time_travel_time but
+	 * before we've completed updating the time.
+	 */
+	local_irq_save(flags);
+	time_travel_update_time(time_travel_time + offs, false);
+	local_irq_restore(flags);
+}
+
 void time_travel_ndelay(unsigned long nsec)
 {
-	time_travel_update_time(time_travel_time + nsec, false);
+	/*
+	 * Not strictly needed to use _rel() version since this is
+	 * only used in INFCPU/EXT modes, but it doesn't hurt and
+	 * is more readable too.
+	 */
+	time_travel_update_time_rel(nsec);
 }
 EXPORT_SYMBOL(time_travel_ndelay);
 
@@ -687,7 +707,11 @@ static void time_travel_set_start(void)
 #define time_travel_time 0
 #define time_travel_ext_waiting 0
 
-static inline void time_travel_update_time(unsigned long long ns, bool retearly)
+static inline void time_travel_update_time(unsigned long long ns, bool idle)
+{
+}
+
+static inline void time_travel_update_time_rel(unsigned long long offs)
 {
 }
 
@@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
 		 */
 		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
 		    !time_travel_ext_waiting)
-			time_travel_update_time(time_travel_time +
-						TIMER_MULTIPLIER,
-						false);
+			time_travel_update_time_rel(TIMER_MULTIPLIER);
 		return time_travel_time / TIMER_MULTIPLIER;
 	}
 
--
2.41.0
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
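
The race described in the commit message can be sketched in userspace. This is only a model under stated assumptions, not the UML code: every model_* name is hypothetical, SIGALRM stands in for the real timer interrupt, blocking it with sigprocmask() stands in for local_irq_save(), and raise() is used to make the signal arrive deterministically between the read and the update. The handler is assumed to push time forward by 100 units, which is an arbitrary value chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the real UML variables. */
volatile sig_atomic_t model_time;             /* plays time_travel_time */
static volatile sig_atomic_t went_backwards;  /* set instead of panic() */

/* Move time to 'next'; record a violation if that would go backwards. */
static void model_update_time(int next)
{
	if (next < model_time) {
		went_backwards = 1;
		return;
	}
	model_time = next;
}

/* Models the real timer signal pushing time forward to a pending event. */
static void model_alarm_handler(int sig)
{
	(void)sig;
	model_time += 100;
}

/* Unprotected: the signal can land between the read and the update. */
static void model_update_time_rel_racy(int offs)
{
	int next = model_time + offs;	/* read time, compute target */

	raise(SIGALRM);			/* handler runs right here */
	model_update_time(next);	/* target is now stale */
}

/* Protected: signal blocked across read, compute, and update. */
static void model_update_time_rel_safe(int offs)
{
	sigset_t block, old;
	int rc, next;

	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	rc = sigprocmask(SIG_BLOCK, &block, &old);
	assert(rc == 0);

	next = model_time + offs;
	raise(SIGALRM);			/* stays pending while blocked */
	model_update_time(next);

	/* Restoring the old mask delivers the pending signal. */
	rc = sigprocmask(SIG_SETMASK, &old, NULL);
	assert(rc == 0);
	(void)rc;
}

/* Returns true if the chosen variant made time go backwards. */
bool model_run(bool use_safe_variant)
{
	struct sigaction sa;

	sa.sa_handler = model_alarm_handler;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, NULL);

	model_time = 1000;
	went_backwards = 0;
	if (use_safe_variant)
		model_update_time_rel_safe(10);
	else
		model_update_time_rel_racy(10);
	return went_backwards != 0;
}
```

In this model, model_run(false) reports the violation because the handler moves time to 1100 before the stale target 1010 is applied, while model_run(true) applies 1010 first and only then lets the handler run. The sketch deliberately records the violation in a flag rather than aborting, so both variants can be compared in one process.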
* Re: [PATCH v2] um: time-travel: fix time corruption
From: Vincent Whitchurch @ 2023-10-26 7:23 UTC (permalink / raw)
To: johannes, linux-um; +Cc: johannes.berg, Vincent Whitchurch
On Wed, 2023-10-25 at 22:45 +0200, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> In 'basic' time-travel mode (without =inf-cpu or =ext), we
> still get timer interrupts. These can happen at arbitrary
> points in time, i.e. while in timer_read(), which pushes
> time forward just a little bit. Then, if we happen to get
> the interrupt after calculating the new time to push to,
> but before actually finishing that, the interrupt will set
> the time to a value that's incompatible with the forward,
> and we'll crash because time goes backwards when we do the
> forwarding.
>
> Fix this by reading the time_travel_time, calculating the
> adjustment, and doing the adjustment all with interrupts
> disabled.
>
> Reported-by: Vincent Whitchurch <Vincent.Whitchurch@axis.com>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Thanks, this works for me too. However, one question below.
> ---
> v2: remove stray debug code
> ---
> arch/um/kernel/time.c | 32 +++++++++++++++++++++++++++-----
> 1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
> index 8ff46bc86d09..0c01674e14d5 100644
> --- a/arch/um/kernel/time.c
> +++ b/arch/um/kernel/time.c
> @@ -551,9 +551,29 @@ static void time_travel_update_time(unsigned long long next, bool idle)
> time_travel_del_event(&ne);
> }
>
>
> +static void time_travel_update_time_rel(unsigned long long offs)
> +{
> + unsigned long flags;
> +
> + /*
> + * Disable interrupts before calculating the new time so
> + * that a real timer interrupt (signal) can't happen at
> + * a bad time e.g. after we read time_travel_time but
> + * before we've completed updating the time.
> + */
> + local_irq_save(flags);
> + time_travel_update_time(time_travel_time + offs, false);
> + local_irq_restore(flags);
> +}
> +
> void time_travel_ndelay(unsigned long nsec)
> {
> - time_travel_update_time(time_travel_time + nsec, false);
> + /*
> + * Not strictly needed to use _rel() version since this is
> + * only used in INFCPU/EXT modes, but it doesn't hurt and
> + * is more readable too.
> + */
> + time_travel_update_time_rel(nsec);
> }
> EXPORT_SYMBOL(time_travel_ndelay);
>
>
> @@ -687,7 +707,11 @@ static void time_travel_set_start(void)
> #define time_travel_time 0
> #define time_travel_ext_waiting 0
>
>
> -static inline void time_travel_update_time(unsigned long long ns, bool retearly)
> +static inline void time_travel_update_time(unsigned long long ns, bool idle)
> +{
> +}
> +
> +static inline void time_travel_update_time_rel(unsigned long long offs)
> {
> }
>
>
> @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> */
> if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> !time_travel_ext_waiting)
> - time_travel_update_time(time_travel_time +
> - TIMER_MULTIPLIER,
> - false);
> + time_travel_update_time_rel(TIMER_MULTIPLIER);
> return time_travel_time / TIMER_MULTIPLIER;
> }
The reason I hesitated to put the whole of
time_travel_update_time() under local_irq_save() in my attempt was
that I didn't quite understand the reason for the !irqs_disabled()
condition here and the comment just above it about recursion and things
getting messed up. If it's OK to disable interrupts as this patch does,
is the !irqs_disabled() condition still valid?
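
A note on the question above: in the kernel, local_irq_save() records the previous interrupt state and local_irq_restore() puts it back, so a helper built on that pair does not by itself re-enable interrupts for a caller that already had them disabled. The sketch below is a userspace model of just that save/restore behaviour. All model_* names are hypothetical, a single bool stands in for the interrupt-enable state, and the sketch says nothing about whether the !irqs_disabled() guard in timer_read() is still required.

```c
#include <stdbool.h>

/* Hypothetical model of the interrupt-enable state of one CPU. */
static bool model_irqs_enabled = true;

/* Like local_irq_save(): remember the current state, then disable. */
static bool model_irq_save(void)
{
	bool flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

/* Like local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(bool flags)
{
	model_irqs_enabled = flags;
}

/*
 * Enter a save/restore critical section with interrupts either on or
 * off, and report whether they were off inside the section and back
 * in their entry state afterwards.
 */
bool model_section_preserves_state(bool enabled_on_entry)
{
	bool flags, disabled_inside;

	model_irqs_enabled = enabled_on_entry;
	flags = model_irq_save();
	disabled_inside = !model_irqs_enabled;
	model_irq_restore(flags);

	return disabled_inside && model_irqs_enabled == enabled_on_entry;
}
```

In this model the function returns true for both entry states, which is the property that makes a save/restore section safe to nest.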
* Re: [PATCH v2] um: time-travel: fix time corruption
From: Johannes Berg @ 2023-10-26 7:38 UTC (permalink / raw)
To: Vincent Whitchurch, linux-um
On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote:
> > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> > */
> > if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> > !time_travel_ext_waiting)
> > - time_travel_update_time(time_travel_time +
> > - TIMER_MULTIPLIER,
> > - false);
> > + time_travel_update_time_rel(TIMER_MULTIPLIER);
> > return time_travel_time / TIMER_MULTIPLIER;
> > }
>
> The reason I hesitated with putting the whole of
> time_travel_update_time() under local_irq_save() in my attempt was
> because I didn't quite understand the reason for the !irqs_disabled()
> condition here and the comment just above it about recursion and things
> getting messed up. If it's OK to disable interrupts as this patch does,
> is the !irqs_disabled() condition valid?
Hmm. I was going to say that's different, because it wants to only
prevent us from doing this while we're *already* in IRQ context, and the
bug you found is calling timer_read() not in IRQ context, but getting an
event queued by the signal.
But ... now that I think about it, I have a feeling that this was a
workaround for the exact same problem, and I just didn't understand it
at the time? I mean, recursing into our own processing is now impossible
here after this patch - either we're running normally, or the interrupt
cannot hit timer_read() in the middle, same as it cannot hit
time_travel_handle_real_alarm() in the middle now.
Removing that still seems to work with your test, but it's also not a
good test for this, since there are no devices etc. that could have
interrupts. I'm not sure how to test it right now.
Maybe I'll add a comment there saying this might no longer be needed?
johannes
* Re: [PATCH v2] um: time-travel: fix time corruption
From: Vincent Whitchurch @ 2023-10-26 8:49 UTC (permalink / raw)
To: johannes, Vincent Whitchurch, linux-um
On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote:
> On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote:
> > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> > > */
> > > if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> > > !time_travel_ext_waiting)
> > > - time_travel_update_time(time_travel_time +
> > > - TIMER_MULTIPLIER,
> > > - false);
> > > + time_travel_update_time_rel(TIMER_MULTIPLIER);
> > > return time_travel_time / TIMER_MULTIPLIER;
> > > }
> >
> > The reason I hesitated with putting the whole of
> > time_travel_update_time() under local_irq_save() in my attempt was
> > because I didn't quite understand the reason for the !irqs_disabled()
> > condition here and the comment just above it about recursion and things
> > getting messed up. If it's OK to disable interrupts as this patch does,
> > is the !irqs_disabled() condition valid?
>
> Hmm. I was going to say that's different, because it wants to only
> prevent us from doing this while we're *already* in IRQ context, and the
> bug you found is calling timer_read() not in IRQ context, but getting an
> event queued by the signal.
>
> But ... now that I think about it, I have a feeling that this was a
> workaround for the exact same problem, and I just didn't understand it
> at the time? I mean, recursing into our own processing is now impossible
> here after this patch - either we're running normally, or the interrupt
> cannot hit timer_read() in the middle, same as it cannot hit
> time_travel_handle_real_alarm() in the middle now.
>
> Removing that still seems to work with your test, but it's also not a
> good test for this, since there are no devices etc. that could have
> interrupts, not sure how to test it right now?
>
> Maybe I'll add a comment there saying this might no longer be needed?
I tried removing the !irqs_disabled() check and that blew up pretty
quickly (below) when running the full roadtest suite. It works fine
with your unmodified patch so no need to change the comment.
Kernel panic - not syncing: time-travel: time goes backwards 26374790000864 -> 26374790000853
show_stack.cold (arch/um/kernel/sysrq.c:56)
dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
dump_stack (lib/dump_stack.c:114)
panic (kernel/panic.c:262 kernel/panic.c:361)
timer_handler.cold (arch/um/kernel/time.c:51 arch/um/kernel/time.c:510 arch/um/kernel/time.c:634)
timer_real_alarm_handler (arch/um/os-Linux/signal.c:109)
unblock_signals (arch/um/os-Linux/signal.c:338)
tick_nohz_idle_exit (kernel/time/tick-sched.c:1364)
do_idle (kernel/sched/idle.c:310)
cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1))
kernel_init (init/main.c:1435)
0x60001ce6
0x6000220e
0x60004961
new_thread_handler (arch/um/include/asm/thread_info.h:46 arch/um/kernel/process.c:136)
uml_finishsetup (arch/um/kernel/um_arch.c:268)
* Re: [PATCH v2] um: time-travel: fix time corruption
From: Johannes Berg @ 2023-10-26 9:10 UTC (permalink / raw)
To: Vincent Whitchurch, linux-um
On Thu, 2023-10-26 at 08:49 +0000, Vincent Whitchurch wrote:
> On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote:
> > On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote:
> > > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> > > > */
> > > > if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> > > > !time_travel_ext_waiting)
> > > > - time_travel_update_time(time_travel_time +
> > > > - TIMER_MULTIPLIER,
> > > > - false);
> > > > + time_travel_update_time_rel(TIMER_MULTIPLIER);
> > > > return time_travel_time / TIMER_MULTIPLIER;
> > > > }
> > >
> > > The reason I hesitated with putting the whole of
> > > time_travel_update_time() under local_irq_save() in my attempt was
> > > because I didn't quite understand the reason for the !irqs_disabled()
> > > condition here and the comment just above it about recursion and things
> > > getting messed up. If it's OK to disable interrupts as this patch does,
> > > is the !irqs_disabled() condition valid?
> >
> > Hmm. I was going to say that's different, because it wants to only
> > prevent us from doing this while we're *already* in IRQ context, and the
> > bug you found is calling timer_read() not in IRQ context, but getting an
> > event queued by the signal.
> >
> > But ... now that I think about it, I have a feeling that this was a
> > workaround for the exact same problem, and I just didn't understand it
> > at the time? I mean, recursing into our own processing is now impossible
> > here after this patch - either we're running normally, or the interrupt
> > cannot hit timer_read() in the middle, same as it cannot hit
> > time_travel_handle_real_alarm() in the middle now.
> >
> > Removing that still seems to work with your test, but it's also not a
> > good test for this, since there are no devices etc. that could have
> > interrupts, not sure how to test it right now?
> >
> > Maybe I'll add a comment there saying this might no longer be needed?
>
> I tried removing the !irqs_disabled() check and that blew up pretty
> quickly (below) when running the full roadtest suite. It works fine
> with your unmodified patch so no need to change the comment.
>
Hah, OK. So maybe when/if I remember what happens there or can figure it
out again, I can update the comment :)
johannes