linux-parisc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnd Bergmann <arnd@kernel.org>
To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Arnd Bergmann <arnd@arndb.de>,
	Russell King <linux@armlinux.org.uk>,
	Tony Luck <tony.luck@intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Greg Ungerer <gerg@linux-m68k.org>,
	Finn Thain <fthain@telegraphics.com.au>,
	Philip Blundell <philb@gnu.org>,
	Joshua Thompson <funaho@jurai.org>, Sam Creasey <sammy@sammy.net>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Helge Deller <deller@gmx.de>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	John Stultz <john.stultz@linaro.org>,
	Stephen Boyd <sboyd@kernel.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	linux-ia64@vger.kernel.org,
	Parisc List <linux-parisc@vger.kernel.org>,
	linux-m68k <linux-m68k@lists.linux-m68k.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Anatoly Pugachev <matorola@gmail.com>
Subject: Re: [PATCH v2 05/15] ia64: convert to legacy_timer_tick
Date: Fri, 18 Dec 2020 23:07:26 +0100	[thread overview]
Message-ID: <CAK8P3a0s5Wt=9sZXAV1JzebZFXVy4qn3KNvygEfPeH3eVMyO_w@mail.gmail.com> (raw)
In-Reply-To: <59efce0e-a28d-9424-82ca-fb7f3a1b9c29@physik.fu-berlin.de>

[-- Attachment #1: Type: text/plain, Size: 5354 bytes --]

On Fri, Dec 18, 2020 at 10:29 PM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
>
> Hi Arnd!
>
> On 10/30/20 4:17 PM, Arnd Bergmann wrote:
> > ia64 is the only architecture that calls xtime_update() in a loop,
> > once for each jiffie that has passed since the last event.
> >
> > Before commit 3171a0305d62 ("[PATCH] simplify update_times (avoid
> > jiffies/jiffies_64 aliasing problem)") in 2006, it could not actually do
> > this any differently, but now it seems simpler to just pass the number
> > of jiffies that passed in the meantime.
> >
> > While this loses the ability process interrupts in the middle of
> > the timer tick by calling local_irq_enable(), doing so is fairly
> > peculiar anyway and it seems better to just do what everyone
> > else does here.
>
> This broke the hpsa block device driver on my HP RX2600.

Hi Adrian,

Sorry for causing this bug, and thank you for bisecting it
down to my patch.

Do you see any other strange behavior with that patch, or is
this the only symptom you are aware of?

>
> I'm seeing this backtrace now:
>
> [  905.883273] usb 1-2: SerialNumber: A60020000001
> [  905.918170]  sda: sda1 sda2 sda3
> [  905.920107] sd 0:1:0:0: [sda] Attached SCSI disk
> [  905.944102] usb-storage 1-2:1.0: USB Mass Storage device detected
> [  905.944102] scsi host1: usb-storage 1-2:1.0
> [  905.944102] usbcore: registered new interface driver usb-storage
> [  905.944117] usbcore: registered new interface driver uas

The timestamps show that time is moving forward, which is at least
something. Do you have a feeling for whether the timestamps are
counting in (roughly) the correct speed, or is it going much faster
or slower than it should?

To clarify: the [905.944117] numbers are seconds/microseconds
since boot, so message would be 906 seconds after the kernel
started.

> Begin: Loading essential drivers ... done.                                                                                                                        > Begin: Running /scripts/init-premount ... done.                                                                                                                 > Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.

Ok, so it gets into user space. Is this initramfs or the actual read-only root?

> [  906.666923] hpsa 0000:05:00.0: scsi 0:1:0:0: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
> [  906.670923] hpsa 0000:05:00.0: device is ready.
> [  906.670923] hpsa 0000:05:00.0: scsi 0:1:0:0: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
> done.
> [  906.722166] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [  906.722166] rcu:     2-....: (3 ticks this GP) idle=fe6/1/0x4000000000000000 softirq=693/698 fqs=4
> [  906.722166]  (detected by 0, t=6115 jiffies, g=465, q=80)
This appears to be an 'rcu stall' warning, from print_cpu_stall_info(),
indicating that timer ticks are missing.

> [  909.360108] INFO: task systemd-sysv-ge:200 blocked for more than 127 seconds.
> [  909.360108]       Not tainted 5.10.0+ #130
> [  909.360108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  909.360108] task:systemd-sysv-ge state:D stack:    0 pid:  200 ppid:   189 flags:0x00000000
> [  909.364108]
> [  909.364108] Call Trace:
> [  909.364423]  [<a00000010109b210>] __schedule+0x890/0x21e0
> [  909.364423]                                 sp=e0000100487d7b70 bsp=e0000100487d1748
> [  909.368423]  [<a00000010109cc00>] schedule+0xa0/0x240
> [  909.368423]                                 sp=e0000100487d7b90 bsp=e0000100487d16e0
> [  909.368558]  [<a00000010109ce70>] io_schedule+0x70/0xa0
> [  909.368558]                                 sp=e0000100487d7b90 bsp=e0000100487d16c0
> [  909.372290]  [<a00000010109e1c0>] bit_wait_io+0x20/0xe0
> [  909.372290]                                 sp=e0000100487d7b90 bsp=e0000100487d1698
> [  909.374168] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [  909.376290]  [<a00000010109d860>] __wait_on_bit+0xc0/0x1c0
> [  909.376290]                                 sp=e0000100487d7b90 bsp=e0000100487d1648
> [  909.374168] rcu:     3-....: (2 ticks this GP) idle=19e/1/0x4000000000000002 softirq=1581/1581 fqs=2
> [  909.374168]  (detected by 0, t=5661 jiffies, g=1089, q=3)
> [  909.376290]  [<a00000010109da80>] out_of_line_wait_on_bit+0x120/0x140
> [  909.376290]                                 sp=e0000100487d7b90 bsp=e0000100487d1610
> [  909.374168] Task dump for CPU 3:
> [  909.374168] task:khungtaskd      state:R  running task

and this seems to be another instance of the same. I would assume that this
is completely unrelated to the block driver and just happened to trigger during
the same time the driver was doing something.

Can you see in your full logs if the "Oops: timer tick before it's due" warning
triggered at any point?

I've attached a patch for a partial revert of my original change, this
should still work with the final cleanup on top, but restore the loop
plus the local_irq_enable()/local_irq_disable() that I dropped from
the original code. Does this make a difference?

        Arnd

[-- Attachment #2: ia64-timer-partial-revert.diff --]
[-- Type: text/x-patch, Size: 1319 bytes --]

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 9431edb08508..e3d9c8088d56 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -161,29 +161,34 @@ void vtime_account_idle(struct task_struct *tsk)
 static irqreturn_t
 timer_interrupt (int irq, void *dev_id)
 {
-	unsigned long cur_itm, new_itm, ticks;
+	unsigned long new_itm;
 
 	if (cpu_is_offline(smp_processor_id())) {
 		return IRQ_HANDLED;
 	}
 
 	new_itm = local_cpu_data->itm_next;
-	cur_itm = ia64_get_itc();
 
-	if (!time_after(cur_itm, new_itm)) {
+	if (!time_after(ia64_get_itc(), new_itm))
 		printk(KERN_ERR "Oops: timer tick before it's due (itc=%lx,itm=%lx)\n",
-		       cur_itm, new_itm);
-		ticks = 1;
-	} else {
-		ticks = DIV_ROUND_UP(cur_itm - new_itm,
-				     local_cpu_data->itm_delta);
-		new_itm += ticks * local_cpu_data->itm_delta;
-	}
+		       ia64_get_itc(), new_itm);
+
+	while (1) {
+		new_itm += local_cpu_data->itm_delta;
+
+		legacy_timer_tick(smp_processor_id() == time_keeper_id);
 
-	if (smp_processor_id() != time_keeper_id)
-		ticks = 0;
+		local_cpu_data->itm_next = new_itm;
 
-	legacy_timer_tick(ticks);
+		if (time_after(new_itm, ia64_get_itc()))
+			break;
+
+		/*
+		 * Allow IPIs to interrupt the timer loop.
+		 */
+		local_irq_enable();
+		local_irq_disable();
+	}
 
 	do {
 		/*

  reply	other threads:[~2020-12-18 22:08 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-30 15:17 [PATCH v2 00/15] timers: clean up ARCH_GETTIMEOFFSET, LEGACY_TIMER_TICK Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 01/15] ARM: remove ebsa110 platform Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 02/15] net: remove am79c961a driver Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 03/15] timekeeping: remove arch_gettimeoffset Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 04/15] timekeeping: add CONFIG_LEGACY_TIMER_TICK Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 05/15] ia64: convert to legacy_timer_tick Arnd Bergmann
2020-12-18 21:29   ` John Paul Adrian Glaubitz
2020-12-18 22:07     ` Arnd Bergmann [this message]
2020-12-18 22:13       ` John Paul Adrian Glaubitz
2020-12-18 22:32         ` John Paul Adrian Glaubitz
2020-10-30 15:17 ` [PATCH v2 06/15] ARM: rpc: use legacy_timer_tick Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 07/15] parisc: " Arnd Bergmann
2020-10-30 17:13   ` Helge Deller
2020-10-30 19:33     ` Arnd Bergmann
2020-10-30 19:48       ` Helge Deller
2020-10-30 20:37         ` Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 08/15] m68k: coldfire: use legacy_timer_tick() Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 09/15] m68k: split heartbeat out of timer function Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 10/15] m68k: sun3/sun3c: use legacy_timer_tick Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 11/15] m68k: m68328: use legacy_timer_tick() Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 12/15] m68k: change remaining timers to legacy_timer_tick Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 13/15] m68k: remove timer_interrupt() function Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 14/15] timekeeping: remove xtime_update Arnd Bergmann
2020-10-30 15:17 ` [PATCH v2 15/15] timekeeping: default GENERIC_CLOCKEVENTS to enabled Arnd Bergmann
2020-11-01  3:43 ` [PATCH v2 00/15] timers: clean up ARCH_GETTIMEOFFSET, LEGACY_TIMER_TICK Finn Thain
2020-11-01  9:53   ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAK8P3a0s5Wt=9sZXAV1JzebZFXVy4qn3KNvygEfPeH3eVMyO_w@mail.gmail.com' \
    --to=arnd@kernel.org \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=arnd@arndb.de \
    --cc=daniel.lezcano@linaro.org \
    --cc=deller@gmx.de \
    --cc=fenghua.yu@intel.com \
    --cc=fthain@telegraphics.com.au \
    --cc=funaho@jurai.org \
    --cc=gerg@linux-m68k.org \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=john.stultz@linaro.org \
    --cc=linus.walleij@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=matorola@gmail.com \
    --cc=philb@gnu.org \
    --cc=rppt@linux.ibm.com \
    --cc=sammy@sammy.net \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).