linux-kernel.vger.kernel.org archive mirror
* 5.4+: PAGE FAULT crashes the system multiple times per 24h
@ 2020-02-10 14:39 Udo van den Heuvel
  2020-02-10 16:04 ` Gabriel C
  0 siblings, 1 reply; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-10 14:39 UTC (permalink / raw)
  To: linux-mm@vger.kernel.org

Hello,

Would this be a bug in the mm area?

For bug https://bugzilla.kernel.org/show_bug.cgi?id=206191 I have been
bisecting away, but now the process has landed me with a kernel that cannot
find the root fs (with either good or bad bisect choices).

Pictures of the crash that is the reason for this bisect:
https://bugzilla.kernel.org/attachment.cgi?id=286787
https://bugzilla.kernel.org/attachment.cgi?id=286789
https://bugzilla.kernel.org/attachment.cgi?id=286791
https://bugzilla.kernel.org/attachment.cgi?id=286793

How can I proceed from here with the bisecting?
Did someone perhaps find the root cause for the page fault?
As the crash is fairly easy to reproduce I can test patches...

Please let me know!

Kind regards,
Udo


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 14:39 5.4+: PAGE FAULT crashes the system multiple times per 24h Udo van den Heuvel
@ 2020-02-10 16:04 ` Gabriel C
  2020-02-10 16:24   ` Udo van den Heuvel
  0 siblings, 1 reply; 7+ messages in thread
From: Gabriel C @ 2020-02-10 16:04 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: linux-mm@vger.kernel.org

On Mon, 10 Feb 2020 at 15:39, Udo van den Heuvel
<udovdh@xs4all.nl> wrote:
>
> Hello,

Hi,

>
> Would this be a bug in the mm area?

I don't know; it's possible.
It could be anything: a bad overclock, bad RAM, or broken firmware could
be the cause too.

>
> For bug https://bugzilla.kernel.org/show_bug.cgi?id=206191 I have been
> bisecting away, but now the process has landed me with a kernel that cannot
> find the root fs (with either good or bad bisect choices).
>
> Pictures of the crash that is the reason for this bisect:
> https://bugzilla.kernel.org/attachment.cgi?id=286787
> https://bugzilla.kernel.org/attachment.cgi?id=286789
> https://bugzilla.kernel.org/attachment.cgi?id=286791
> https://bugzilla.kernel.org/attachment.cgi?id=286793
>

I looked at some of your logs. I hit freezes/crashes similar to yours
with an R3 APU a while back.
That was caused by a mismatch in kernel -> Xorg driver <-> mesa code + firmware.

I think first you should try to fix your amdgpu bug which is this one:
https://gitlab.freedesktop.org/drm/amd/issues/963

And the fixes are the patchset there:
https://patchwork.freedesktop.org/series/72733/

Also, can you try booting without all these crazy options?
As an example why would you need to force ACPI on your HW?

BR,

Gabriel C.


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 16:04 ` Gabriel C
@ 2020-02-10 16:24   ` Udo van den Heuvel
  2020-02-10 17:01     ` Gabriel C
  0 siblings, 1 reply; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-10 16:24 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-mm@vger.kernel.org

Hello Gabriel,

Thank you kindly for your email and the links in there, I will most
certainly look into those.

On 10-02-2020 17:04, Gabriel C wrote:
> I think first you should try to fix your amdgpu bug which is this one:
> https://gitlab.freedesktop.org/drm/amd/issues/963
> 
> And the fixes are the patchset there:
> https://patchwork.freedesktop.org/series/72733/

Thanks, will try those on 5.5.2.

> Also, can you try booting without all these crazy options?

What is crazy here?
Each one has a story.

> As an example why would you need to force ACPI on your HW?

Force?
Because then I can be certain it will be there, this has been there for
quite a while.
Or would you suggest I run my x86_64 without acpi? (I am not an expert
in this area yet)

noexec=on noexec32=on vga=0xF06 SYSFONT=latarcyrheb-sun16
LANG=en_US.UTF-8 KEYTABLE=us
fbcon=font:VGA8x16

Not important I guess.

acpi_enforce_resources=lax

To avoid conflict.

radeon.pcie_gen2=1

To enable PCIE gen 2

cgroup_disable=memory

No control groups for memory.

threadirqs

Threads for IRQs.

plymouth.enable=0 rd.plymouth=0

No plymouth.

mce=dont_log_ce

To avoid logging.

panic=0

Kernel behaviour.

rd.lvm.vg=myvg  rd.lvm.vg=ssdvg

To have the kernel open the vg

radeon.dpm=1

We want power management

zswap.enabled=1

We want zswap.

rd.auto=1

enable autoassembly of special devices like cryptoLUKS, dmraid,
   mdraid or lvm.

audit=0

No audit.

systemd.log_level=warning

Less systemd clutter in logging.

ip=192.168.10.70::192.168.10.98:255.255.255.0:::off:192.168.10.98
rd.neednet=1

This is unnecessary.

net.ifnames=0

Old style network interface names.

amdgpu.gttsize=8192

Had to do with viewing larger PDFs, for genealogy etc.

clocksource=hpet

We want hpet. Not tsc.

amdgpu.lockup_timeout=0

rd.luks.options=discard

We want to use discard on our ssd's.

elevator=mq-deadline

We want a different scheduler for ssd versus hdd.




Kind regards,
Udo


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 16:24   ` Udo van den Heuvel
@ 2020-02-10 17:01     ` Gabriel C
  2020-02-11  2:56       ` Udo van den Heuvel
  2020-02-11 17:04       ` Udo van den Heuvel
  0 siblings, 2 replies; 7+ messages in thread
From: Gabriel C @ 2020-02-10 17:01 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: linux-mm@vger.kernel.org

On Mon, 10 Feb 2020 at 17:25, Udo van den Heuvel
<udovdh@xs4all.nl> wrote:
>
> Hello Gabriel,
>
> Thank you kindly for your email and the links in there, I will most
> certainly look into those.
>
> On 10-02-2020 17:04, Gabriel C wrote:
> > I think first you should try to fix your amdgpu bug which is this one:
> > https://gitlab.freedesktop.org/drm/amd/issues/963
> >
> > And the fixes are the patchset there:
> > https://patchwork.freedesktop.org/series/72733/
>
> Thanks, will try those on 5.5.2.
>
> > Also, can you try booting without all these crazy options?
>
> What is crazy here?
> Each one has a story.
>

Sure, I'm not saying to not use these.

But when hunting bugs, try to boot a kernel with only what you need to boot.
For example, if such a kernel works, then you know for sure that one of
the options, or a combination of them, causes the bugs.
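
A minimal sketch of that idea in Python, assuming the essential set below
(root=, ro and a few rd.* entries) is what this machine needs to reach its
root filesystem. The root= value is hypothetical, and the real set depends on
the initramfs and disk layout. It only splits a command line into the part to
keep for the test boot and the part to set aside.

```python
# Assumption: only these keys are needed to find and mount the root fs.
# Adjust the set for the actual initramfs and storage layout.
ESSENTIAL_KEYS = {"root", "ro", "rd.lvm.vg", "rd.luks.options", "rd.auto"}


def param_key(param):
    """Return the key of a 'key=value' parameter or of a bare flag."""
    return param.split("=", 1)[0]


def split_cmdline(cmdline, essential=ESSENTIAL_KEYS):
    """Split a kernel command line into (kept_string, dropped_list).

    Parameters keep their original order; quoting is not handled.
    """
    kept, dropped = [], []
    for param in cmdline.split():
        (kept if param_key(param) in essential else dropped).append(param)
    return " ".join(kept), dropped


if __name__ == "__main__":
    # Hypothetical root= value; the other parameters come from this thread.
    full = ("root=/dev/mapper/myvg-root ro acpi=force threadirqs "
            "rd.lvm.vg=myvg rd.lvm.vg=ssdvg zswap.enabled=1 "
            "clocksource=hpet rd.auto=1 rd.luks.options=discard")
    kept, dropped = split_cmdline(full)
    print("boot with:", kept)
    print("set aside:", " ".join(dropped))
```

If the trimmed command line boots without the page fault, the dropped
parameters can be added back a few at a time to narrow down the culprit.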

> > As an example why would you need to force ACPI on your HW?
>
> Force?
> Because then I can be certain it will be there, this has been there for
> quite a while.
> Or would you suggest I run my x86_64 without acpi? (I am not an expert
> in this area yet)

The force parameter is used to try to enable ACPI on hardware that has it
off by default; you don't need that.

....

> rd.luks.options=discard
>
> We want to use discard on our ssd's.

Use mount options?

> elevator=mq-deadline
> We want a different scheduler for ssd versus hdd.

If you really want that you should use udev rules for SSD/NVME/HDD/USB etc.

BR,

Gabriel C.


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 17:01     ` Gabriel C
@ 2020-02-11  2:56       ` Udo van den Heuvel
  2020-02-11 17:04       ` Udo van den Heuvel
  1 sibling, 0 replies; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-11  2:56 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-mm@vger.kernel.org

On 10-02-2020 18:01, Gabriel C wrote:
>> rd.luks.options=discard
>>
>> We want to use discard on our ssd's.
> 
> Use mount options?

Not enough to make it work.

> 
>> elevator=mq-deadline
>> We want a different scheduler for ssd versus hdd.
> 
> If you really want that you should use udev rules for SSD/NVME/HDD/USB etc.

Simply loading the scheduler module and setting the scheduler in rc.local is
easier.
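
As a rough illustration of what such an rc.local step could do, here is a
Python sketch. The policy (mq-deadline for non-rotational devices, bfq for
rotational ones) is an assumption, not something stated in this thread. The
queue/rotational and queue/scheduler files are the usual per-device sysfs
attributes. Writing them needs root, and the chosen scheduler must be
available in the running kernel.

```python
from pathlib import Path


def pick_scheduler(rotational):
    """Assumed policy: bfq for spinning disks, mq-deadline for SSD/NVMe."""
    return "bfq" if rotational else "mq-deadline"


def apply_schedulers(sys_block=Path("/sys/block"), dry_run=True):
    """Choose a scheduler per block device and return {device: scheduler}.

    Nothing is written to sysfs unless dry_run is False.
    """
    chosen = {}
    for dev in sorted(sys_block.iterdir()):
        rot_file = dev / "queue" / "rotational"
        if not rot_file.is_file():
            continue  # not a device with a request queue
        rotational = rot_file.read_text().strip() == "1"
        sched = pick_scheduler(rotational)
        chosen[dev.name] = sched
        if not dry_run:
            (dev / "queue" / "scheduler").write_text(sched)
    return chosen


if __name__ == "__main__":
    # Dry run by default: only report what would be set.
    for name, sched in apply_schedulers().items():
        print(f"{name}: {sched}")
```

Run with dry_run=True first to see the choices before anything is changed.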


Udo



* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 17:01     ` Gabriel C
  2020-02-11  2:56       ` Udo van den Heuvel
@ 2020-02-11 17:04       ` Udo van den Heuvel
  1 sibling, 0 replies; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-11 17:04 UTC (permalink / raw)
  Cc: linux-mm@vger.kernel.org

On 10-02-2020 18:01, Gabriel C wrote:
> But when hunting bugs, try to boot a kernel with only what you need to boot.
> For example, if such a kernel works, then you know for sure that one of
> the options, or a combination of them, causes the bugs.

These options are reasonable and necessary; so far things worked OK.
So why would they start being an issue?
And how can I even proceed when the kernel cannot find a rootfs anymore
while bisecting?
5.5.2 also has the page fault issue.
So why Linus calls 5.5.x 'stable' is beyond me.

How can I continue and find the root cause for the page fault hang?


> The force parameter is used to try to enable ACPI on hardware that has it
> off by default; you don't need that.

I booted 5.5.3 without acpi=force, and the dmesg output with `acpi` in it
looks similar.
So acpi=force will be removed from future kernel command lines.

>> We want to use discard on our ssd's.
> 
> Use mount options?

Not enough to make it work for LUKS.

>> elevator=mq-deadline
>> We want a different scheduler for ssd versus hdd.
> 
> If you really want that you should use udev rules for SSD/NVME/HDD/USB etc.

/etc/rc.d/rc.local is easier.
Look at the overhead of a service file.

Same as the overhead of NetworkManager versus a few kilobytes of
network-scripts.
But Fedora thinks otherwise....


Kind regards,
Udo


* 5.4+: PAGE FAULT crashes the system multiple times per 24h
@ 2020-02-09  8:39 Udo van den Heuvel
  0 siblings, 0 replies; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-09  8:39 UTC (permalink / raw)
  To: linux-kernel

Hello,

For bug https://bugzilla.kernel.org/show_bug.cgi?id=206191 I have been
bisecting away, but now the process has landed me with a kernel that cannot
find the root fs (with either good or bad bisect choices).

Pictures of the crash that is the reason for this bisect:
https://bugzilla.kernel.org/attachment.cgi?id=286787
https://bugzilla.kernel.org/attachment.cgi?id=286789
https://bugzilla.kernel.org/attachment.cgi?id=286791
https://bugzilla.kernel.org/attachment.cgi?id=286793

How can I proceed from here with the bisecting?
Did someone perhaps find the root cause for the page fault?
As the crash is fairly easy to reproduce I can test patches...

Please let me know!

Kind regards,
Udo

