* 5.4+: PAGE FAULT crashes the system multiple times per 24h
@ 2020-02-10 14:39 Udo van den Heuvel
  2020-02-10 16:04 ` Gabriel C
  0 siblings, 1 reply; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-10 14:39 UTC (permalink / raw)
  To: linux-mm@vger.kernel.org

Hello,

Would this be a bug in the mm area?

For bug https://bugzilla.kernel.org/show_bug.cgi?id=206191 I have been
bisecting away, but the process has now landed me on a kernel that cannot
find the root fs (with either good or bad bisect choices).

Pictures of the crash that is the reason for this bisect:
https://bugzilla.kernel.org/attachment.cgi?id=286787
https://bugzilla.kernel.org/attachment.cgi?id=286789
https://bugzilla.kernel.org/attachment.cgi?id=286791
https://bugzilla.kernel.org/attachment.cgi?id=286793

How can I proceed from here with the bisecting?
Did someone perhaps find the root cause for the page fault?
As the crash is fairly easy to reproduce I can test patches...

Please let me know!

Kind regards,
Udo


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 14:39 5.4+: PAGE FAULT crashes the system multiple times per 24h Udo van den Heuvel
@ 2020-02-10 16:04 ` Gabriel C
  2020-02-10 16:24   ` Udo van den Heuvel
  0 siblings, 1 reply; 7+ messages in thread
From: Gabriel C @ 2020-02-10 16:04 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: linux-mm@vger.kernel.org

On Mon, 10 Feb 2020 at 15:39, Udo van den Heuvel
<udovdh@xs4all.nl> wrote:
>
> Hello,

Hi,

>
> Would this be a bug in the mm area?

I don't know; it's possible.
It could be anything: a bad overclock, bad RAM, or broken firmware could
be the cause too.

>
> For bug https://bugzilla.kernel.org/show_bug.cgi?id=206191 I have been
> bisecting away, but the process has now landed me on a kernel that cannot
> find the root fs (with either good or bad bisect choices).
>
> Pictures of the crash that is the reason for this bisect:
> https://bugzilla.kernel.org/attachment.cgi?id=286787
> https://bugzilla.kernel.org/attachment.cgi?id=286789
> https://bugzilla.kernel.org/attachment.cgi?id=286791
> https://bugzilla.kernel.org/attachment.cgi?id=286793
>

I looked at some of your logs. I hit freezes/crashes similar to yours
with an R3 APU a while back.
That was caused by a mismatch between the kernel, the Xorg driver, the mesa
code and the firmware.

I think first you should try to fix your amdgpu bug which is this one:
https://gitlab.freedesktop.org/drm/amd/issues/963

And the fixes are the patchset there:
https://patchwork.freedesktop.org/series/72733/

Also, can you try booting without all these crazy options?
As an example why would you need to force ACPI on your HW?

BR,

Gabriel C.


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 16:04 ` Gabriel C
@ 2020-02-10 16:24   ` Udo van den Heuvel
  2020-02-10 17:01     ` Gabriel C
  0 siblings, 1 reply; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-10 16:24 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-mm@vger.kernel.org

Hello Gabriel,

Thank you kindly for your email and the links in there, I will most
certainly look into those.

On 10-02-2020 17:04, Gabriel C wrote:
> I think first you should try to fix your amdgpu bug which is this one:
> https://gitlab.freedesktop.org/drm/amd/issues/963
> 
> And the fixes are the patchset there:
> https://patchwork.freedesktop.org/series/72733/

Thanks, will try those on 5.5.2.

> Also, can you try booting without all these crazy options?

What is crazy here?
Each one has a story.

> As an example why would you need to force ACPI on your HW?

Force?
Because then I can be certain it will be there, this has been there for
quite a while.
Or would you suggest I run my x86_64 without acpi? (I am not an expert
in this area yet)

noexec=on noexec32=on vga=0xF06 SYSFONT=latarcyrheb-sun16
LANG=en_US.UTF-8 KEYTABLE=us
fbcon=font:VGA8x16

Not important I guess.

acpi_enforce_resources=lax

To avoid conflict.

radeon.pcie_gen2=1

To enable PCIE gen 2

cgroup_disable=memory

No control groups for memory.

threadirqs

Threads for IRQs.

plymouth.enable=0 rd.plymouth=0

No plymouth.

mce=dont_log_ce

To avoid logging.

panic=0

Kernel behaviour.

rd.lvm.vg=myvg  rd.lvm.vg=ssdvg

To have the kernel open the vg

radeon.dpm=1

We want power management

zswap.enabled=1

We want zswap.

rd.auto=1

enable autoassembly of special devices like cryptoLUKS, dmraid,
   mdraid or lvm.

audit=0

No audit.

systemd.log_level=warning

Less systemd clutter in logging.

ip=192.168.10.70::192.168.10.98:255.255.255.0:::off:192.168.10.98
rd.neednet=1

This is unnecessary.

net.ifnames=0

Old style network interface names.

amdgpu.gttsize=8192

Had to do with viewing larger PDFs, for genealogy etc.

clocksource=hpet

We want hpet. Not tsc.

amdgpu.lockup_timeout=0

rd.luks.options=discard

We want to use discard on our ssd's.

elevator=mq-deadline

We want a different scheduler for ssd versus hdd.




Kind regards,
Udo


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 16:24   ` Udo van den Heuvel
@ 2020-02-10 17:01     ` Gabriel C
  2020-02-11  2:56       ` Udo van den Heuvel
  2020-02-11 17:04       ` Udo van den Heuvel
  0 siblings, 2 replies; 7+ messages in thread
From: Gabriel C @ 2020-02-10 17:01 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: linux-mm@vger.kernel.org

On Mon, 10 Feb 2020 at 17:25, Udo van den Heuvel
<udovdh@xs4all.nl> wrote:
>
> Hello Gabriel,
>
> Thank you kindly for your email and the links in there, I will most
> certainly look into those.
>
> On 10-02-2020 17:04, Gabriel C wrote:
> > I think first you should try to fix your amdgpu bug which is this one:
> > https://gitlab.freedesktop.org/drm/amd/issues/963
> >
> > And the fixes are the patchset there:
> > https://patchwork.freedesktop.org/series/72733/
>
> Thanks, will try those on 5.5.2.
>
> > Also, can you try booting without all these crazy options?
>
> What is crazy here?
> Each one has a story.
>

Sure, I'm not saying to not use these.

But try to boot a kernel with only what you need to boot when hunting bugs.
For example, if such a kernel works, then you know for sure that one of
the options, or a combination of them, causes the bugs.
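
One way to act on this is to derive a reduced command line from the
current one. The sketch below is only an illustration: the function name
minimal_cmdline and the list of options treated as boot-critical (root=,
ro/rw, rd.lvm.vg=, rd.luks.*, rd.auto=) are assumptions, not something
stated in this thread, and would need adjusting to the actual setup.

```shell
# Hypothetical helper: keep only the options assumed to be needed to reach
# the root fs, and drop everything else from a kernel command line.
minimal_cmdline() {
    out=""
    # Intentionally unquoted so the command line is split on whitespace.
    for opt in $1; do
        case $opt in
            root=*|ro|rw|rd.lvm.vg=*|rd.luks.*|rd.auto=*)
                # Append with a single space separator.
                out="${out:+$out }$opt" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Usage on the running system (prints the reduced command line only,
# nothing is changed):
#   minimal_cmdline "$(cat /proc/cmdline)"
```

If the reduced command line boots and the crash no longer shows up, the
dropped options can be added back a few at a time to narrow down which one
matters.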

> > As an example why would you need to force ACPI on your HW?
>
> Force?
> Because then I can be certain it will be there, this has been there for
> quite a while.
> Or would you suggest I run my x86_64 without acpi? (I am not an expert
> in this area yet)

The force parameter is used to try to enable ACPI on HW where it is OFF by
default; you don't need that.

....

> rd.luks.options=discard
>
> We want to use discard on our ssd's.

Use mount options?

> elevator=mq-deadline
> We want a different scheduler for ssd versus hdd.

If you really want that you should use udev rules for SSD/NVME/HDD/USB etc.

BR,

Gabriel C.


* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 17:01     ` Gabriel C
@ 2020-02-11  2:56       ` Udo van den Heuvel
  2020-02-11 17:04       ` Udo van den Heuvel
  1 sibling, 0 replies; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-11  2:56 UTC (permalink / raw)
  To: Gabriel C; +Cc: linux-mm@vger.kernel.org

On 10-02-2020 18:01, Gabriel C wrote:
>> rd.luks.options=discard
>>
>> We want to use discard on our ssd's.
> 
> Use mount options?

Not enough to make it work.

> 
>> elevator=mq-deadline
>> We want a different scheduler for ssd versus hdd.
> 
> If you really want that you should use udev rules for SSD/NVME/HDD/USB etc.

Simply loading the scheduler module and setting the scheduler in rc.local
is easier.
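
A minimal sketch of what such an rc.local fragment could look like,
assuming the choice is made from the sysfs "rotational" flag of each block
device. The function names are hypothetical, and bfq for spinning disks is
an assumption for illustration only; the thread only names mq-deadline.

```shell
# Choose a scheduler name from the value of /sys/block/<dev>/queue/rotational
# (0 = non-rotational such as an SSD, anything else = spinning disk).
pick_scheduler() {
    if [ "$1" = "0" ]; then
        echo mq-deadline
    else
        echo bfq    # assumed choice for HDDs, not taken from the thread
    fi
}

# Apply the choice to every block device that exposes a writable scheduler
# file. Must run as root, e.g. when called from /etc/rc.d/rc.local.
apply_schedulers() {
    for queue in /sys/block/*/queue; do
        [ -w "$queue/scheduler" ] || continue
        sched=$(pick_scheduler "$(cat "$queue/rotational")")
        echo "$sched" > "$queue/scheduler" ||
            echo "could not set $sched on $queue" >&2
    done
}

# Nothing is changed until the function is called explicitly:
#   apply_schedulers
```

The current setting can be checked afterwards by reading
/sys/block/<dev>/queue/scheduler, where the active scheduler is shown in
square brackets.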


Udo



* Re: 5.4+: PAGE FAULT crashes the system multiple times per 24h
  2020-02-10 17:01     ` Gabriel C
  2020-02-11  2:56       ` Udo van den Heuvel
@ 2020-02-11 17:04       ` Udo van den Heuvel
  1 sibling, 0 replies; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-11 17:04 UTC (permalink / raw)
  Cc: linux-mm@vger.kernel.org

On 10-02-2020 18:01, Gabriel C wrote:
> But try to boot a kernel with only what you need to boot when hunting bugs.
> For example, if such a kernel works, then you know for sure that one of
> the options, or a combination of them, causes the bugs.

These options are reasonable and necessary; so far things worked OK.
So why would they start being an issue?
And how can I even proceed when the kernel cannot find a rootfs anymore
while bisecting?
5.5.2 also has the page fault issue.
Why Linus calls 5.5.x 'stable' is beyond me.

How can I continue and find the root cause for the page fault hang?
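
For a bisect step that cannot be tested at all, git offers "git bisect
skip", which marks the commit as untestable instead of good or bad and
picks another one nearby. The sketch below shows the decision per tested
kernel; the helper name bisect_verdict and its yes/no arguments are
hypothetical, and it assumes the failure to find the root fs is unrelated
to the page fault being hunted.

```shell
# Hypothetical helper: decide what to tell git bisect about one tested kernel.
#   $1 = "yes" if the kernel got far enough to mount the root fs, else "no"
#   $2 = "yes" if the page fault crash was reproduced, else "no"
bisect_verdict() {
    booted=$1
    crashed=$2
    if [ "$booted" != "yes" ]; then
        # Cannot be judged either way: do not mark it good or bad.
        echo skip
    elif [ "$crashed" = "yes" ]; then
        echo bad
    else
        echo good
    fi
}

# Usage inside the kernel tree while a bisect is in progress:
#   git bisect "$(bisect_verdict no no)"     # runs "git bisect skip"
#   git bisect "$(bisect_verdict yes yes)"   # runs "git bisect bad"
```

If many commits in a row have to be skipped, git may only be able to
report a range of candidate commits rather than a single one.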


> The force parameter is used to try to enable ACPI on HW where it is OFF by
> default; you don't need that.

I booted 5.5.3 without acpi=force and dmesg output with `acpi` in it
looks similar.
So acpi=force will be removed from future kernel command lines.
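
A comparison like this can be made less dependent on eyeballing by
diffing the ACPI-related lines of two saved dmesg captures. This is a
sketch under assumptions: the function name acpi_lines and the capture
file names are hypothetical, and timestamps are stripped because they
differ between boots anyway.

```shell
# Print the lines mentioning ACPI from a saved dmesg capture, with the
# leading "[    0.123456] " timestamp removed so two boots compare cleanly.
acpi_lines() {
    grep -i 'acpi' "$1" | sed 's/^\[[^]]*\] *//'
}

# Usage with hypothetical file names, one capture per boot:
#   dmesg > dmesg-with-force.txt       # booted with acpi=force
#   dmesg > dmesg-without-force.txt    # booted without it
#   acpi_lines dmesg-with-force.txt    > with.txt
#   acpi_lines dmesg-without-force.txt > without.txt
#   diff with.txt without.txt
```

The kernel command line itself is logged in dmesg, so a difference on that
line is expected.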

>> We want to use discard on our ssd's.
> 
> Use mount options?

Not enough to make it work for LUKS.

>> elevator=mq-deadline
>> We want a different scheduler for ssd versus hdd.
> 
> If you really want that you should use udev rules for SSD/NVME/HDD/USB etc.

/etc/rc.d/rc.local is easier.
Look at the overhead of a service file.

Same as the overhead of NetworkManager versus a few kilobytes of
network-scripts.
But Fedora thinks otherwise....


Kind regards,
Udo


* 5.4+: PAGE FAULT crashes the system multiple times per 24h
@ 2020-02-09  8:39 Udo van den Heuvel
  0 siblings, 0 replies; 7+ messages in thread
From: Udo van den Heuvel @ 2020-02-09  8:39 UTC (permalink / raw)
  To: linux-kernel

Hello,

For bug https://bugzilla.kernel.org/show_bug.cgi?id=206191 I have been
bisecting away, but the process has now landed me on a kernel that cannot
find the root fs (with either good or bad bisect choices).

Pictures of the crash that is the reason for this bisect:
https://bugzilla.kernel.org/attachment.cgi?id=286787
https://bugzilla.kernel.org/attachment.cgi?id=286789
https://bugzilla.kernel.org/attachment.cgi?id=286791
https://bugzilla.kernel.org/attachment.cgi?id=286793

How can I proceed from here with the bisecting?
Did someone perhaps find the root cause for the page fault?
As the crash is fairly easy to reproduce I can test patches...

Please let me know!

Kind regards,
Udo

