* Page fault in kernel code
       [not found] <CAJKgH8Df51ZL-BaN_zBmtP=2tjxh5po6KWdbR1Q7LwiR2DZzTg@mail.gmail.com>
@ 2014-09-09 13:23 ` Manavendra Nath Manav
  2014-09-09 14:25   ` Greg KH
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Manavendra Nath Manav @ 2014-09-09 13:23 UTC (permalink / raw)
  To: kernelnewbies

The book "Essential Linux Device Drivers" says "user mode code is allowed
to page fault, however, whereas kernel mode code isn't".

Why is that so? Why can't kernel mode code handle the page fault and reload
the page from swap? Also, can a page fault occur when the kernel is executing
in process context and/or interrupt context?

-- manav m-n

* Page fault in kernel code
  2014-09-09 13:23 ` Page fault in kernel code Manavendra Nath Manav
@ 2014-09-09 14:25   ` Greg KH
  2014-09-09 15:51   ` Valdis.Kletnieks at vt.edu
  2014-09-09 16:54   ` Jeff Haran
  2 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2014-09-09 14:25 UTC (permalink / raw)
  To: kernelnewbies

On Tue, Sep 09, 2014 at 06:53:55PM +0530, Manavendra Nath Manav wrote:
> While reading the book Essential Linux device drivers it says "user mode code
> is allowed to page fault, however, whereas kernel mode code isn't".
> 
> Why is it so? Why can't kernel mode code handle the page fault and reload the
> page from swap? Also, can page fault occur when kernel is executing in process
> context and/or interrupt context?

That is just the way the Linux kernel is designed: no page faults within
it, unlike some other operating systems.  In the end, it makes kernel code
much simpler.

thanks,

greg k-h


* Page fault in kernel code
  2014-09-09 13:23 ` Page fault in kernel code Manavendra Nath Manav
  2014-09-09 14:25   ` Greg KH
@ 2014-09-09 15:51   ` Valdis.Kletnieks at vt.edu
  2014-09-09 16:54   ` Jeff Haran
  2 siblings, 0 replies; 11+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2014-09-09 15:51 UTC (permalink / raw)
  To: kernelnewbies

On Tue, 09 Sep 2014 18:53:55 +0530, Manavendra Nath Manav said:

> Why is it so? Why can't kernel mode code handle the page fault and reload
> the page from swap? Also, can page fault occur when kernel is executing in
> process context and/or interrupt context?

There's no inherent chiseled-in-stone rule that says "the operating system's
kernel may not page fault", and in fact many operating systems allow it. The
IBM OS/360 family, starting with VS/1 and MVS (OS/360's MFT and MVT variants
ran on hardware that didn't do virtual memory) and continuing clear through
z/OS 40 years later, all supported having part of their kernel be pageable.
I've worked with several Unix variants that allowed parts of the kernel to be
pageable.

But that's a design decision that adds little real benefit, especially on
today's large RAM systems - even a Raspberry Pi has enough memory that you
don't really need to worry about making the kernel pageable.

Cautionary tale:  I once had a UTX/32 system that had routines for recovery
from disk errors (in particular, recovering and forwarding of bad blocks to
spare blocks was done by the host, *not* the device), and supported having
about 1/3 of the kernel code be pageable (this was in 1985 or so, and a
Powernode/9080 with 16M of RAM was a *big* system, so being able to put 500K of
a 1.5M kernel out on disk was a big win for performance).  I'll let you think
about what sort of afternoon I had the day that we kept hitting an I/O error on
a bad block in the swap area (which quite reasonably paused all I/O to the
failing disk until the error recovery routine ran), while the block-forwarder
module was swapped out....

(And I've had to debug similar dork-ups in VS/1, VM/SP, and MUSIC as well.
Actually... hmm, yep.  I think I've seen every single OS I've worked with in 3
decades that supported a paged kernel end up shooting itself in the foot
because the wrong thing was paged out at the wrong time. That stuff is *hard*
to get right...)

That sort of thing is why Linus decided to Just Say No. ;)



* Page fault in kernel code
  2014-09-09 13:23 ` Page fault in kernel code Manavendra Nath Manav
  2014-09-09 14:25   ` Greg KH
  2014-09-09 15:51   ` Valdis.Kletnieks at vt.edu
@ 2014-09-09 16:54   ` Jeff Haran
  2014-09-10  9:15     ` Manavendra Nath Manav
  2 siblings, 1 reply; 11+ messages in thread
From: Jeff Haran @ 2014-09-09 16:54 UTC (permalink / raw)
  To: kernelnewbies



On Tuesday, September 09, 2014 6:24 AM, Manavendra Nath Manav wrote:
> While reading the book Essential Linux device drivers it says "user mode
> code is allowed to page fault, however, whereas kernel mode code isn't".
>
> Why is it so? Why can't kernel mode code handle the page fault and reload
> the page from swap? Also, can page fault occur when kernel is executing in
> process context and/or interrupt context?

Think about handling the case where a page fault has occurred but the code
that handles the page fault is itself not already in RAM, which leads to
another page fault. It gets complicated. That complexity can be avoided by
keeping all the kernel code in RAM all the time. The same applies to the
kernel data that is needed to handle a page fault.

Jeff Haran
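
The recursion described above can be sketched as a small user-space toy
model. This is only an illustration, not kernel code: the page array, the
names touch_page(), handle_fault() and simulate(), and the depth limit are
all hypothetical. The handler's own code is assumed to live on one page; if
that page is not resident, running the handler faults again and the fault
can never be resolved.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES       8
#define HANDLER_PAGE 0 /* hypothetical page holding the fault handler's code */
#define MAX_DEPTH    4 /* stop the simulation after this many nested faults */

static bool resident[NPAGES];

int touch_page(int page, int depth);

/* Running the handler means executing code on HANDLER_PAGE, so the handler
 * must touch that page before it can bring in the page that faulted.
 * Returns 0 if the fault was resolved, -1 if the faults nested too deeply. */
int handle_fault(int page, int depth)
{
    if (depth >= MAX_DEPTH)
        return -1;
    if (touch_page(HANDLER_PAGE, depth + 1) != 0)
        return -1;
    resident[page] = true; /* stands in for reading the page back in */
    return 0;
}

/* Access a page; a non-resident page triggers the fault handler. */
int touch_page(int page, int depth)
{
    assert(page >= 0 && page < NPAGES);
    return resident[page] ? 0 : handle_fault(page, depth);
}

/* Start with everything paged out except, optionally, the handler's page,
 * then access one page and report whether the access could be satisfied. */
int simulate(bool handler_pinned, int page)
{
    for (int i = 0; i < NPAGES; i++)
        resident[i] = false;
    resident[HANDLER_PAGE] = handler_pinned;
    return touch_page(page, 0);
}
```

In this model simulate(true, 5) resolves the fault, while simulate(false, 5)
keeps faulting on the handler's own page until the depth limit is reached.
Keeping the handler (and the data it needs) permanently resident is what
removes the recursive case.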



* Page fault in kernel code
  2014-09-09 16:54   ` Jeff Haran
@ 2014-09-10  9:15     ` Manavendra Nath Manav
  2014-09-10 12:54       ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 11+ messages in thread
From: Manavendra Nath Manav @ 2014-09-10  9:15 UTC (permalink / raw)
  To: kernelnewbies

On 09-Sep-2014 10:25 pm, "Jeff Haran" <Jeff.Haran@citrix.com> wrote:
>
> Think about handling the case where a page fault has occurred but the
> code that handles the page fault is itself not already in RAM, which leads
> to another page fault. Gets complicated. That complexity can be avoided by
> keeping all the kernel code in RAM all the time. Same applies to the kernel
> data that is needed to handle a page fault.
>
> Jeff Haran
>

But if the total RAM is limited (less than 896MB LOWMEM), for example as in
embedded devices, how can the kernel code be kept in RAM all the time? Am I
correct to assume that the kernel pre-fetches all pages when entering
kernel mode from user mode?

* Page fault in kernel code
  2014-09-10  9:15     ` Manavendra Nath Manav
@ 2014-09-10 12:54       ` Valdis.Kletnieks at vt.edu
  2014-09-10 14:52         ` Manavendra Nath Manav
  0 siblings, 1 reply; 11+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2014-09-10 12:54 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 10 Sep 2014 14:45:23 +0530, Manavendra Nath Manav said:

> But if the total RAM is limited (less than 896MB LOWMEM), for example as in
> embedded devices, how can the kernel code be kept in RAM all the time? Am I
> correct to assume that the kernel pre-fetches all pages when entering
> kernel mode from user mode?

No, kernel code is loaded by your boot loader, and *it stays there*.  Similarly,
if you modprobe something, the kernel allocates the page, loads the code,
and leaves it there.

Particularly in embedded devices, where you know all the modules the kernel may
need, it's common to just create a kernel with everything built in, no module
support, and when the system boots, it loads into memory and never moves again.


* Page fault in kernel code
  2014-09-10 12:54       ` Valdis.Kletnieks at vt.edu
@ 2014-09-10 14:52         ` Manavendra Nath Manav
  2014-09-11 12:03           ` Leon Romanovsky
  0 siblings, 1 reply; 11+ messages in thread
From: Manavendra Nath Manav @ 2014-09-10 14:52 UTC (permalink / raw)
  To: kernelnewbies

On 10-Sep-2014 6:24 pm, <Valdis.Kletnieks@vt.edu> wrote:
>
> On Wed, 10 Sep 2014 14:45:23 +0530, Manavendra Nath Manav said:
>
> > But if the total RAM is limited (less than 896MB LOWMEM), for example as
> > in embedded devices, how can the kernel code be kept in RAM all the time?
> > Am I correct to assume that the kernel pre-fetches all pages when
> > entering kernel mode from user mode?
>
> No, kernel code is loaded by your boot loader, and *it stays there*.
> Similarly, if you modprobe something, the kernel allocates the page, loads
> the code, and leaves it there.
>
> Particularly in embedded devices, where you know all the modules the kernel
> may need, it's common to just create a kernel with everything built in, no
> module support, and when the system boots, it loads into memory and never
> moves again.
>

Linux kernel memory is not pageable, but memory allocated through vmalloc
can still cause a page fault. How do device drivers that use vmalloc handle
this?

* Page fault in kernel code
  2014-09-10 14:52         ` Manavendra Nath Manav
@ 2014-09-11 12:03           ` Leon Romanovsky
  2014-09-11 14:19             ` Christoph Lameter
  2014-09-11 14:53             ` Miles MH Chen
  0 siblings, 2 replies; 11+ messages in thread
From: Leon Romanovsky @ 2014-09-11 12:03 UTC (permalink / raw)
  To: kernelnewbies

On Wed, Sep 10, 2014 at 5:52 PM, Manavendra Nath Manav
<mnm.kernel@gmail.com> wrote:
> Linux kernel memory is not pageable, but memory allocated through vmalloc
> can still cause a page fault. How do device drivers that use vmalloc handle
> this?

Pages allocated via a vmalloc call won't generate page faults.


-- 
Leon Romanovsky | Independent Linux Consultant
        www.leon.nu | leon at leon.nu


* Page fault in kernel code
  2014-09-11 12:03           ` Leon Romanovsky
@ 2014-09-11 14:19             ` Christoph Lameter
  2014-09-11 14:53             ` Miles MH Chen
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2014-09-11 14:19 UTC (permalink / raw)
  To: kernelnewbies

On Thu, 11 Sep 2014, Leon Romanovsky wrote:

> > Linux kernel memory is not pageable, but memory allocated through vmalloc
> > can still cause a page fault. How do device drivers that use vmalloc
> > handle this?
> Pages allocated via a vmalloc call won't generate page faults.

Kernel faults are used on some platforms in a very limited way but not
for swapping in or "paging in from disk". Have a look at
linux/arch/x86/mm/fault.c:


        /*
         * We fault-in kernel-space virtual memory on-demand. The
         * 'reference' page table is init_mm.pgd.
         *
         * NOTE! We MUST NOT take any locks for this case. We may
         * be in an interrupt or a critical region, and should
         * only copy the information from the master page table,
         * nothing more.
         *
         * This verifies that the fault happens in kernel space
         * (error_code & 4) == 0, and that the fault was not a
         * protection error (error_code & 9) == 0.
         */
        if (unlikely(fault_in_kernel_space(address))) {
                if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
                        if (vmalloc_fault(address) >= 0)
                                return;

                        if (kmemcheck_fault(regs, address, error_code))
                                return;
                }

                /* Can handle a stale RO->RW TLB: */
                if (spurious_fault(error_code, address))
                        return;

                /* kprobes don't want to hook the spurious faults: */
                if (kprobes_fault(regs))
                        return;
                /*
                 * Don't take the mm semaphore here. If we fixup a prefetch
                 * fault we could otherwise deadlock:
                 */
                bad_area_nosemaphore(regs, error_code, address);

                return;
        }
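
The error-code test in the excerpt above can be tried out in user space with
a short sketch. The PF_* bit values below are the ones defined in
arch/x86/mm/fault.c of that era; the helper name
eligible_for_vmalloc_fixup() is hypothetical and exists only for this
illustration. It reproduces the mask check that guards the vmalloc_fault()
call: the fault must come from kernel mode, on a not-present page, with no
reserved-bit violation.

```c
#include <assert.h>
#include <stdbool.h>

/* Bits of the x86 page fault error code. */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access, 1: write access */
    PF_USER  = 1 << 2, /* 0: kernel-mode access, 1: user-mode access */
    PF_RSVD  = 1 << 3, /* reserved bit was set in a page table entry */
    PF_INSTR = 1 << 4, /* fault was an instruction fetch */
};

/* The comment in the excerpt mentions "& 4" (PF_USER) and "& 9"
 * (PF_PROT | PF_RSVD); together they form the mask 13 used in the code. */
static_assert((PF_RSVD | PF_USER | PF_PROT) == 13, "mask covers bits 0, 2, 3");

/* Hypothetical helper: true when the error code passes the same test that
 * guards the vmalloc_fault() call in the excerpt. */
bool eligible_for_vmalloc_fixup(unsigned long error_code)
{
    return !(error_code & (PF_RSVD | PF_USER | PF_PROT));
}
```

For example, a kernel-mode write to a not-present page (error code PF_WRITE)
passes the check, while a user-mode fault (PF_USER set) or a protection
fault (PF_PROT set) does not. Whether the access was a read or a write does
not matter for this test.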


* Page fault in kernel code
  2014-09-11 12:03           ` Leon Romanovsky
  2014-09-11 14:19             ` Christoph Lameter
@ 2014-09-11 14:53             ` Miles MH Chen
  2014-09-11 15:26               ` Leon Romanovsky
  1 sibling, 1 reply; 11+ messages in thread
From: Miles MH Chen @ 2014-09-11 14:53 UTC (permalink / raw)
  To: kernelnewbies

Not exactly, vmalloc'ed addresses can generate page faults.

The page table entries for vmalloc'ed memory live in the kernel's master page
table, not in every process's page table. When a vmalloc page fault occurs,
the kernel simply copies the entry from the master page table into the
current process's page table, which fixes the fault.

MH
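
The lazy copy described above can be illustrated with a user-space toy
model. It is a sketch under simplifying assumptions, not the kernel's
implementation: each "page table" is a flat array of four slots, and the
names toy_mm, toy_vmalloc_map(), toy_vmalloc_fault() and toy_access() are
hypothetical. A mapping is entered only in the master table; a process picks
it up the first time it touches the address and takes a fault.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4   /* kernel entries per table in this toy model */
#define EMPTY  0UL /* marks a slot with no mapping */

struct toy_mm {
    unsigned long pgd[NSLOTS]; /* this process's copy of the kernel entries */
    unsigned faults;           /* number of faults this process has taken */
};

static unsigned long master_pgd[NSLOTS]; /* stands in for init_mm.pgd */

/* A vmalloc-style mapping updates only the master table. */
void toy_vmalloc_map(size_t slot, unsigned long entry)
{
    assert(slot < NSLOTS && entry != EMPTY);
    master_pgd[slot] = entry;
}

/* Fault path: copy the entry from the master table, nothing more.
 * Returns 0 if the fault was fixed, -1 if the master has no mapping. */
int toy_vmalloc_fault(struct toy_mm *mm, size_t slot)
{
    if (slot >= NSLOTS || master_pgd[slot] == EMPTY)
        return -1;
    mm->pgd[slot] = master_pgd[slot];
    return 0;
}

/* Access a kernel address through a process's table. Returns the entry,
 * or EMPTY if the fault could not be fixed (a genuinely bad address). */
unsigned long toy_access(struct toy_mm *mm, size_t slot)
{
    assert(slot < NSLOTS);
    if (mm->pgd[slot] == EMPTY) {
        mm->faults++;
        if (toy_vmalloc_fault(mm, slot) != 0)
            return EMPTY;
    }
    return mm->pgd[slot];
}
```

In this model the first access from each process costs one fault and every
later access costs none. No disk I/O and no sleeping is involved, which is
why this kind of kernel fault does not conflict with the "kernel memory is
not pageable" rule discussed earlier in the thread.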


* Page fault in kernel code
  2014-09-11 14:53             ` Miles MH Chen
@ 2014-09-11 15:26               ` Leon Romanovsky
  0 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2014-09-11 15:26 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Sep 11, 2014 at 5:53 PM, Miles MH Chen <orca.chen@gmail.com> wrote:
> Not exactly, vmalloc'ed addresses can generate page faults.
>
> The page table entries for vmalloc'ed memory live in the kernel's master
> page table, not in every process's page table. When a vmalloc page fault
> occurs, the kernel simply copies the entry from the master page table into
> the current process's page table, which fixes the fault.
>
> MH

Thanks, I was under the wrong impression that the difference between pages
allocated with kmalloc vs. vmalloc is in the PTE mapping.




-- 
Leon Romanovsky | Independent Linux Consultant
        www.leon.nu | leon at leon.nu
