Monday, January 30, 2012

ARM Linux Boot Sequence

Source :  http://gicl.cs.drexel.edu/people/sevy/linux/ARM_Linux_boot_sequence.html

The following traces the Linux boot sequence for ARM-based systems in the 2.6.18 kernel. It looks at just the earliest stages of the boot process, until the generic non-processor-specific start_kernel function is called. The line number of each statement is given in parentheses at the end of the line; the kernel source itself can be conveniently browsed on the Linux Cross-Reference website.


zImage decompression

  • arch/arm/boot/compressed/head.S: start (108)
    • First code executed, jumped to by the bootloader, at label "start" (108)
    • save contents of registers r1 and r2 in r7 and r8 to save off architecture ID and atags pointer passed in by bootloader (118)
    • execute arch-specific code (inserted at 146)
      • arch/arm/boot/compressed/head-xscale.S or other arch-specific code file
      • added to build in arch/arm/boot/compressed/Makefile
      • linked into head.S by linker section declaration:  .section ".start"
      • flush cache, turn off cache and MMU
    • load registers with stored parameters (152)
      • sp = stack pointer for decompression code (152)
      • r4 = zreladdr = kernel entry point physical address
    • check if running at link address, and fix up global offset table if not (196)
    • zero decompression bss (205)
    • call cache_on to turn on cache (218)
      • defined at arch/arm/boot/compressed/head.S (320)
      • call call_cache_fn to turn on cache as appropriate for processor variant
        • defined at arch/arm/boot/compressed/head.S (505)
        • walk through proc_types list (530) until find corresponding processor
        • call cache-on function in list item corresponding to processor (511)
          • for ARMv5tej core, cache_on function is __armv4_mmu_cache_on (417)
            • call setup_mmu to set up initial page tables since MMU must be on for cache to be on (419)
            • turn on cache and MMU (426)
    • check to make sure won't overwrite image during decompression; assume not for this trace (232)
    • call decompress_kernel to decompress kernel to RAM (277)
    • branch to call_kernel (278)
      • call cache_clean_flush to flush cache contents to RAM (484)
      • call cache_off to turn cache off as expected by kernel initialization routines (485)
      • jump to start of kernel in RAM (489)
        • jump to address in r4 = zreladdr from previous load
          • zreladdr = ZRELADDR = zreladdr-y
          • zreladdr-y specified in arch/arm/mach-vx115/Makefile.boot

ARM-specific kernel code

  • arch/arm/kernel/head.S: stext (72)
    • call __lookup_processor_type (76)
      • defined in arch/arm/kernel/head-common.S (146)
      • search list of supported processor types __proc_info_begin (176)
        • kernel may be built to support more than one processor type
        • list of proc_info_list structs 
          • defined in arch/arm/mm/proc-arm926.S (467) and other corresponding proc-*.S files
          • linked into list by section declaration:  .section ".proc.info.init"
      • return pointer to proc_info_list struct corresponding to processor if found, or loop in error if not
    • call __lookup_machine_type (79)
      • defined in arch/arm/kernel/head-common.S (194)
      • search list of supported machines (boards)
        • kernel may be built to support more than one board
        • list of machine_desc structs 
          • machine_desc struct for boards defined in board-specific file vx115_vep.c
          • linked into list by section declaration that's part of the MACHINE_START macro
      • return pointer to machine_desc struct corresponding to machine (board)
    • call __create_page_tables to set up initial MMU tables (82)
    • set lr to __enable_mmu, r13 to address of __switch_data (91, 93)
      • lr and r13 used for jumps after the following calls
      • __switch_data defined in arch/arm/kernel/head-common.S (15)
    • call the __cpu_flush function pointer in the previously returned proc_info_list struct (94)
      • offset is #PROCINFO_INITFUNC into struct
      • this function is __arm926_setup for the ARM 926EJ-S, defined in arch/arm/mm/proc-arm926.S (392)
        • initialize caches, writebuffer
        • jump to lr, previously set to address of __enable_mmu
    • __enable_mmu (147)
      • set page table pointer (TTB) in MMU hardware so it knows where to start page-table walks (167)
      • enable MMU so running with virtual addresses (185)
      • jump to r13, previously set to address of __switch_data, whose first field is address of __mmap_switched
        • __switch_data defined in arch/arm/kernel/head-common.S (15)

  • arch/arm/kernel/head-common.S: __mmap_switched (35)
    • copy data segment to RAM (39)
    • zero BSS (45)
    • branch to start_kernel (55)

Processor-independent kernel code

  • init/main.c: start_kernel (456)

Tuesday, January 24, 2012

Linux Kernel Programming–Memory Allocation / Kmalloc VS Vmalloc

Memory allocation in the Linux kernel differs from its user-space counterpart. The following facts are noteworthy:
  • Kernel memory is not pageable.
  • Kernel memory allocation mistakes can easily cause a system oops (crash).
  • The kernel stack is small and of fixed size.
There are two ways to allocate memory for kernel code: statically from the stack or dynamically from the heap.
Static Memory Allocation
Static memory allocation is normally used when you know in advance how much memory you'll need. For example,
#define BUF_LEN    2048

char buf[BUF_LEN];
However, the kernel stack size is fixed and limited (the limit is architecture dependent, but normally it's only a few kilobytes to tens of kilobytes). Therefore a big chunk of memory is seldom requested on the stack; the better way is to allocate the memory dynamically from the heap.
Dynamic Memory Allocation
There’re two functions available to allocate memory from heap in Linux kernel process,
1. vmalloc
The vmalloc function is declared in /lib/modules/$(uname -r)/build/include/linux/vmalloc.h as below,
void *vmalloc(unsigned long size);
It’s Linux kernel’s version of malloc() function, which is used in user space. Like malloc, the function allocates virtually contiguous memory that may or may not physically contiguous.
To free the memory space allocated by vmalloc, one simply call vfree(), which is defined as,
void vfree(const void *addr);
2. kmalloc
The most commonly used memory allocation function in the kernel is kmalloc, which is declared in /lib/modules/$(uname -r)/build/include/linux/slab.h as below,
void * kmalloc(size_t size, int flags);
kmalloc allocates a region of physically contiguous (and therefore also virtually contiguous) memory and returns a pointer to it. It returns NULL when the allocation fails.
The behavior of kmalloc depends on the second parameter, flags. Only the two most common flags are introduced here:
GFP_KERNEL: indicates a normal kernel memory allocation. The kernel may block (sleep) the requesting code in order to free up enough memory before continuing the allocation, so this flag cannot be used in places where sleeping is not allowed.
GFP_ATOMIC: indicates that the allocation must be atomic: it either succeeds immediately or fails, and it never blocks in the middle of execution. Since the kernel cannot suspend the caller and free up memory to satisfy the request, an allocation has a higher chance of failure with this flag passed in.
There are several other flags; for example, GFP_DMA requests memory capable of undergoing Direct Memory Access. For more information, one can refer to the links provided in the references section.
To free memory allocated by kmalloc, one can use kfree, declared as below,
void kfree(const void *objp);
3. vmalloc VS kmalloc
vmalloc allocates virtually contiguous memory (not necessarily physically contiguous), while kmalloc allocates physically contiguous memory (which is also virtually contiguous). Most allocations in the Linux kernel are done with kmalloc, for the following reasons:
  • On many architectures, hardware devices don't understand virtual addresses, so their device drivers must allocate memory with kmalloc.
  • kmalloc has better performance in most cases, because a physically contiguous memory region is more efficient to use than a merely virtually contiguous one. The reasons are not covered here; interested readers can search for Linux memory-management articles.
However, when a large chunk of memory is needed, vmalloc is often used, since it doesn't require physically contiguous pages and the kernel can satisfy the request with much less effort than kmalloc.


kmalloc allocates physically contiguous memory, i.e. memory whose pages are laid out consecutively in physical RAM. vmalloc allocates memory that is contiguous in kernel virtual address space (the pages allocated that way are generally not contiguous in RAM, but the kernel sees them as one block).
kmalloc is the preferred way, as long as you don't need very big areas. The trouble is, if you want to do DMA from/to some hardware device, you'll need to use kmalloc, and you'll probably need a bigger chunk. The solution is to allocate memory as early as possible, before memory gets fragmented.



What are the advantages of having a contiguous block of memory? Specifically, why would I need to have a contiguous physical block of memory in a system call? Is there any reason I couldn't just use vmalloc?

You only need to worry about using physically contiguous memory if the buffer will be accessed by a DMA device on a physically addressed bus (like PCI). The trouble is that many system calls have no way to know whether their buffer will eventually be passed to a DMA device: once you pass the buffer to another kernel subsystem, you really cannot know where it is going to go. Even if the kernel does not use the buffer for DMA today, a future development might do so.
vmalloc is often slower than kmalloc, because it must map the allocated pages into a virtually contiguous range. kmalloc never remaps, though if not called with GFP_ATOMIC it can block.
kmalloc is limited in the size of buffer it can provide (historically 128 KB). If you need a really big buffer, you have to use vmalloc or some other mechanism, such as reserving high memory at boot.

This was true of earlier kernels. On recent kernels (tested on 2.6.33.2), the maximum size of a single kmalloc allocation is up to 4 MB!




Reading Physical Mapped Memory using /dev/mem


FILES

/dev/mem      Provides access to the computer's physical memory.
/dev/kmem     Provides access to the virtual address space of the operating system kernel, excluding memory that is associated with an I/O device.
/dev/allkmem  Provides access to the virtual address space of the operating system kernel, including memory that is associated with an I/O device.
 
http://www.linuxjournal.com/magazine/anthony-lineberry-devmem-rootkits?page=0,0 
 
http://tldp.org/LDP/khg/HyperNews/get/devices/fake.html 
 http://www.plugcomputer.org/plugforum/index.php?PHPSESSID=pelnql865kdm19vfl3bgvr14f7&topic=104.0

source :
            http://forum.kernelnewbies.org/read.php?13,2316,2316

Example : Blink the LEDs from user space through /dev/mem

#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define LED_ADDR 0x80840020

int main()
{
        int i;
        unsigned char *leds;
        unsigned char val;

        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if(fd < 0)
        {
                printf("Can't open /dev/mem\n");
                return 1;
        }

        /* Map the page containing the LED register
           (0x80840000 is the page-aligned base of LED_ADDR). */
        leds = (unsigned char *) mmap(0, getpagesize(), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0x80840000);
        if(leds == MAP_FAILED)
        {
                printf("Can't mmap\n");
                return 1;
        }
        else
                printf("leds=%p\n", (void *) leds);

        for(i = 0; i < 256; i++)
        {
                val = i % 4;
                leds[LED_ADDR - 0x80840000] = val;  /* offset 0x20 into the mapped page */

                sleep(1);
        }

        munmap((void *) leds, getpagesize());
        close(fd);

        return 0;
}