Tag Archives: kernel

The Linux kernel now with RTL fully merged

Something pretty historic (at least for geeks like us) happened this week of 16 September 2024: the Linux realtime effort, christened PREEMPT_RT and then Real-Time Linux (RTL), has finally – finally! – been fully merged into the kernel.

The intense, and for a long time unrewarded, work of making Linux RT capable – usable as an RTOS – began back in 2004 with Thomas Gleixner, the late Doug Niehaus, and Ingo Molnar. Soon enough Steven Rostedt (he of Ftrace fame) and several others became major contributors to the effort. It took ‘just’ 20 years, and soon the shiny new 6.12 Linux kernel will have all the RTL code in-tree. (Update: 6.11 has been released on 23 Sept 2024.)

Until now, developers had to apply an out-of-tree patch to get the code in; here’s the recent 6.9 RT patch(es).

The ‘Merge’

The final barrier falls – the ‘last’ patch (PR) from Petr Mladek, dated 13 Sept 2024, that gets the stubborn printk issues resolved and merged into the (soon-to-be) 6.11 kernel.

Here’s a pic of Thomas Gleixner presenting the last printk-related PR – the one that completes the full inclusion of RTL – to Linus Torvalds at the Open Source Summit Europe in Vienna on 19 Sept 2024. The pull request was presented to Linus in hard-copy gold paper, tied with a ribbon! (That’s Thomas on the left, Linus on the right.)

Pic credit: https://lwn.net/Articles/990985/, Jon Corbet.

See a short video clip of this historic event here (credit: Alexander Kanavin). In it, you can also see a background pic of the late Doug Niehaus and Daniel Bristot de Oliveira; as one person commented on LinkedIn: ‘The RT folks are classy.’

Linus acts on the patch! Here’s the commit (# baeb9a7d8b60b021d907127509c44507539c15e5).

Trying it out

Here’s a simple ‘try’ of the brand new fully-merged RTL kernel; the steps to fetch the git tree, configure and build it, and see it run are shown here… (I did this trivially on an x86_64 Ubuntu 22.04 LTS VM).

Steps

  1. Get the bleeding-edge mainline kernel tree (Linus’s tree; here it’s cloned into a folder named linux-next)
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-next
  2. Look up the Makefile and git log, see the last RTL patch:
    $ cd linux-next
    $ head Makefile
    # SPDX-License-Identifier: GPL-2.0
    VERSION = 6
    PATCHLEVEL = 11
    SUBLEVEL = 0
    EXTRAVERSION =

    [ ... ]
    $ git log

    Merge: 2004cef11ea0 2638e4e6b182
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date: Fri Sep 20 06:04:27 2024 +0200
    Merge tag 'sched-rt-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull RT enablement from Thomas Gleixner:
    "Enable PREEMPT_RT on supported architectures:

    After twenty years of development we finally reached the point to enable PREEMPT_RT support in the mainline kernel.

    All prerequisites are merged, so enable it on the supported architectures ARM64, RISCV and X86(32/64-bit)"

    * tag 'sched-rt-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    riscv: Allow to enable PREEMPT_RT.
    arm64: Allow to enable PREEMPT_RT.
    x86: Allow to enable PREEMPT_RT.

    commit 2004cef11ea072838f99bd95cefa5c8e45df0847
    Merge: 509d2cd12a10 bc9057da1a22
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date: Thu Sep 19 15:55:58 2024 +0200

    Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    [ ... ]

  3. Configure the kernel, turn on PREEMPT_RT – no patching required!
    (as a simplification, I use the ‘localmodconfig’ config target to have the kernel config be a reasonable size, based on that of the build host):

    lsmod > mylsmod
    make LSMOD=./mylsmod localmodconfig
    make menuconfig

    [ ... ]

Figure: navigate to ‘General Setup / Preemption Model’ and turn RTL on!

Save and exit.

FYI, here’s the ‘Kconfig’ fragment: kernel/Kconfig.preempt

config PREEMPT_RT
        bool "Fully Preemptible Kernel (Real-Time)"
        depends on EXPERT && ARCH_SUPPORTS_RT
        select PREEMPTION
        help
          This option turns the kernel into a real-time kernel by replacing
          various locking primitives (spinlocks, rwlocks, etc.) with
          preemptible priority-inheritance aware variants, enforcing
          interrupt threading and introducing mechanisms to break up long
          non-preemptible sections. This makes the kernel, except for very
          low level and critical code paths (entry code, scheduler, low
          level interrupt handling) fully preemptible and brings most
          execution contexts under scheduler control.

          Select this if you are building a kernel for systems which
          require real-time guarantees.

[ … ]

$ grep PREEMPT_RT .config
CONFIG_PREEMPT_RT=y

4. Build it:

$ time make -j12
[ ... ]
BUILD arch/x86/boot/bzImage
Kernel: arch/x86/boot/bzImage is ready (#1)
$ sudo make modules_install && sudo make install

5. Ok, let’s reboot and see…

sudo reboot
[ ... select the new RTL kernel from the bootloader ... ]
$ cat /proc/version
Linux version 6.11.0-rtl+ (kaiwan@vbox-22) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP PREEMPT_RT Fri Sep 20 19:11:35 IST 2024

Nice! (Not that one can ‘feel’ anything very different on a PC/laptop; to realize its value, run a real-time app – audio (via JACK, PulseAudio, etc.) is typically a very good fit.)
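If you’d rather check this from a script than by eyeballing the output, here’s a minimal Python sketch. It only parses a /proc/version-style line (the sample is the one shown above); the function name `describe_kernel` and the returned dictionary keys are my own invention for illustration, not any kernel or library API.

```python
import re

def describe_kernel(version_line: str) -> dict:
    """Pull the release string and the PREEMPT_RT flag out of a
    /proc/version-style line (hypothetical helper, illustration only)."""
    release = re.search(r"Linux version (\S+)", version_line)
    return {
        "release": release.group(1) if release else None,
        # PREEMPT_RT shows up as a standalone token in the build flags
        "preempt_rt": "PREEMPT_RT" in version_line.split(),
    }

# The line printed by 'cat /proc/version' in the step above
sample = ("Linux version 6.11.0-rtl+ (kaiwan@vbox-22) "
          "(gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, "
          "GNU ld (GNU Binutils for Ubuntu) 2.38) "
          "#1 SMP PREEMPT_RT Fri Sep 20 19:11:35 IST 2024")

info = describe_kernel(sample)
print(info)
```

On a live system, you could pass it the contents of /proc/version instead of the sample string.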

The history of PREEMPT_RT / RTL

If interested, you can find (a lot!) of the history, as well as the reasons to use RTL, from the following:

I do hope you enjoyed reading this; do leave your comments and share, thanks.

Linux Kernel Programming – my second book

I’ve recently completed a project – the writing of the Linux Kernel Programming book, published by Packt (it was announced by the publisher on 01 March 2021). This project took just over two years…

I now feel that all those long days and nights, poring over the writing and the code, have definitely been very worthwhile, and that the book will be a useful contribution to the Linux programming community.

A key point: I’ve ensured that all the material and code examples are based on the 5.4 LTS Linux kernel; it’s slated to be maintained right through Dec 2025, thus keeping the book’s content very relevant for a long while!

Due to its sheer size and depth, the publisher suggested we split the original tome into two books. That’s what has happened: 

  • the first part, Linux Kernel Programming, covers the essentials and, in my opinion, should be read first (of course, if you’re already very familiar with the topics it covers, feel free to start either way)
  • the second part, Linux Kernel Programming Part 2, covering a small section of device driver topics, focusses on the basics and the character ‘misc’ class device driver framework.

Many cross-references, especially from the second book to topics in the first, do turn up; hence the suggestion to read them in order.

Here’s a quick run down on what’s covered in each book.


Let’s begin with the Linux Kernel Programming book; firstly, it’s targeted at people who are quite new to the world of Linux kernel development and makes no assumptions regarding knowledge of the kernel. The prerequisite is a working knowledge of programming on Linux with ‘C’; it’s the medium we use throughout (along with a few bash scripts). The book is divided into three major sections, each containing appropriate chapters:

  • Section 1 covers the basics: firstly, the appropriate setup of the kernel development workspace on your system; next, two chapters cover the building of the Linux kernel from scratch, from code. (It includes the cross compile as well, using the popular Raspberry Pi board as a ‘live’ example).
    • The following two chapters delve in-depth into the kernel’s powerful Loadable Kernel Module (LKM) framework, how to program it along with more advanced features. I also try and take a lot of trouble to point out how one should code with security in mind!
  • In Section 2 we deal with having you, the reader, gain a deeper understanding (to the practical extent required) of key kernel internals topics. A big reason why many struggle with kernel development is a lack of understanding of its internals.
    • Here, Chapter 6 covers the kernel architecture, focusing on how the kernel maintains attribute information on processes/threads and their associated stacks.
    • The next chapter – a really key one, again – delves into a difficult topic for many – memory management internals. I try to keep the coverage focused on what matters to a kernel and/or driver developer.
    • The following two chapters dive into the many and varied ways to allocate and deallocate memory when working within the kernel – an area where you can make a big difference performance-wise by knowing which kernel APIs and methods to use when.
    • The remaining two chapters here round off kernel internals with discussion on the kernel-level CPU scheduler; several concepts and practical code examples have the reader learn what’s required.
  • Section 3 is where the book dives into what folks new to it consider to be difficult and arcane matters – how and why synchronization matters, how data races occur and how you can protect critical sections in your kernel / driver code!
    • The amount of material here requires two chapters to do justice to: the first of them focuses on critical sections, concurrency concerns, the understanding and the practical usage of the mutex and the spinlock.
    • The book’s last chapter continues this discussion on kernel synchronization covering more areas relevant to the modern kernel and/or driver developer – atomic (and refcount) operators, cache effects, a primer on ‘lock-free’ programming techniques, with one of them – the percpu one – covered in some detail. Lock debugging within the kernel – using the powerful lockdep validator – as well as other techniques is covered as well!

The second book – Linux Kernel Programming Part 2 – Char Device Drivers and Kernel Synchronization deliberately covers just a small section of ‘how to write a device driver on Linux’. It does not purport to cover the many types and aspects of device driver development, instead focusing on the basics of teaching the reader how to write a simple yet complete character device driver belonging to the ‘misc’ class.

Great news! This book – Linux Kernel Programming Part 2 – Char Device Drivers and Kernel Synchronization – is downloadable for FREE. Enjoy!

Access it now!

Having said that, the materials covering user-kernel communication pathways, working with peripheral I/O memory, and especially the topic of dealing with hardware interrupts, are very detailed and will prove to be very useful in pretty much all kinds of Linux device driver projects.

A quick chapter-wise run down of the second book:

  • In Chapter 1, we cover the basics – the reader understands the basics of the Linux Device Model (LDM) and ends up writing a small, simple, yet complete ‘misc’ class character driver. Security-awareness is built too: we demonstrate a simple “privesc” – privilege escalation – attack
  • Chapter 2 shows the reader something every driver author will at one time or the other have to do: efficiently communicate between user and kernel address spaces. You’ll learn to use various technologies to do so – via procfs, sysfs, debugfs (especially useful to insert debug hooks as well), netlink sockets and the ioctl system call
  • The next chapter has the reader understand the nuances of reading and writing peripheral (hardware) I/O memory, via both the memory-mapped I/O (MMIO) as well as the Port I/O (PIO) technique
  • Chapter 4 covers dealing with hardware interrupts in-depth; the reader will learn how the kernel works with hardware interrupts, then move onto how one is expected to allocate an IRQ line (covering modern resource-managed APIs), and how to correctly implement the interrupt handler routine. The modern approach of using threaded handlers (and the why of it) is then covered. The reasons for and using both “top half” and “bottom half” interrupt mechanisms (hardirq, tasklet, and softirqs) in code, as well as key information regarding the dos and don’ts of hardware interrupt handling are covered. Measuring interrupt latencies with the modern [e]BPF toolset, as well as with Ftrace, concludes this key chapter
  • Common kernel mechanisms – setting up delays, kernel timers, working with kernel threads and kernel workqueues – are the subject matter of Chapter 5. Several example kernel modules, including three versions of a ‘simple encrypt decrypt’ (‘sed’) example driver, serve to illustrate the concepts learned in code
  • The final two chapters of this book deal with the really important topic of kernel synchronization (the same material in fact as the last two chapters of the first book). 

I think you’ll find that both books have a fairly large number of high quality, relevant code examples, all of which are based on the 5.4 LTS kernel.

[ LKP : code on GitHub ] [ LKP Part 2 : code on GitHub ]

Thanks for taking the time to read this post; more, I really hope you will read and enjoy these books!

Get Linux Kernel Programming, Kaiwan N Billimoria, Packt, Mar 2021 :

[ On Amazon (US)  ]    [ On Amazon (India) ]    [ On Packt ]

Linux Kernel Online and Book Resources collection

Working on the Linux kernel is challenging stuff, no doubt about that. Thus, the hunt for good technical articles, documentation, tips and gotchas on the subject quickly becomes part and parcel of the kernel developer’s work. This page is an attempt to collate and aggregate quality online resources (and offline ones – book lists) about the Linux kernel. It’s certainly not the first and won’t be the last such attempt. Nevertheless, I hope you find it useful! Kindly comment and let me know what I inadvertently missed out. Here goes:

  • Perhaps the best all-in-one or starting point website to begin digging up practical (and theoretical) information on the Linux kernel: 

The Wikipedia “Portal:Linux” page

A KDB / KGDB session on the popular Raspberry Pi embedded Linux board

Assumptions / Pre-reqs

For this post to be useful, you should:

– know how to build a Linux kernel from source

– know something about Linux kernel programming, writing kernel module code, etc

– have some familiarity with setting up and using KDB and KGDB (a bit of this is covered here, not all); also, see some useful Resources just below

– have an R Pi (I use the Rev B R Pi) with an SD card

– have a custom Linux kernel running on it (need to be able to modify kernel configuration and rebuild at will)

– the R Pi does not have a dedicated physical serial port; we require one to get (and send) console I/O (so that we can see kernel printk’s and interact via the keyboard). I find a simple and efficient way to do this is to make use of the GPIO pins 14 (TXD) and 15 (RXD) on the board, connecting them to a simple FTDI USB-to-TTL serial breakout board. I’m using FTDI’s FT232R Breakout board; it works very well indeed.


Above pic: My R Pi (Model B) attached to a FTDI FT232R USB-to-TTL breakout board.
Connections: (see photo)
R Pi             FTDI
TXD (GPIO 14)    RX-I
RXD (GPIO 15)    TX-O
GND (pin 6)      GND
(The RX-I and TX-O pins are at the front of the FTDI board, directly opposite the USB mini connector.)

Yeah, quite a few pre-reqs huh 🙂

Resources

– Raspberry Pi on Wikipedia

– Using kgdb, kdb and the kernel debugger internals

– A good tutorial on building-from-scratch for the R Pi root filesystem and Linux kernel, using the excellent Buildroot tool, can be found here.

Hi folks,


Exploring Linux procfs via shell scripts

Very often, while working on a Linux project, we’d like information about the system we’re working on: both at a global scope and a local (process) scope.

Haven’t we all wondered: is there a quick way to query which kernel version I’m running, which interrupts are enabled and firing, what my processor(s) are, details about kernel subsystems, memory usage, file, network and IPC usage, and so on? Linux’s proc filesystem makes this easy.

So what exactly is the proc filesystem all about?

Essentially, some quick salient points about the proc filesystem:

  • it’s a RAM-based filesystem (think ramdisk; yup, it’s volatile)
  • it’s a kernel feature, not userspace – proc is a filesystem supported by the Linux kernel VFS
  • it serves two primary purposes
    • proc serves as a “view” deep into the kernel internals; we can see details about hardware and software subsystems that userspace otherwise would have no access to (no syscalls)
    • certain “files” under proc, typically anchored under /proc/sys, can be written into: these basically are the “tuning knobs” of the Linux kernel. Sysads, developers, apps, etc exploit this feature
  • proc is mounted on start-up under /proc
  • a quick peek under /proc will show you several “files” and “folders”. These are pseudo-entries in the sense that they exist only in RAM while power is applied. The “folders” that are numbers are in fact the PIDs of the processes that were alive when you typed ‘ls’! It’s a snapshot of the system at that moment in time.
  • in fact, the name “proc” suggests “process”
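To make the ‘numbered folders are PIDs’ point concrete, here’s a small Python sketch. It assumes nothing beyond what’s described above (the purely numeric entry names under /proc are process IDs); the helper names `pids_from_entries` and `live_pids`, and the sample entry list, are made up for illustration.

```python
import os

def pids_from_entries(names):
    """Return the sorted PIDs among a list of /proc entry names.

    Purely numeric names are taken to be PIDs; everything else
    (cpuinfo, meminfo, sys, ...) is skipped.
    """
    return sorted(int(n) for n in names if n.isdecimal())

def live_pids(proc_root="/proc"):
    """Snapshot of the PIDs visible under proc_root right now."""
    return pids_from_entries(os.listdir(proc_root))

# A made-up directory listing, standing in for 'ls /proc'
sample = ["1", "2", "cpuinfo", "meminfo", "4242", "sys", "self"]
print(pids_from_entries(sample))
```

Calling `live_pids()` on a Linux box gives the real thing; as with ‘ls’, it’s only a snapshot, so a PID in the list may be gone by the time you look at it.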

At this point, if you’re not really familiar with this stuff, I’d urge you to peek around /proc on your Linux box, cat-ting stuff as you go. (Also, lest I forget, it’s better to run as root (sudo /bin/bash) so that we don’t get annoying ‘permission denied’ messages.) Of course, be careful when you run as root!!!

For example, to get one started off:


kmalloc and vmalloc : Linux kernel memory allocation API Limits

The Intent

To determine how large a memory allocation can be made from within the kernel, via the “usual suspects” – the kmalloc and vmalloc kernel memory allocation APIs, in a single call.

Let’s answer this question using two approaches:

  1. By reading the source, and
  2. By trying it out empirically on the system.

(Kernel source from kernel ver 3.0.2; tried out on kernel ver 2.6.35 on an x86 PC and 2.6.33.2 on the (ARM) BeagleBoard).

Quick Summary

For the impatient:

The upper limit (number of bytes that can be allocated in a single kmalloc request), is a function of:

  • the processor – really, the page size – and
  • the number of buddy system freelists (MAX_ORDER).

On both x86 and ARM, with a standard page size of 4 KB and MAX_ORDER of 11, the kmalloc upper limit is 4 MB!
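The arithmetic behind that 4 MB figure can be sketched in a few lines of Python. This assumes (as holds for the kernels discussed here) that the largest buddy-system block is of order MAX_ORDER - 1, i.e. 2^(MAX_ORDER - 1) contiguous pages; the function name `kmalloc_max_bytes` is just an illustrative helper, not a kernel symbol.

```python
def kmalloc_max_bytes(page_size=4096, max_order=11):
    """Largest physically contiguous chunk the buddy allocator hands out:
    2**(max_order - 1) pages of page_size bytes each (assumption: the
    top freelist is order max_order - 1)."""
    return page_size << (max_order - 1)

limit = kmalloc_max_bytes()
print(limit, "bytes =", limit // (1024 * 1024), "MB")
```

With a 4 KB page and MAX_ORDER of 11, that’s 4096 * 1024 bytes; a different page size or MAX_ORDER moves the limit accordingly.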

The vmalloc upper limit is, in theory, the amount of physical RAM on the system.
[EDIT/UPDATE #2]
In practice, the kernel allocates an architecture (cpu) specific “range” of virtual memory for the purpose of vmalloc: from VMALLOC_START to VMALLOC_END.

[EDIT/UPDATE #1]
In practice, it’s usually a lot less. A useful comment by ugoren points out that:
“in 32bit systems, vmalloc is severely limited by its virtual memory area. For a 32bit x86 machine, with 1GB RAM or more, vmalloc is limited to 128MB (for all allocations together, not just for one).”
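A quick way to see the size of the vmalloc region on a given box is the VmallocTotal line in /proc/meminfo. Here’s a hedged Python sketch that parses meminfo-style text; the helper name `meminfo_kb` is hypothetical and the sample values are made up purely for illustration (they are not measurements).

```python
def meminfo_kb(text, field):
    """Return the value (in kB) of 'field' from /proc/meminfo-style text,
    or None if the field isn't present."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field and rest.split():
            return int(rest.split()[0])
    return None

# Made-up sample in the /proc/meminfo format (values are illustrative only)
sample = (
    "MemTotal:        1024000 kB\n"
    "VmallocTotal:     122880 kB\n"
    "VmallocUsed:        8192 kB\n"
)
print(meminfo_kb(sample, "VmallocTotal"), "kB")
```

On a real system, pass it the contents of /proc/meminfo; the number reported covers all vmalloc allocations together, which is the point the comment above makes.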

[EDIT/UPDATE #3 : July ’17]
I wrote a simple kernel module (you can download the source code; see the link at the end of this article) to test the kmalloc/vmalloc upper limits; the results are what we expect:
for kmalloc, 4 MB is the upper limit with a single call; for vmalloc, it depends on the vmalloc range.

Also, please realize, the actual amount you can acquire at runtime depends on the amount of physically contiguous RAM available at that moment in time; this can and does vary widely.

Finally, what if one requires more than 4 MB of physically contiguous memory? That’s pretty much exactly the reason behind CMA – the Contiguous Memory Allocator! Details on CMA and using it are in this excellent LWN article here. Note that CMA was integrated into mainline Linux in v3.17 (05 Oct 2014). Also, the recommended interface to CMA is the ‘usual’ DMA [de]alloc APIs (kernel documentation here and here); don’t try to use the CMA APIs directly.

I. kmalloc Limit Tests

First, let’s check out the limits for kmalloc:


Inside the MSDOS / FAT Linux VFS Implementation

A (small) part of the Linux VFS module of the Designer Graphix Linux Internals training programme.

Referenced kernel ver: 2.6.30

Once extracted, see the fs/fat folder.

_Tip:_
For ease of code browsing, do ‘make tags’ (or ‘ctags -R’) in the root folder of the kernel source tree.

cd fs/fat

Note: Here the focus is on part of the MSDOS – Linux VFS kernel implementation, mainly the disk-related part, i.e., the superblock and inode objects. We don’t attempt to cover the Dcache/dentry, page cache (address operations) and just touch upon the process<–>filesystem relationship stuff (at least for now).

_Tip:_
To gain some insight into the physical structure / arch of the MSDOS (and [v]fat) filesystem, see this page.
The <linux/msdos_fs.h> header mirrors much of this.

For example, the FAT16 boot record (boot sector) structure is nicely seen here; its Linux layout is here:
include/linux/msdos_fs.h:struct fat_boot_sector
(can browse it via the superb LXR tool here).
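To get a feel for that on-disk layout without opening the kernel header, here’s a small userspace Python sketch that decodes the first 24 bytes of a FAT12/16 boot sector with the `struct` module. The field names are my own descriptive labels (the kernel’s struct fat_boot_sector uses different member names), and the sample sector is synthesized in the script rather than read from a real disk.

```python
import struct

# First 24 bytes of a FAT12/16 boot sector, little-endian, no padding:
# jump[3], OEM name[8], bytes/sector, sectors/cluster, reserved sectors,
# number of FATs, root dir entries, total sectors (16-bit), media, FAT size
BPB_FORMAT = "<3s8sHBHBHHBH"
FIELDS = ("jump", "oem_name", "bytes_per_sector", "sectors_per_cluster",
          "reserved_sectors", "num_fats", "root_dir_entries",
          "total_sectors_16", "media", "fat_size_16")

def parse_boot_sector(raw: bytes) -> dict:
    """Decode the leading BIOS Parameter Block fields of a boot sector."""
    size = struct.calcsize(BPB_FORMAT)
    return dict(zip(FIELDS, struct.unpack(BPB_FORMAT, raw[:size])))

# Synthesize a plausible 512-byte boot sector (illustrative values only)
sample = struct.pack(BPB_FORMAT, b"\xeb\x3c\x90", b"MSDOS5.0",
                     512, 4, 1, 2, 512, 65535, 0xF8, 64).ljust(512, b"\x00")

bpb = parse_boot_sector(sample)
print(bpb["oem_name"], bpb["bytes_per_sector"], bpb["num_fats"])
```

To try it on a real image, read the first 512 bytes of a FAT12/16-formatted image file in binary mode and pass them to `parse_boot_sector`.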

Superblock Setup

In namei_msdos.c:


...
static struct file_system_type msdos_fs_type = {
	.owner          = THIS_MODULE,
	.name           = "msdos",
	.get_sb         = msdos_get_sb,
	.kill_sb        = kill_block_super,
	.fs_flags       = FS_REQUIRES_DEV,
};

static int __init init_msdos_fs(void)
{
	return register_filesystem(&msdos_fs_type);
}


Google-Android (ABO) Executive Conference, 5 Mar 2009, Bangalore, India

Was happy to be a speaker at the ‘Google-Android for Executives Conference ’09’ organized by ABO Ventures, held in Bangalore on 5 Mar ’09.

I made a brief presentation entitled “Android – A Look Under the Hood”, which was pretty well received. It included a couple of demos on the Android Developer Phone (ADP1):
– changing brightness of the LCD screen (using a shell script, low-level hardware access via /sys – not the recommended way to do it!)
– flashing the device: saving and restoring the bootloader, kernel and system/app images.

Some pics taken during the flashing process below:

ADP1 (Android Developer Phone v1) hooked up to the laptop via the USB cable.

ADP1 showing the bootloader-loaded recovery util (JF v1.3).

ADP1 ‘Settings/About phone’ screenshot after upgrade (flash).

My open source contributions, listed on ohloh.net

ohloh.net does a really neat job tracking open-source projects / efforts. It tracks by person, project, language.

Really cool!

My personal (& very small!) contributions to open source can be seen here:

See https://www.ohloh.net/people?sort=kudo_position&q=kaiwan+billimoria

(Note that my actual commits are to the project ‘Linux Kernel 2.6’ and the ‘OpenMoko VisualGPS’ project; their being used in other open source projects makes it show up elsewhere as well…).



Porting Android kernel to the TS-72xx board (EP93xx)

Porting Attempt

From: Android base (2.6.25-android-r1.0), ARM11, on android phones (like G1)

To: TS-7200 SBC, ARM9, 2.6.24.4-ts… (Matt’s) kernel.

Upside: TS72xx learning, android learning, porting experience.

Downside: no full LED screen/touchpanel, moving to lower-powered processor, toolchain (asm) issues?

Resources:

http://www.nthcode.com/pubs/porting-android-to-a-new-device-p2.html

Verify TS-7200 board running off a 2.6 kernel (because it ships with 2.4.26)

So: downloaded kernel codebase of 2.6.21-ts from here:

tskernel-2.6.21-ts-src.tar.gz 56618 KB Thursday 15 November 2007 12:00:00 IST

$ wget ftp://ftp.embeddedarm.com/ts-arm-sbc/ts-7200-linux/sources/tskernel-2.6.21-ts-src.tar.gz
--18:27:16-- ftp://ftp.embeddedarm.com/ts-arm-sbc/ts-7200-linux/sources/tskernel-2.6.21-ts-src.tar.gz
=> ‘tskernel-2.6.21-ts-src.tar.gz.1’
Resolving ftp.embeddedarm.com... 67.40.67.44
Connecting to ftp.embeddedarm.com|67.40.67.44|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /ts-arm-sbc/ts-7200-linux/sources ... done.
==> PASV ... done. ==> RETR tskernel-2.6.21-ts-src.tar.gz ... done.
Length: 5,79,75,931 (55M) (unauthoritative)

2% [> ] 13,86,578 18.46K/s ETA 25:18
$

Use the correct configuration file.
