Category Archives: C

Linux Kernel Programming – my second book

I’ve recently completed a project – the writing of the Linux Kernel Programming book, published by Packt (it was announced by the publisher on 01 March 2021). This project took just over two years…

All those long days and nights poring over the writing and the code have, I now feel, definitely been worthwhile; I believe the book will be a useful contribution to the Linux programming community.

A key point: I’ve ensured that all the material and code examples are based on the 5.4 LTS Linux kernel; it’s slated to be maintained right through Dec 2025, thus keeping the book’s content very relevant for a long while!

Due to its sheer size and depth, the publisher suggested we split the original tome into two books. That’s what has happened: 

  • the first part, Linux Kernel Programming, covers the essentials and, in my opinion, should be read first (of course, if you’re already very familiar with the topics it covers, feel free to start with either one)
  • the second part, Linux Kernel Programming Part 2, covers a focused subset of device driver topics: the basics, and the character ‘misc’ class device driver framework.

Many cross-references do turn up, especially from the second book to topics in the first; hence the suggestion to read them in order.

Here’s a quick rundown of what’s covered in each book.


Let’s begin with the Linux Kernel Programming book; firstly, it’s targeted at people who are quite new to the world of Linux kernel development and makes no assumptions regarding knowledge of the kernel. The prerequisite is a working knowledge of programming on Linux with ‘C’; it’s the medium we use throughout (along with a few bash scripts). The book is divided into three major sections, each containing appropriate chapters:

  • Section 1 covers the basics: firstly, the appropriate setup of the kernel development workspace on your system; next, two chapters cover building the Linux kernel from scratch, from source. (This includes cross-compilation as well, using the popular Raspberry Pi board as a ‘live’ example.)
    • The following two chapters delve in-depth into the kernel’s powerful Loadable Kernel Module (LKM) framework – how to program it, along with its more advanced features. I also take pains to point out how one should code with security in mind!
  • Section 2 helps you, the reader, gain a deeper understanding (to the practical extent required) of key kernel internals topics. A big reason why many struggle with kernel development is a lack of understanding of its internals.
    • Here, Chapter 6 covers the kernel architecture, focusing on how the kernel maintains attribute information on processes/threads and their associated stacks.
    • The next chapter – a really key one, again – delves into a difficult topic for many – memory management internals. I try to keep the coverage focused on what matters to a kernel and/or driver developer.
    • The following two chapters dive into the many and varied ways to allocate and deallocate memory when working within the kernel – an area where you can make a big difference performance-wise by knowing which kernel APIs and methods to use when.
    • The remaining two chapters here round off kernel internals with a discussion of the kernel-level CPU scheduler; several concepts and practical code examples help the reader learn what’s required.
  • Section 3 is where the book dives into what folks new to it consider difficult and arcane matters – how and why synchronization matters, how data races occur, and how you can protect critical sections in your kernel / driver code!
    • The amount of material here requires two chapters to do it justice: the first of them focuses on critical sections, concurrency concerns, and the understanding and practical usage of the mutex and the spinlock.
    • The book’s last chapter continues this discussion on kernel synchronization covering more areas relevant to the modern kernel and/or driver developer – atomic (and refcount) operators, cache effects, a primer on ‘lock-free’ programming techniques, with one of them – the percpu one – covered in some detail. Lock debugging within the kernel – using the powerful lockdep validator – as well as other techniques is covered as well!

The second book – Linux Kernel Programming Part 2 – Char Device Drivers and Kernel Synchronization deliberately covers just a small section of ‘how to write a device driver on Linux’. It does not purport to cover the many types and aspects of device driver development, instead focusing on the basics of teaching the reader how to write a simple yet complete character device driver belonging to the ‘misc’ class.

Great news! This book – Linux Kernel Programming Part 2 – Char Device Drivers and Kernel Synchronization – is downloadable for FREE. Enjoy!

Access it now!

Having said that, the material covering user-kernel communication pathways, working with peripheral I/O memory, and, especially, the topic of dealing with hardware interrupts, is very detailed and will prove useful in pretty much all kinds of Linux device driver projects.

A quick chapter-wise rundown of the second book:

  • In Chapter 1, we cover the basics – the reader understands the essentials of the Linux Device Model (LDM) and ends up writing a small, simple, yet complete ‘misc’ class character driver. Security awareness is built in too: we demonstrate a simple “privesc” – privilege escalation – attack
  • Chapter 2 shows the reader something every driver author will at one time or the other have to do: efficiently communicate between user and kernel address spaces. You’ll learn to use various technologies to do so – via procfs, sysfs, debugfs (especially useful to insert debug hooks as well), netlink sockets and the ioctl system call
  • The next chapter has the reader understand the nuances of reading and writing peripheral (hardware) I/O memory, via both the memory-mapped I/O (MMIO) as well as the Port I/O (PIO) technique
  • Chapter 4 covers dealing with hardware interrupts in depth; the reader will learn how the kernel works with hardware interrupts, then move on to how one is expected to allocate an IRQ line (covering modern resource-managed APIs), and how to correctly implement the interrupt handler routine. The modern approach of using threaded handlers (and the why of it) is then covered. The reasons for, and use of, both “top half” and “bottom half” interrupt mechanisms (hardirq, tasklet, and softirq) in code, as well as key information regarding the dos and don’ts of hardware interrupt handling, are covered too. Measuring interrupt latencies with the modern [e]BPF toolset, as well as with Ftrace, concludes this key chapter
  • Common kernel mechanisms – setting up delays, kernel timers, working with kernel threads and kernel workqueues – are the subject matter of Chapter 5. Several example kernel modules, including three versions of a ‘simple encrypt decrypt’ (‘sed’) example driver, serve to illustrate the concepts learned, in code
  • The final two chapters of this book deal with the really important topic of kernel synchronization (the same material in fact as the last two chapters of the first book). 

I think you’ll find that both books have a fairly large number of high quality, relevant code examples, all of which are based on the 5.4 LTS kernel.

[ LKP : code on GitHub ] [ LKP Part 2 : code on GitHub ]

Thanks for taking the time to read this post; more than that, I really hope you will read and enjoy these books!

Get Linux Kernel Programming, Kaiwan N Billimoria, Packt, Mar 2021 :

[ On Amazon (US)  ]    [ On Amazon (India) ]    [ On Packt ]

My book – Hands-On System Programming with Linux

Hello! I’m pleased that my first book was recently released (on 31 Oct 2018). The publisher is Packt (based in Birmingham, UK).

Quick Links

The book is available in various formats here:

The book has an open-to-public GitHub repository as well.

A chapter of the book is freely available online:  File IO Essentials. Do download and check it out.


A fairly detailed description of the book, and what it attempts to cover, is given below. Obviously, I am very grateful to my readers. A request: once you’ve gone through the book, please take five minutes to write a review of it on Amazon.

Hands-On System Programming with Linux

The Linux OS has grown from being a one-person tiny hobby project to a fundamental and integral part of a stunning variety of software products and projects; it is found in tiny industrial robots, consumer electronic devices (smartphones, tablets, music players); it is the powerhouse behind enterprise-scale servers, network switches and data centres all over the Earth.

This book is about a fundamental and key aspect of Linux – systems programming at the library and system call layers of the Linux stack. It will cement for you, in a fast-paced yet deeply technical manner, both the conceptual “why” and with a hands-on approach the practical “how” of systems programming on the Linux OS. Real-world relevant and detailed code examples, tested on the latest 4.x kernel distros, are found throughout the book, with added emphasis on key aspects such as programming for security. Linux was never more relevant in industry; are you?

The book’s style is one of making complex topics and ideas easy to understand; we take great pains to introduce the reader to relevant theoretical concepts such that the fundamentals are crystal clear before moving on to APIs and code. Then, the book goes on to provide detailed practical, hands-on and relevant code examples to solidify the concepts. All the code shown in the book (several examples for each chapter) is available ready-to-build-and-try on the book’s GitHub repository here : https://github.com/PacktPublishing/Hands-on-System-Programming-with-Linux. Not only that, each chapter has suggested assignments for the budding programmer to try; some of these assignments have solutions provided as well (find them in the GitHub repo).

This book covers a huge amount of ground within the Linux system programming domain; nevertheless, it does so in a unique way. Several features make this book useful and unique; some of them are enumerated below:

  • How many books on programming have you read that, besides the typical ‘Hello, world’ ‘C’ program, also include a couple of assembly language examples, describe what the CPU ABI is, and a whole lot more, including a detailed description of system architecture, system calls, etc, in the very first chapter?
  • While we do not delve into intricate details on their usage, this book uses, and briefly shows how to use, plenty of useful and relevant tools – from ltrace and strace to perf, to LTTng and Ftrace
  • The book is indeed detailed; for example, it does not shy away from delving (to the appropriate degree) into kernel internals wherever required. The understanding and description of an important topic such as Virtual Memory in Chapter 2 serves as a good example. Fairly advanced concepts, such as looking into the process stack (gstack, GDB) and what the process VM-split is, are covered as well
  • Ch 4 on Dynamic Memory Allocation goes well beyond the usual malloc API family; here, once we get that covered (in depth of course), the book moves into more advanced topics such as internal malloc behavior, demand paging, explicitly locking and protecting memory pages, using the alloca API, etc
  • Ch 5 on Memory Issues leads with a detailed description (and actual code test cases) of common memory defects (bugs) that novice (and even experienced!) programmers often overlook. These include OOB (Out Of Bounds) memory accesses (read/write overflows/underflows), UAF, UAR, leakage and double free
  • Ch 6 then continues logically forward with a detailed discussion on tools to detect the afore-mentioned memory defects; here, we focus on using the well known Valgrind (memcheck) and on the newer, exciting Sanitizer toolset (compiler-based tools). We compare them point for point, and explain how the modern C/C++ developer must tackle memory defects
  • File I/O: a vast topic by itself; we divide the discussion into two parts. The reader is advised to first delve into Appendix A – File IO Essentials (available online) which covers the basic concepts. Our Ch 18 on Advanced File I/O goes deeper; most programmers are aware of and frequently use the “usual” read/write system calls. We show how to optimize performance in various ways – using the MT (multithreaded) optimal pread/pwrite APIs, using scatter-gather (SG-IO). An explanation (with detailed diagrams) of the kernel picture for the block IO code path shows the reader how I/O actually works – the kernel page cache and related components are shown. Leveraging this knowledge with the posix_fadvise and the readahead APIs is described as well. Memory mapping as a powerful “zero-copy” technique for file IO is explained with sample code. More advanced areas such as DIO and AIO are briefly delved into as well, all with the idea that the developer can leverage the system for maximum I/O performance
  • Most programmers are (dimly) aware of the traditional Unix “permissions model”; we cover this in detail from a system programming perspective in Ch 7 – Process Credentials (we explain setuid/setgid/saved-set-ID programs). We emphasize though, that there is a superior modern approach – the POSIX Capabilities model – which is covered in Ch 8. Security concerns and how these get addressed form the meat of the discussion here
  • Being a book on system programming, we obviously cover Process Execution (Ch 9) and Process Creation (Ch 10) in depth, covering the fork system call and all its subtleties and intricacies. Towards this end, we encode “The rules of fork” – a set of seven rules that help the developer truly understand how this important system call actually behaves. As part of this chapter, the wait API and its variations, and the famous Unix “orphan” and “zombie” processes are covered as well
  • Again, as expected, we cover the topic of Signaling in depth over two whole chapters (11 and 12). Besides the basics, the book delves into practical details (with code, of course) in covering, for example, various techniques by which one can handle a very high volume of signals continually bombarding a process. The notion of software reentrancy (and the related reentrant-safety concept) is covered. Ways to prevent the zombie, using an alternate signal stack, etc, are covered as well. The next chapter on Signaling delves into the intricacies of handling fatal signals in a real-world Linux application – a must-do! The book provides an effective “template code” to conveniently fulfil this purpose. Using real-time signals for IPC, and synchronous APIs for signaling, are covered too
  • The chapter (13) on Timers covers both the traditional and the more recent powerful POSIX timers; here, we explain concepts via two interesting mini-projects – building a “how fast can you react” (CLI) game and a small “run:walk” timer app
  • Multithreading with Pthreads is again a vast area; this book devotes three whole chapters (14, 15, 16) to this important topic. In the first of this trilogy, we delve into the thread concept, how exactly it differs from a process and importantly, why threading is useful (with code examples of course). The next chapter deals in depth with the key topics of concurrency and synchronization (within the Pthreads domain, covering mutex locks and CVs); here, we use the (old but) very interesting Mars Pathfinder mission as a quick case study on priority inversion and how it can be prevented. Next, the really important topics of thread safety, cancellation and cleanup, are covered. The chapter ends with a brief “Multi processing vs MT” discussion and some typical FAQs. (We even include a ‘pthreads_app_template.c’ program in the book’s GitHub tree)
  • Ch 17 delves into intricacies of CPU Scheduling on the Linux OS; key concepts – the Linux state machine, what real-time is, POSIX scheduling policies available, etc are covered. We demonstrate a MT app that switches two worker threads to become (soft) real-time
  • The book ends with a small chapter (19) devoted to Troubleshooting and Best Practices to follow – a small but really key chapter!

Scattered throughout the book are interesting examples and reader assignments (some of which have solutions provided). Also, we try not to be completely x86 specific; in a few examples we cross-compile for the popular ARM-32 CPU; we mention the SEALS project (allowing one to quickly prototype apps on a Qemu emulated ARM/Linux system).

Who this book will benefit:

  • The professional Linux application developer
  • Application architects, leads, technical managers, consultants
  • Linux QA professionals
  • Students desirous of gaining an industry-relevant edge
  • Anybody interested in Linux programming and crafting good software in general.

Advice to a Young Firmware Developer – by Jack Ganssle; and Assembly

I hope Jack Ganssle forgives my directly copying his content; the only reason I do so is that these thoughts of his are precious and I wish for more of us to read and appreciate them. (The discussion is definitely biased towards firmware/embedded developers who work primarily on a hardware platform using ‘C’ as the language. Just a small part of Jack’s excellent Embedded Muse newsletter is shown below; do check out the full article and subscribe to his newsletter.)

 

Directly copied from here: The Embedded Muse, Issue #362 by Jack Ganssle, 19 Nov 2018.

Advice to a Young Firmware Developer

… Learn, in detail, how a computer works. You should be able to draw a detailed block diagram of one. Even if you have no interest in the hardware, it’s impossible to understand assembly language and other important aspects of creating firmware without understanding program counters, registers, busses and the like.

Learn an assembly language. Write real programs in it, and make them work. Absent a grounding in assembly much of the operation of a computer will be mysterious. In real life you’ll have to delve into the assembly at least occasionally, at least to work on the startup code, and to find some classes of bugs.

A recent article in IEEE Spectrum surveyed language use and C didn’t even make the cut. Java, Javascript, HTML and Python were ranked as the most in-demand languages in the USA. Yet around 70% of firmware people work in C. C++ makes up another 20%. For better or worse, all of the other embedded languages are in the noise. Master C, pointers and all. (Rust is increasingly popular, yet, despite the hype, has under a 1% share in the embedded space).

But do learn some other languages. Python can be useful for scripting. Ada gives a discipline I wish more had.

Work in a cross-development environment with an embedded target board. It’s very different from using Visual Studio.

Get comfortable with a Linux shell. With sed, awk, and a hundred other tools you can do incredible things without writing any code.

Take the time to think through what you’re building. It’s tempting to start coding early. Design is critically important. Remember the old saying: “if you think good design is expensive, consider the cost of bad design.”

Monitor your bug rates. Forever. Skip this and you’ll never know two things: if you’re improving, and how you compare to the industry. We all think we’re great developers, but metrics can be a cold shower.

Always be on the prowl for tools. Some are free, others expensive, but great tools will make you more productive and less error-prone. These change all the time so figure on constantly refining your toolbox.

Did you know the average firmware person reads one technical book a year? Yet this field evolves at the pace of Moore’s Law. Constantly study software engineering. We do have a Body of Knowledge. Every year new ideas appear. Some are brilliant, others whacky, but they all make you think.

Learn about the hardware. At least get a general understanding. An engineer who can use an oscilloscope and logic analyzer for troubleshooting code is a valuable addition to a software team. Digital and analog hardware is cool and fascinating. …”


[Added on 24Feb2020]
Again, I couldn’t resist copy-pasting! … a well thought out answer on Quora to the question:
What are some things about coding that you learned from good programmers around you?
Here’s the full answer (by Håkon Hapnes Strand):

  1. Don’t be a lazy bastard. Do things the right way even if it’s a lot of work.
  2. Don’t give up on something just because you’re stuck. You’ll figure it out eventually.
  3. Always track down the root cause of a bug. If a bug just “goes away”, it hasn’t. If you can’t explain what fixed the bug, it isn’t.
  4. Write unit tests. It may seem unnecessary and like a lot of work, but it will help you in the long run. See point 1.
  5. Best practices are best practices for a reason. Don’t assume that your approach is better just because it’s what you’re used to.
  6. Make sure everything is reproducible. That includes data and infrastructure.
  7. If it’s not in version control, it doesn’t exist. (See point 6)
  8. Always abstract where it’s needed. Never abstract where it’s not needed.
  9. Think before you code.

So, okay, that’s the part of the article(s) I wanted to show.

Learning Assembly Language – Resources

How does one just learn assembly language then? Well, there are of course resources that help – books, online articles; here are a few:

Pthreads Dev – Common Programming Mistakes to Avoid

Disclaimer: Okay, let me straight away say this: most of these points below are from various books and online sources. One that stands out in my mind, an excellent tome, though quite old now, is “Multithreaded Programming with Pthreads” by Bil Lewis and Daniel J. Berg.

Common Programming Errors one (often?) makes when programming MT apps with Pthreads

  • Failure to check return values for errors
  • Using errno without actually checking that an error has occurred
Wrong:

    syscall_foo();
    if (errno) { ... }

Correct:

    if (syscall_foo() < 0) {
        if (errno) { ... }
    }

(Also, note that the Pthread APIs do not set errno – they return the error code directly)

  • Not joining on joinable threads
  • A critical one: Failure to verify that library calls are MT Safe
    Prefer the foo_r version of an API, if one exists, over plain foo.
    Use TLS, TSD, etc.
  • Falling off the bottom of main()
    one must call pthread_exit(); yes, in main() as well!
  • Forgetting to define the _POSIX_C_SOURCE feature-test macro
  • Depending upon Scheduling Order
    Write your programs to depend upon synchronization. Don’t do:

          sleep(5); /* Enough time for manager to start */

Instead, wait until the event in question actually occurs; synchronize on it, perhaps using CVs (condition variables)

  • Not Recognizing Shared Data
    Especially true when manipulating complex data structures – for example, a list in which each element (or node), as well as the list itself, has a separate mutex for protection. To search the list, a thread would have to acquire, then release, each lock as it moves down the list. It would work, but be very expensive. Perhaps reconsider the design?
  • Assuming bit, byte or word stores are atomic
    Maybe, maybe not. Don’t assume – protect shared data
  • Not blocking signals when using sigwait(3)
    Any signals you’re blocking upon with the sigwait(3) should never be delivered asynchronously to another thread; block ’em and use ’em
  • Passing pointers to data on the stack to another thread

(Case a) Simple – an integer value:

Correct (below):

main()
{
 ...
 // thread creation loop
 for (i=0; i<NUM_THREADS; i++) {
    pthread_create(&thrd[i], &attr, worker, (void *)(uintptr_t)i);
 }
 ...
 // join loop...

 pthread_exit(NULL);
}

The integer is passed to each thread by value (cast into the void * argument); each thread receives its own private copy – no issues.

Wrong approach (below):

main()
{
 ...
 // thread creation loop
 for (i=0; i<NUM_THREADS; i++) {
     pthread_create(&thrd[i], &attr, worker, &i);
 }
  ...
  // join loop...

  pthread_exit(NULL);
}

Passing the integer by address implies that some thread B could be reading it while thread main is writing to it! A race, a bug.

(Case b) More complex – a data structure:

Correct (below):

main()
{
my_struct *pstr;
...
// thread creation loop
for (i=0; i<NUM_THREADS; i++) {
   pstr = (my_struct *) malloc(...);
   pstr->data = <whatever>;
   pstr->... = ...; // and so on...
   pthread_create(&thrd[i], &attr, worker, pstr);
}
...
// join loop: join each thread, then free the structure passed to it
  free(...);

pthread_exit(NULL);
}

The per-thread malloc ensures that each thread receives its own private copy of the data structure. Thread safe.

Wrong! (below)

my_struct *pstr;   // one global instance

main()
{
...
pstr = (my_struct *) malloc(...);
for (i=0; i<NUM_THREADS; i++) {
   pstr->data = <whatever>;
   pstr->... = ...; // and so on...
   pthread_create(&thrd[i], &attr, worker, pstr);
}
// join

free(pstr);
pthread_exit(NULL);
}

If you do this (the wrong one, above), then the global pointer (one instance of the data structure only) is being passed around without protection – threads will step on “each other’s toes” corrupting the data and the app. Thread Unsafe.

  • Avoid the above problems; use TLS (Thread-Local Storage) – a simple and elegant approach to making per-thread data thread safe.

Resource: GCC page on TLS.

 

 

Application Binary Interface (ABI) Docs and Their Meaning

Have you, the programmer, ever really thought about how it all actually works? I’m sure you have…

We write

printf("Hello, world! value = %d\n", 41+1);

and it works. But it’s ‘C’ code – the microprocessor cannot possibly understand it; all it “understands” is a stream of binary digits – machine language. So, who or what transforms source code into this machine language?

The compiler of course! How? It just does (cheeky). So who wrote the compiler? How?
Ah. Compiler authors figure out how by reading a document provided by the microprocessor (cpu) folks – the ABI – Application Binary Interface.

People often ask “But what exactly is an ABI?”. I like the answer provided here by JesperE:

“… If you know assembly and how things work at the OS-level, you are conforming to a certain ABI. The ABI governs things like how parameters are passed, where return values are placed. For many platforms there is only one ABI to choose from, and in those cases the ABI is just “how things work”.

However, the ABI also governs things like how classes/objects are laid out in C++. This is necessary if you want to be able to pass object references across module boundaries or if you want to mix code compiled with different compilers. …”

Another way to state it:
The ABI describes the underlying nuts and bolts of the mechanisms that systems software such as the compiler, linker and loader – IOW, the toolchain – needs to be aware of: data representation, function calling and return conventions, register usage conventions, stack construction, stack frame layout, argument passing – formal linkage, encoding of object files (eg. ELF), etc.

Having a minimal understanding of :

  • a CPU’s ABI – which includes stuff like
    • its procedure calling convention
    • stack frame layout
    • ISA (Instruction Set Architecture)
    • registers and their internal usage, and,
  • bare minimal assembly language for that CPU,

helps to no end when debugging a complex situation at the level of the “metal”.

With this in mind, here are a few links to various CPU ABI documents, and other related tutorials:

However, especially for folks new to it, reading the ABI docs can be quite a daunting task! Below, I hope to provide some simplifications which help one gain the essentials without getting completely lost in details (that probably do not matter).

Often, when debugging, one finds that the issue lies with how exactly a function is being called – we need to examine the function parameters, locals, return value. This can even be done when all we have is a binary dump – like the well known core file (see man 5 core for details).

Intel x86 – the IA-32

On the IA-32, the stack is used for function calling, parameter passing, locals.

Stack Frame Layout on IA-32

[...                            <-- Bottom; higher addresses.
PARAMS 
...]              
RET addr 
[SFP]                      <-- SFP = pointer to previous stack frame [EBP] [optional]
[... 
LOCALS 
...]                           <-- ESP: Top of stack; in effect, lowest stack address


Intel 64-bit – the x86_64

On this processor family, the situation is far more optimized. Registers are used to pass the first six arguments to a function; the seventh onward are passed on the stack. The stack layout is very similar to that on the IA-32.

Register Set

[Image: the x86_64 register set]

<Original image: from Intel manuals>

Actually, the above register-set image applies to all x86 processors – it’s an overlay model:

  • the 32-bit registers are literally the lower “half” of the 64-bit ones; the prefix changes from R to E (RAX → EAX)
  • the 16-bit registers are the lower half of the 32-bit ones; the ‘E’ prefix is simply dropped (EAX → AX)
  • the 8-bit registers are the high and low halves of the 16-bit ones (AX → AH and AL).

The first six (integer or pointer) arguments are passed in the following registers, in this order:

RDI, RSI, RDX, RCX, R8, R9

(By the way, looking up the registers is easy from within GDB: just use its info registers command).

An example from this excellent blog “Stack frame layout on x86-64” will help illustrate:

On the x86_64, suppose we call a function that receives 8 parameters – ‘a, b, c, d, e, f, g, h’. The situation now looks like this:

[Image: stack frame when calling a function with eight parameters on the x86_64]

What is this “red zone” thing above? From the ABI doc:

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

Basically it’s an optimization for the compiler folks: when a ‘leaf’ function is called (one that does not invoke any other functions), the compiler will generate code to use the 128 byte area as ‘scratch’ for the locals. This way we save two machine instructions to lower and raise the stack on function prologue (entry) and epilogue (return).

ARM-32 (Aarch32)

<Credits: some pics shown below are from here : ‘ARM University Program’, YouTube. Please see it for details>.

The Aarch32 processor family has seven modes of operation; six of these are privileged, and only one – ‘User’ – is non-privileged: it’s the mode in which user application processes run.

[Image: the seven Aarch32 processor modes]

When a process or thread makes a system call, the compiler-generated code issues the SWI machine instruction, which puts the CPU into Supervisor (SVC) mode.

The Aarch32 Register Set:

[Image: the Aarch32 register set]

Register usage conventions are mentioned below.

Function Calling on the ARM-32

The Aarch32 ABI reveals that its registers are used as follows:

Register   APCS name   Purpose
R0         a1          Argument registers: used to pass parameter values; they
R1         a2          need not be preserved across calls. Results are usually
R2         a3          returned in R0.
R3         a4
R4         v1          Variable registers: used internally by functions; must be
R5         v2          preserved if used. In effect, r4 to r9 hold local
R6         v3          variables as register variables.
R7         v4          (Also, in the case of the SWI machine instruction
R8         v5          (syscall), r7 holds the syscall number.)
R9         v6
R10        sl          Stack limit / stack chunk handle
R11        fp          Frame pointer: contains zero, or points to the stack
                       backtrace structure
R12        ip          Procedure entry temporary workspace
R13        sp          Stack pointer: fully descending stack, points to the
                       lowest free word
R14        lr          Link register: holds the return address at function exit
R15        pc          Program counter

(APCS = ARM Procedure Calling Standard)

When a function is called on the ARM-32 family, the compiler generates assembly code such that the first four integer or pointer arguments are placed in the registers r0, r1, r2 and r3. If the function is to receive more than four parameters, the fifth one onwards goes onto the stack. If enabled, the frame pointer (very useful for accurate stack unwinding/backtracing) is in r11. The last three registers are always used for special purposes:

  • r13: stack pointer register
  • r14: link register; in effect, return (text/code) address
  • r15: the program counter (the PC)

The PSR – Processor State Register – holds the system ‘state’; it is constructed like this:

[Image: layout of the CPSR register]

[Update: 24 Sept 2021]

ARM 64-bit

Ref: ARM Cortex-A Series Programmer’s Guide for ARMv8-A

Execution on the ARMv8 is at one of four Exception Levels (ELn; n=0,1,2,3). It determines the privilege level (just as the x86 has four rings, and the ARM-32 has seven modes). […] Exception Levels provide a logical separation of software execution privilege that applies across all operating states of the ARMv8 architecture. […] The following is a typical example of what software runs at each Exception level:

  • EL0 : normal user applications
  • EL1 : the operating system kernel, typically described as privileged
  • EL2 : the hypervisor
  • EL3 : low-level firmware, including the Secure Monitor

Figure 3.1. Exception levels

ARMv8 Registers and their Usage (ABI)

[Image: the ARMv8 general-purpose registers and their (ABI) usage]

In addition, the ‘special’ registers:

[Image: the ARMv8 ‘special’ registers]

ARM-64 / A64 / Aarch64 ABI calling conventions

(The following is directly excerpted from the Wikipedia page here: https://en.wikipedia.org/wiki/Calling_convention#ARM_(A64)).

The 64-bit ARM (AArch64) calling convention allocates the 31 general-purpose registers as:

  • x31 (SP): Stack pointer or a zero register, depending on context.
  • x30 (LR): Procedure link register, used to return from subroutines.
  • x29 (FP): Frame pointer.
  • x19 to x29: Callee-saved.
  • x18 (PR): Platform register. Used for some operating-system-specific special purpose, or an additional caller-saved register.
  • x16 (IP0) and x17 (IP1): Intra-Procedure-call scratch registers.
  • x9 to x15: Local variables, caller saved.
  • x8 (XR): Indirect return value address.
  • x0 to x7: Argument values passed to and results returned from a subroutine.

All registers starting with x have a corresponding 32-bit register prefixed with w. Thus, a 32-bit x0 is called w0.

Similarly, the 32 floating-point registers are allocated as:

  • v0 to v7: Argument values passed to and results returned from a subroutine.
  • v8 to v15: callee-saved, but only the bottom 64 bits need to be preserved.
  • v16 to v31: Local variables, caller saved.

Hope this helps!