feat: Linux syscall ABI alignment + ARM64 SMP stability #229

Merged
ryanbreen merged 10 commits into main from feat/linux-syscall-abi
Feb 18, 2026

@ryanbreen ryanbreen commented Feb 18, 2026

Summary

  • Align x86_64 and ARM64 syscall numbers with Linux ABI for musl libc compatibility
  • Add envp/auxv stack layout for ELF process creation (musl requirement)
  • Run musl libc hello world on Breenix ARM64
  • Fix multiple ARM64 SMP stability bugs (lock-free framebuffer, frame allocator refactor, PM lock deadlock, waitpid TOCTOU race, IRQ-masked WFI in sys_exit)

Test plan

  • x86_64 boot test: 3/3 pass
  • ARM64 native boot test: pass
  • ARM64 strict boot test: 19/20 pass (1 QEMU serial timing flake)
  • All 5 fork+exit iterations complete in every boot
  • Interactive testing confirms bsh becomes responsive

🤖 Generated with Claude Code

ryanbreen and others added 8 commits February 17, 2026 11:23
…tibility

Renumber 9 conflicting syscalls on x86_64 to match the Linux ABI (including
Read=0, Close=3, Fstat=5, Lseek=8, Yield=24, Fork=57, Exit=60, Getdents64=217).
ARM64 retains Breenix's existing numbers unchanged.

Add new syscalls required by musl: readv/writev (vectored I/O), arch_prctl
(x86_64 TLS via FS base), and newfstatat (path-based stat with AT_FDCWD).
Add compatibility stubs: mremap, madvise, ppoll, set_robust_list.

SyscallNumber enum is now purely semantic (no repr(u64)); from_u64() and
libbreenix nr module are split by #[cfg(target_arch)] for each architecture.
Consolidate duplicate syscall constants in signal.rs, pty.rs, termios.rs to
use nr:: imports. Fix hardcoded syscall numbers in userspace programs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ARM64 Linux uses asm-generic/unistd.h numbers which differ completely from
x86_64. This commit renumbers all ARM64 syscalls from legacy Breenix custom
numbers to standard Linux ARM64 ABI, matching what musl libc expects.

Key changes:
- Rewrite ARM64 nr module with Linux numbers (e.g. READ=63, WRITE=64,
  OPENAT=56, CLOSE=57, EXIT=93, CLONE=220, MMAP=222)
- ARM64 has no legacy syscalls (no open/fork/pipe/dup2/select/poll/etc),
  so all libbreenix, libbreenix-libc, and userspace code now uses *at
  variants (openat, faccessat, mkdirat, unlinkat, etc.) via #[cfg] blocks
- Add kernel *at handler functions that validate AT_FDCWD and delegate
  to existing implementations
- Route ARM64 clone(SIGCHLD,0,0,0,0) to sys_fork_aarch64 for fork emulation
- Fix ARM64 inline asm syscall numbers in hello_std_real.rs and
  signal_regs_test.rs
- Replace poll/select with ppoll/pselect6, pause with sigsuspend,
  alarm with setitimer on ARM64
- Remove deprecated get_time_ms() (referenced removed GET_TIME syscall)

Both x86_64 and ARM64 build clean and pass boot tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
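
The per-architecture numbering described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the actual Breenix code: the enum is cut down to four variants, and the helper names `from_u64_x86_64` and `from_u64_aarch64` are hypothetical. The numbers are the ones quoted in the commit messages, except x86_64 Write=1, which is the standard Linux value.

```rust
/// Semantic syscall identifiers with no architecture-specific discriminants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyscallNumber {
    Read,
    Write,
    Close,
    Exit,
}

/// Decode a raw number using the Linux x86_64 table (hypothetical helper).
pub fn from_u64_x86_64(nr: u64) -> Option<SyscallNumber> {
    match nr {
        0 => Some(SyscallNumber::Read),
        1 => Some(SyscallNumber::Write),
        3 => Some(SyscallNumber::Close),
        60 => Some(SyscallNumber::Exit),
        _ => None, // not part of this reduced sketch
    }
}

/// Decode a raw number using the Linux ARM64 (asm-generic) table (hypothetical helper).
pub fn from_u64_aarch64(nr: u64) -> Option<SyscallNumber> {
    match nr {
        63 => Some(SyscallNumber::Read),
        64 => Some(SyscallNumber::Write),
        57 => Some(SyscallNumber::Close),
        93 => Some(SyscallNumber::Exit),
        _ => None, // not part of this reduced sketch
    }
}

/// The kernel-facing entry point picks one table at compile time.
#[cfg(target_arch = "x86_64")]
pub fn from_u64(nr: u64) -> Option<SyscallNumber> {
    from_u64_x86_64(nr)
}

#[cfg(target_arch = "aarch64")]
pub fn from_u64(nr: u64) -> Option<SyscallNumber> {
    from_u64_aarch64(nr)
}
```

Note that the same raw value means different things on each side: 57 is Close on ARM64 but Fork on x86_64, which is why hardcoded numbers in userspace programs had to be replaced.
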
Extend the exec stack setup to produce the full Linux ABI initial stack
layout that musl libc's _start expects. Previously the stack only had
argc/argv; now it includes:

- envp array (empty for now, with NULL terminator)
- Auxiliary vector (auxv) entries:
  - AT_ENTRY: program entry point
  - AT_PHDR: program header table address
  - AT_PHNUM: number of program headers
  - AT_PHENT: size of each program header entry
  - AT_PAGESZ: 4096
  - AT_RANDOM: pointer to 16 bytes of pseudo-random data on stack
  - AT_NULL: terminator

Also extend LoadedElf structs in both x86_64 and ARM64 ELF loaders to
capture phdr_vaddr, phnum, and phentsize from the ELF binary, scanning
for PT_PHDR headers with fallback to load_base + phoff.

Both architectures build clean and pass boot tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
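
As a rough illustration of the stack layout listed above, the sketch below builds the sequence of 64-bit words that `_start` would find at the initial stack pointer, lowest address first. The `ElfInfo` struct and `build_initial_stack` function are hypothetical names, and the real code presumably writes into user memory rather than a `Vec`. The `AT_*` values are the standard Linux ones.

```rust
// Standard Linux auxiliary vector keys.
pub const AT_NULL: u64 = 0;
pub const AT_PHDR: u64 = 3;
pub const AT_PHENT: u64 = 4;
pub const AT_PHNUM: u64 = 5;
pub const AT_PAGESZ: u64 = 6;
pub const AT_ENTRY: u64 = 9;
pub const AT_RANDOM: u64 = 25;

/// Hypothetical subset of the loader output needed for auxv.
pub struct ElfInfo {
    pub entry: u64,
    pub phdr_vaddr: u64,
    pub phnum: u64,
    pub phentsize: u64,
}

/// Build the initial stack image as a list of words, lowest address first.
/// `argv_ptrs` are user addresses of the argument strings; `random_ptr` is the
/// user address of 16 bytes of pseudo-random data placed elsewhere on the stack.
pub fn build_initial_stack(argv_ptrs: &[u64], elf: &ElfInfo, random_ptr: u64) -> Vec<u64> {
    let mut words = Vec::new();
    words.push(argv_ptrs.len() as u64); // argc
    words.extend_from_slice(argv_ptrs); // argv[0..argc]
    words.push(0); // argv NULL terminator
    words.push(0); // envp is empty, so only its NULL terminator
    let auxv = [
        (AT_ENTRY, elf.entry),
        (AT_PHDR, elf.phdr_vaddr),
        (AT_PHNUM, elf.phnum),
        (AT_PHENT, elf.phentsize),
        (AT_PAGESZ, 4096),
        (AT_RANDOM, random_ptr),
    ];
    for &(key, value) in auxv.iter() {
        words.push(key);
        words.push(value);
    }
    words.push(AT_NULL); // auxv terminator key
    words.push(0); // auxv terminator value
    words
}
```

With one argument, the image is 18 words long: argc, one argv pointer, two NULL terminators, six key/value pairs, and the AT_NULL pair.
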
Cross-compile stock musl libc for aarch64 and link a C hello world
program that successfully runs on Breenix, printing via musl's
printf/writev. No patches to musl source were needed.

Build infrastructure:
- userspace/c-programs/hello.c: C hello world source
- userspace/c-programs/Makefile: build script using Homebrew LLVM clang
  targeting aarch64-linux-musl, with musl headers and libc.a
- userspace/programs/linker-aarch64-musl.ld: linker script for musl
  programs (includes .init_array, .fini_array, .got, .tdata, .tbss)
- third-party/ added to .gitignore (musl and compiler-rt sources are
  cloned and built locally)

Runtime integration:
- init spawns /bin/hello_musl during startup
- hello_musl added to test binary list

Syscalls exercised by musl during hello world:
- set_tid_address (96): TLS initialization
- writev (66): printf output
- exit_group (94): process exit

This proves the complete musl libc compatibility chain works:
  Linux ARM64 syscall ABI -> envp/auxv stack -> musl _start -> printf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace frame allocator Mutex<Option<MemoryInfo>> with spin::Once<MemoryInfo>,
  eliminating lock contention on the SMP frame allocation hot path. MEMORY_INFO
  is immutable after boot, so a Mutex was architecturally wrong - get() is now
  a single atomic load instead of a spinlock acquire.

- Fix sys_exit_aarch64 to never return to userspace. Previously it set
  need_resched and returned 0, which allowed musl's exit loop to re-enter
  exit(), causing double-terminate and double-decrement of COW page refcounts.
  Now enters WFI loop until timer interrupt context-switches away.

- Add double-terminate guard in Process::terminate() to prevent COW page
  refcount corruption when exit is called multiple times.

- Route ppoll to actual sys_ppoll handler on both x86_64 and ARM64 instead
  of returning ENOSYS. Implements timespec-to-milliseconds conversion and
  delegates to existing sys_poll. Required for BWM's ppoll-based event loop.

- Add VT100 escape sequence generation for special keys (F1-F12, arrows,
  Home, End, Delete) in ARM64 timer interrupt keyboard polling.

- Make syscall dispatch tests architecture-conditional (x86_64 uses Linux
  numbers like exit=60, ARM64 uses asm-generic like exit=93).

- Add cfg guards for ARM64-only libbreenix-libc constants.

- Disable hello_musl launch in init and improve exec error reporting.

Tested: x86_64 3/3 parallel boot tests PASS, ARM64 400 boots with zero
panics/deadlocks (98% detection rate, 2% test-script timing flakes).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
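
The frame allocator change in the first bullet above follows a write-once pattern. The sketch below uses `std::sync::OnceLock` as a stand-in for `spin::Once` so that it runs outside a kernel; the `MemoryInfo` fields and the `init_memory_info` / `memory_info` function names are assumptions made for illustration.

```rust
use std::sync::OnceLock;

/// Hypothetical shape of the boot-time memory description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryInfo {
    pub start_frame: u64,
    pub frame_count: u64,
}

/// Written exactly once during boot, read-only afterwards.
static MEMORY_INFO: OnceLock<MemoryInfo> = OnceLock::new();

/// Store the boot-time value. Returns false if it was already initialized.
pub fn init_memory_info(info: MemoryInfo) -> bool {
    MEMORY_INFO.set(info).is_ok()
}

/// Read path for the allocator: no lock is taken, so readers on different
/// CPUs never contend with each other.
pub fn memory_info() -> Option<&'static MemoryInfo> {
    MEMORY_INFO.get()
}
```

Compared with `Mutex<Option<MemoryInfo>>`, readers here cannot block one another, and a second initialization attempt is rejected instead of silently overwriting the value.
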
BWM's fb_flush() holds the SHELL_FRAMEBUFFER lock for ~400μs during
full-screen pixel copies (3.7MB at ~8 bytes/cycle). sys_fbinfo's
try_lock only spins for ~65μs, so bounce and other programs calling
fbinfo during a BWM flush would get EBUSY and fail.

Fix: cache immutable framebuffer dimensions in a OnceCell<FbInfoCache>
at init time. ARM64 sys_fbinfo now reads from this lock-free cache
(single atomic load) instead of acquiring the framebuffer lock.
x86_64 path unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sys_fbmmap also acquired SHELL_FRAMEBUFFER with a bounded try_lock
spin (65536 iterations / ~65μs) to read framebuffer dimensions. This
failed with EBUSY when BWM was mid-flush (~400μs lock hold), causing
bounce to exit(1) on startup after fb_mmap() returned an error.

Fix: use the same FB_INFO_CACHE (lock-free OnceCell) for sys_fbmmap
dimension reads on ARM64. x86_64 path unchanged.

Also improve sys_exit_aarch64 logging to include PID and process name,
merged into the existing process manager lock acquisition to avoid
redundant locking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
handle_thread_exit previously held the PM lock (which masks interrupts on the
acquiring CPU via DAIFSet) while calling close_all_fds() and cleanup_cow_frames(),
both of which issued 30+ log::debug!() calls acquiring SERIAL and framebuffer
locks. When the render thread held FRAMEBUFFER and tried to log (acquiring
SERIAL), all 4 CPUs would end up spinning with interrupts masked: a system-wide hang.

Fix: Split process exit into two phases:
- Phase 1 (under PM lock): Mark terminated, extract FD entries, set SIGCHLD,
  get parent thread ID. No logging, no pipe wakeups, no scheduler calls.
- Phase 2 (no PM lock): Close extracted FDs, wake parent, log the exit.

Also removes all logging from close_all_fds() and cleanup_cow_frames() on
both architectures, adds FdTable::take_all() for FD extraction, and adds
TTBR_GONE_K diagnostics for orphaned thread debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
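
The two-phase split described above can be sketched with ordinary standard-library types. Everything here is a simplified stand-in: `std::sync::Mutex` replaces the interrupt-masking PM lock, and the `Process`, `FdTable`, and `FdEntry` shapes are hypothetical. Only the name `FdTable::take_all()` comes from the commit message. The sketch also folds in the double-terminate guard from the earlier commit.

```rust
use std::sync::Mutex;

/// Hypothetical file descriptor entry; the payload is just an fd number here.
#[derive(Debug, PartialEq, Eq)]
pub struct FdEntry(pub u32);

#[derive(Default)]
pub struct FdTable {
    pub entries: Vec<FdEntry>,
}

impl FdTable {
    /// Move every entry out of the table, leaving it empty.
    pub fn take_all(&mut self) -> Vec<FdEntry> {
        std::mem::take(&mut self.entries)
    }
}

pub struct Process {
    pub terminated: bool,
    pub fds: FdTable,
}

/// Returns the number of FDs closed, or None if the process had already
/// been terminated (double-terminate guard).
pub fn exit_process(pm: &Mutex<Process>) -> Option<usize> {
    // Phase 1 (lock held): only cheap state changes and extraction.
    let extracted = {
        let mut process = pm.lock().unwrap();
        if process.terminated {
            return None;
        }
        process.terminated = true;
        process.fds.take_all()
    }; // the guard is dropped here, releasing the lock

    // Phase 2 (lock released): slow work such as closing and logging.
    let mut closed = 0;
    for entry in extracted {
        drop(entry); // stand-in for the real per-kind close logic
        closed += 1;
    }
    Some(closed)
}
```

The key property is that nothing in phase 2 can need a lock that another CPU holds while that CPU is itself waiting on the process manager lock.
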
unsafe {
    let user_stat = statbuf as *mut Stat;
    core::ptr::write(user_stat, stat);
}

Unvalidated user pointer write in newfstatat syscall

Medium Severity

sys_newfstatat writes a Stat structure to statbuf using raw core::ptr::write without verifying the address is in userspace. The only check is statbuf == 0. The kernel has copy_to_user in userptr.rs that validates addresses against the user/kernel boundary, and other syscalls like sys_fbinfo explicitly check against USER_SPACE_MAX. The comment notes Stat doesn't implement Copy so copy_to_user can't be used, but the validation itself is still missing entirely. A userspace process could pass a kernel address as statbuf, causing kernel memory corruption.
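
One possible shape for the missing check is sketched below. It is not the kernel's `copy_to_user`: the function name `validate_user_range` is hypothetical and the value of `USER_SPACE_MAX` is an assumed placeholder. `EFAULT` is the standard Linux errno value 14.

```rust
/// Assumed exclusive upper bound of user addresses; the real constant may differ.
pub const USER_SPACE_MAX: u64 = 0x0000_8000_0000_0000;
/// Linux errno for a bad address.
pub const EFAULT: u64 = 14;

/// Check that [addr, addr + len) is non-null, does not wrap around, and lies
/// entirely below the user/kernel boundary.
pub fn validate_user_range(addr: u64, len: u64) -> Result<(), u64> {
    if addr == 0 {
        return Err(EFAULT);
    }
    match addr.checked_add(len) {
        Some(end) if end <= USER_SPACE_MAX => Ok(()),
        _ => Err(EFAULT), // overflowed or reaches into kernel space
    }
}
```

A handler could call this with `statbuf` and `core::mem::size_of::<Stat>()` before the raw `core::ptr::write`, returning the error to userspace when it fails.
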


    tv_sec: i64,
    tv_nsec: i64,
}
let ts = unsafe { core::ptr::read(timeout_ts_ptr as *const Timespec) };

Unvalidated user pointer read in ppoll syscall

Medium Severity

sys_ppoll reads a Timespec from timeout_ts_ptr using raw core::ptr::read without verifying the address is in userspace. The kernel provides copy_from_user in userptr.rs that validates addresses against the user/kernel boundary before reading. A malicious userspace process could pass a kernel address, potentially causing a kernel crash on unmapped memory or reading sensitive kernel data to influence timeout behavior.
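
Once the `Timespec` has been copied in safely, it still has to be turned into the millisecond timeout that a poll-style handler takes. The sketch below shows one way to do that conversion. The function name `timespec_to_ms`, the round-up behavior, and the clamp to `i32::MAX` are choices made for this sketch, not details confirmed by the PR. `EINVAL` is the standard Linux errno value 22.

```rust
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct Timespec {
    pub tv_sec: i64,
    pub tv_nsec: i64,
}

/// Linux errno for an invalid argument.
pub const EINVAL: u64 = 22;

/// Convert a timespec to a millisecond timeout.
/// Rejects negative or out-of-range fields, rounds partial milliseconds up so
/// a tiny non-zero wait does not become a zero timeout, and clamps to i32::MAX.
pub fn timespec_to_ms(ts: &Timespec) -> Result<i32, u64> {
    if ts.tv_sec < 0 || ts.tv_nsec < 0 || ts.tv_nsec >= 1_000_000_000 {
        return Err(EINVAL);
    }
    // i128 arithmetic cannot overflow for any pair of valid i64 inputs.
    let ms = (ts.tv_sec as i128) * 1000 + ((ts.tv_nsec as i128) + 999_999) / 1_000_000;
    Ok(ms.min(i32::MAX as i128) as i32)
}
```

Validating the fields matters for the same reason as validating the pointer: both come from userspace and should not be trusted.
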


_ => {} // StdIo, RegularFile, Directory, Device, etc. — no action needed
}
}
}

FD close logic duplicated in three locations

Low Severity

The per-FdKind cleanup logic (pipe close, PTY refcount decrement, TCP close, FIFO cleanup, etc.) is now identically duplicated across three functions: close_extracted_fds in process_task.rs, and the x86_64 and ARM64 close_all_fds in process.rs. Adding a new FdKind variant requires updating all three. Extracting a shared helper like close_fd_entry(fd_entry: FileDescriptor) would eliminate this triplication.
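
The suggested helper might look something like the sketch below. The `FdKind` variants are a reduced, hypothetical subset without payloads, and `CloseAction` exists only so the sketch has an observable result; the real helper would perform the cleanup itself.

```rust
/// Reduced, hypothetical subset of FdKind with payloads omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FdKind {
    StdIo,
    RegularFile,
    Directory,
    Device,
    PipeRead,
    PipeWrite,
    Pty,
    Tcp,
}

/// What the helper decided to do, used here only to make the result visible.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CloseAction {
    Nothing,
    ClosePipeEnd,
    ReleasePty,
    CloseTcp,
}

/// Single place that knows how each kind is cleaned up. The match has no
/// wildcard arm, so adding a variant is a compile error until it is handled.
pub fn close_fd_entry(kind: &FdKind) -> CloseAction {
    match kind {
        FdKind::PipeRead | FdKind::PipeWrite => CloseAction::ClosePipeEnd,
        FdKind::Pty => CloseAction::ReleasePty,
        FdKind::Tcp => CloseAction::CloseTcp,
        FdKind::StdIo | FdKind::RegularFile | FdKind::Directory | FdKind::Device => {
            CloseAction::Nothing
        }
    }
}

/// All three call sites would reduce to a loop over the shared helper.
pub fn close_all(entries: &[FdKind]) -> Vec<CloseAction> {
    entries.iter().map(close_fd_entry).collect()
}
```

Dropping the `_ => {}` arm in favor of an exhaustive match is a deliberate trade-off in this sketch: it costs a few more lines but turns a forgotten variant into a build failure.
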


Classic lost-wakeup bug: between checking child state (not terminated)
and setting BlockedOnChildExit, the child could exit on another CPU.
The child's unblock_for_child_exit() found the parent NOT yet blocked,
so the unblock was a no-op. Parent then entered BlockedOnChildExit
forever -- no one would wake it.

Fix: After setting BlockedOnChildExit, immediately re-check child state
before yielding. If the child exited during the race window, self-unblock
and return. This closes the TOCTOU gap completely:
- Child exits BEFORE blocking: re-check catches it
- Child exits AFTER blocking: unblock_for_child_exit succeeds
- Child exits DURING blocking: scheduler lock serializes

Applied to all 4 blocking waitpid paths (wait.rs pid>0, wait.rs pid==-1,
handlers.rs pid>0, handlers.rs pid==-1).

Before: 18/20 strict boot test, init hangs after 1-3 fork+exit iterations
After: 20/20 strict boot test, fork+exit loop progresses correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
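
The re-check described above can be modeled in a few lines. This is a simplified sketch: a single `std::sync::Mutex` stands in for the scheduler lock, there is one parent and one child, and the names `Sched`, `child_exit`, and `block_for_child` are hypothetical.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadState {
    Running,
    BlockedOnChildExit,
}

/// Hypothetical scheduler state for one parent/child pair.
pub struct Sched {
    pub parent: ThreadState,
    pub child_exited: bool,
}

/// Child side: record the exit and wake the parent if it is already blocked.
/// Returns true if a wakeup was delivered.
pub fn child_exit(sched: &Mutex<Sched>) -> bool {
    let mut s = sched.lock().unwrap();
    s.child_exited = true;
    if s.parent == ThreadState::BlockedOnChildExit {
        s.parent = ThreadState::Running;
        true
    } else {
        false // parent not blocked yet, so the wakeup is a no-op
    }
}

/// Parent side: mark blocked, then immediately re-check the child.
/// Returns true if the parent should yield and wait for a wakeup.
pub fn block_for_child(sched: &Mutex<Sched>) -> bool {
    let mut s = sched.lock().unwrap();
    s.parent = ThreadState::BlockedOnChildExit;
    if s.child_exited {
        // The child exited in the race window: self-unblock instead of sleeping.
        s.parent = ThreadState::Running;
        false
    } else {
        true
    }
}
```

Running `child_exit` before `block_for_child` reproduces the original race: the wakeup is a no-op, and without the re-check the parent would stay in `BlockedOnChildExit` forever.
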
@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


        return SyscallResult::Err(super::errno::ENOSYS as u64);
    }
    sys_open(pathname, flags, mode)
}

*at wrappers reject absolute paths with non-AT_FDCWD dirfd

Medium Severity

All nine *at wrapper functions (sys_openat, sys_faccessat, sys_mkdirat, sys_mknodat, sys_unlinkat, sys_symlinkat, sys_linkat, sys_renameat, sys_readlinkat) return ENOSYS when dirfd != AT_FDCWD, even when the pathname is absolute. Per Linux ABI, absolute paths cause dirfd to be ignored entirely. sys_newfstatat correctly checks dirfd != AT_FDCWD && !path.starts_with('/'), demonstrating the intended behavior — these wrappers are inconsistent with it.
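
The check the review asks for could be factored into a small shared predicate, sketched below. The function name `check_dirfd` is hypothetical and the path is taken as an already-copied `&str` for simplicity. `AT_FDCWD` (-100) and `ENOSYS` (38) are the standard Linux values.

```rust
/// Linux sentinel meaning "resolve relative to the current working directory".
pub const AT_FDCWD: i32 = -100;
/// Linux errno for an unimplemented function.
pub const ENOSYS: u64 = 38;

/// Decide whether a *at call can be delegated to the path-only implementation.
/// An absolute path makes dirfd irrelevant, so only a relative path combined
/// with a real directory fd is rejected as unsupported.
pub fn check_dirfd(dirfd: i32, path: &str) -> Result<(), u64> {
    if dirfd != AT_FDCWD && !path.starts_with('/') {
        Err(ENOSYS)
    } else {
        Ok(())
    }
}
```

Each wrapper would call this once before delegating, which keeps all nine consistent with the condition `sys_newfstatat` already uses.
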


ARM64 syscall entry masks IRQ (msr daifset, #0x2). Normal syscalls
return through the assembly epilogue which handles context switching
and restores interrupt state via ERET. However, sys_exit_aarch64
enters a WFI loop directly without returning to the epilogue.

With IRQ masked, the timer interrupt is pending but never handled.
The CPU becomes permanently stuck — unable to process deferred thread
requeues or context-switch to other threads. This caused cascading
failures: child processes exiting on a now-dead CPU left their parent
(init) blocked forever in waitpid, making the system unresponsive.

Fix: add `msr daifclr, #2` before WFI to unmask IRQ, allowing the
timer interrupt to fire and context-switch away from the terminated
thread.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ryanbreen ryanbreen merged commit bc13b9c into main Feb 18, 2026
3 of 5 checks passed
@ryanbreen ryanbreen deleted the feat/linux-syscall-abi branch February 18, 2026 14:17