feat: Linux syscall ABI alignment + ARM64 SMP stability #229
…tibility

Renumber 9 conflicting syscalls on x86_64 to match the Linux ABI (Read=0, Close=3, Fstat=5, Lseek=8, Yield=24, Fork=57, Exit=60, Getdents64=217). ARM64 retains Breenix's existing numbers unchanged.

Add new syscalls required by musl: readv/writev (vectored I/O), arch_prctl (x86_64 TLS via FS base), and newfstatat (path-based stat with AT_FDCWD). Add compatibility stubs: mremap, madvise, ppoll, set_robust_list.

SyscallNumber enum is now purely semantic (no repr(u64)); from_u64() and the libbreenix nr module are split by #[cfg(target_arch)] for each architecture. Consolidate duplicate syscall constants in signal.rs, pty.rs, termios.rs to use nr:: imports. Fix hardcoded syscall numbers in userspace programs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
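The split between a semantic enum and per-architecture number tables can be sketched roughly as follows. This is a minimal illustration, not the Breenix source: the module names `x86_64_nr` and `aarch64_nr` and the four-variant enum are assumptions made for the example. The x86_64 numbers are the Linux ones named above (plus Write=1), and the ARM64 numbers are the Linux values quoted in the follow-up renumbering commit.

```rust
/// Semantic syscall identifiers with no numeric representation attached.
/// (Hypothetical subset; the real enum has many more variants.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyscallNumber {
    Read,
    Write,
    Close,
    Exit,
}

/// x86_64 Linux numbering (subset).
pub mod x86_64_nr {
    use super::SyscallNumber;

    pub fn from_u64(n: u64) -> Option<SyscallNumber> {
        match n {
            0 => Some(SyscallNumber::Read),
            1 => Some(SyscallNumber::Write),
            3 => Some(SyscallNumber::Close),
            60 => Some(SyscallNumber::Exit),
            _ => None,
        }
    }
}

/// ARM64 Linux (asm-generic) numbering (subset).
pub mod aarch64_nr {
    use super::SyscallNumber;

    pub fn from_u64(n: u64) -> Option<SyscallNumber> {
        match n {
            63 => Some(SyscallNumber::Read),
            64 => Some(SyscallNumber::Write),
            57 => Some(SyscallNumber::Close),
            93 => Some(SyscallNumber::Exit),
            _ => None,
        }
    }
}

// Only the table for the build target is re-exported as `from_u64`,
// so dispatch code never sees the other architecture's numbers.
#[cfg(target_arch = "x86_64")]
pub use self::x86_64_nr::from_u64;
#[cfg(target_arch = "aarch64")]
pub use self::aarch64_nr::from_u64;
```

Keeping both tables as ordinary modules (rather than hiding one behind `#[cfg]`) is a choice made here so the example compiles and can be checked on any host; the PR describes gating the tables themselves.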
ARM64 Linux uses asm-generic/unistd.h numbers which differ completely from x86_64. This commit renumbers all ARM64 syscalls from legacy Breenix custom numbers to the standard Linux ARM64 ABI, matching what musl libc expects.

Key changes:
- Rewrite ARM64 nr module with Linux numbers (e.g. READ=63, WRITE=64, OPENAT=56, CLOSE=57, EXIT=93, CLONE=220, MMAP=222)
- ARM64 has no legacy syscalls (no open/fork/pipe/dup2/select/poll/etc), so all libbreenix, libbreenix-libc, and userspace code now uses *at variants (openat, faccessat, mkdirat, unlinkat, etc.) via #[cfg] blocks
- Add kernel *at handler functions that validate AT_FDCWD and delegate to existing implementations
- Route ARM64 clone(SIGCHLD,0,0,0,0) to sys_fork_aarch64 for fork emulation
- Fix ARM64 inline asm syscall numbers in hello_std_real.rs and signal_regs_test.rs
- Replace poll/select with ppoll/pselect6, pause with sigsuspend, alarm with setitimer on ARM64
- Remove deprecated get_time_ms() (referenced removed GET_TIME syscall)

Both x86_64 and ARM64 build clean and pass boot tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend the exec stack setup to produce the full Linux ABI initial stack layout that musl libc's _start expects. Previously the stack only had argc/argv; now it includes:
- envp array (empty for now, with NULL terminator)
- Auxiliary vector (auxv) entries:
  - AT_ENTRY: program entry point
  - AT_PHDR: program header table address
  - AT_PHNUM: number of program headers
  - AT_PHENT: size of each program header entry
  - AT_PAGESZ: 4096
  - AT_RANDOM: pointer to 16 bytes of pseudo-random data on stack
  - AT_NULL: terminator

Also extend LoadedElf structs in both x86_64 and ARM64 ELF loaders to capture phdr_vaddr, phnum, and phentsize from the ELF binary, scanning for PT_PHDR headers with fallback to load_base + phoff.

Both architectures build clean and pass boot tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
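The layout described above can be pictured as a flat sequence of 64-bit words, lowest address first. The sketch below only builds that word sequence; it is not the Breenix exec code. `ElfInfo` and `build_initial_stack` are hypothetical names, and the order of the auxv entries is an arbitrary choice. The `AT_*` values are the standard Linux constants.

```rust
// Standard Linux auxiliary vector type codes.
pub const AT_NULL: u64 = 0;
pub const AT_PHDR: u64 = 3;
pub const AT_PHENT: u64 = 4;
pub const AT_PHNUM: u64 = 5;
pub const AT_PAGESZ: u64 = 6;
pub const AT_ENTRY: u64 = 9;
pub const AT_RANDOM: u64 = 25;

/// Hypothetical stand-in for the fields the commit adds to LoadedElf.
pub struct ElfInfo {
    pub entry: u64,
    pub phdr_vaddr: u64,
    pub phnum: u64,
    pub phentsize: u64,
}

/// Build the initial stack contents as words, lowest address first:
/// argc, argv[..], NULL, envp (empty) NULL, auxv pairs, AT_NULL pair.
/// `argv_ptrs` and `random_ptr` are assumed to point at data that the
/// caller has already copied onto the user stack.
pub fn build_initial_stack(argv_ptrs: &[u64], elf: &ElfInfo, random_ptr: u64) -> Vec<u64> {
    let mut words = Vec::new();

    words.push(argv_ptrs.len() as u64); // argc
    words.extend_from_slice(argv_ptrs); // argv[0..argc]
    words.push(0); // argv NULL terminator
    words.push(0); // envp is empty: just its NULL terminator

    let auxv = [
        (AT_PHDR, elf.phdr_vaddr),
        (AT_PHENT, elf.phentsize),
        (AT_PHNUM, elf.phnum),
        (AT_PAGESZ, 4096),
        (AT_ENTRY, elf.entry),
        (AT_RANDOM, random_ptr),
    ];
    for (key, value) in auxv {
        words.push(key);
        words.push(value);
    }

    // AT_NULL terminates the vector; its value word is ignored.
    words.push(AT_NULL);
    words.push(0);
    words
}
```

A real loader would additionally have to write these words at a suitably aligned stack pointer and place the strings and the 16 random bytes above them; the sketch leaves both out.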
Cross-compile stock musl libc for aarch64 and link a C hello world program that successfully runs on Breenix, printing via musl's printf/writev. No patches to musl source were needed.

Build infrastructure:
- userspace/c-programs/hello.c: C hello world source
- userspace/c-programs/Makefile: build script using Homebrew LLVM clang targeting aarch64-linux-musl, with musl headers and libc.a
- userspace/programs/linker-aarch64-musl.ld: linker script for musl programs (includes .init_array, .fini_array, .got, .tdata, .tbss)
- third-party/ added to .gitignore (musl and compiler-rt sources are cloned and built locally)

Runtime integration:
- init spawns /bin/hello_musl during startup
- hello_musl added to test binary list

Syscalls exercised by musl during hello world:
- set_tid_address (96): TLS initialization
- writev (66): printf output
- exit_group (94): process exit

This proves the complete musl libc compatibility chain works: Linux ARM64 syscall ABI -> envp/auxv stack -> musl _start -> printf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace frame allocator Mutex<Option<MemoryInfo>> with spin::Once<MemoryInfo>, eliminating lock contention on the SMP frame allocation hot path. MEMORY_INFO is immutable after boot, so a Mutex was architecturally wrong; get() is now a single atomic load instead of a spinlock acquire.
- Fix sys_exit_aarch64 to never return to userspace. Previously it set need_resched and returned 0, which allowed musl's exit loop to re-enter exit(), causing double-terminate and double-decrement of COW page refcounts. Now enters WFI loop until timer interrupt context-switches away.
- Add double-terminate guard in Process::terminate() to prevent COW page refcount corruption when exit is called multiple times.
- Route ppoll to actual sys_ppoll handler on both x86_64 and ARM64 instead of returning ENOSYS. Implements timespec-to-milliseconds conversion and delegates to existing sys_poll. Required for BWM's ppoll-based event loop.
- Add VT100 escape sequence generation for special keys (F1-F12, arrows, Home, End, Delete) in ARM64 timer interrupt keyboard polling.
- Make syscall dispatch tests architecture-conditional (x86_64 uses Linux numbers like exit=60, ARM64 uses asm-generic like exit=93).
- Add cfg guards for ARM64-only libbreenix-libc constants.
- Disable hello_musl launch in init and improve exec error reporting.

Tested: x86_64 3/3 parallel boot tests PASS, ARM64 400 boots with zero panics/deadlocks (98% detection rate, 2% test-script timing flakes).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
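The timespec-to-milliseconds step for ppoll might look something like the sketch below. It is written under several assumptions that the commit message does not confirm: a missing timeout maps to poll's "block indefinitely" value of -1, out-of-range fields are rejected with EINVAL, nanoseconds are rounded up so a tiny nonzero timeout does not become a busy poll, and oversized timeouts saturate at i32::MAX. The function name `timespec_to_poll_ms` is hypothetical.

```rust
/// Same field layout as the Timespec read by sys_ppoll.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct Timespec {
    pub tv_sec: i64,
    pub tv_nsec: i64,
}

pub const EINVAL: i32 = 22;

/// Convert an optional ppoll timeout into a poll-style millisecond count.
/// `None` models a NULL timeout pointer (block until an event arrives).
pub fn timespec_to_poll_ms(ts: Option<Timespec>) -> Result<i32, i32> {
    let ts = match ts {
        None => return Ok(-1),
        Some(ts) => ts,
    };

    // Reject negative or denormalized values instead of guessing.
    if ts.tv_sec < 0 || ts.tv_nsec < 0 || ts.tv_nsec >= 1_000_000_000 {
        return Err(EINVAL);
    }

    // Widen to i128 so tv_sec * 1000 cannot overflow, and round the
    // nanosecond part up to the next whole millisecond.
    let ms = (ts.tv_sec as i128) * 1000 + ((ts.tv_nsec as i128) + 999_999) / 1_000_000;

    // Saturate rather than wrap when the timeout exceeds poll's i32 range.
    Ok(ms.min(i32::MAX as i128) as i32)
}
```

Rounding up is a design choice in this sketch: truncating would turn a 500 µs timeout into 0 ms, which poll treats as "return immediately".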
BWM's fb_flush() holds the SHELL_FRAMEBUFFER lock for ~400μs during full-screen pixel copies (3.7MB at ~8 bytes/cycle). sys_fbinfo's try_lock only spins for ~65μs, so bounce and other programs calling fbinfo during a BWM flush would get EBUSY and fail. Fix: cache immutable framebuffer dimensions in a OnceCell<FbInfoCache> at init time. ARM64 sys_fbinfo now reads from this lock-free cache (single atomic load) instead of acquiring the framebuffer lock. x86_64 path unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sys_fbmmap also acquired SHELL_FRAMEBUFFER with a bounded try_lock spin (65536 iterations / ~65μs) to read framebuffer dimensions. This failed with EBUSY when BWM was mid-flush (~400μs lock hold), causing bounce to exit(1) on startup after fb_mmap() returned an error. Fix: use the same FB_INFO_CACHE (lock-free OnceCell) for sys_fbmmap dimension reads on ARM64. x86_64 path unchanged. Also improve sys_exit_aarch64 logging to include PID and process name, merged into the existing process manager lock acquisition to avoid redundant locking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
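The write-once cache pattern used for FB_INFO_CACHE can be illustrated with the standard library's `OnceLock`, which here stands in for the kernel's no_std OnceCell. The field set of `FbInfoCache` and the helper names are assumptions made for the example; only the idea (initialize once at boot, then read without taking the framebuffer lock) comes from the commits above.

```rust
use std::sync::OnceLock;

/// Hypothetical set of immutable framebuffer properties.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FbInfoCache {
    pub width: u32,
    pub height: u32,
    pub stride: u32,
    pub bytes_per_pixel: u32,
}

/// Written exactly once during init, read-only afterwards.
static FB_INFO_CACHE: OnceLock<FbInfoCache> = OnceLock::new();

/// Populate the cache. Returns false if it was already initialized,
/// in which case the stored value is left untouched.
pub fn init_fb_info_cache(info: FbInfoCache) -> bool {
    FB_INFO_CACHE.set(info).is_ok()
}

/// Read path for sys_fbinfo / sys_fbmmap style callers: no lock is
/// taken, so a long-running flush elsewhere cannot make this fail.
pub fn cached_fb_info() -> Option<FbInfoCache> {
    FB_INFO_CACHE.get().copied()
}
```

This only works because the cached values never change after init; anything mutable (such as the pixel buffer itself) still needs the real lock.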
handle_thread_exit previously held the PM lock (which disables interrupts on all CPUs via DAIFSet) while calling close_all_fds() and cleanup_cow_frames(), both of which issued 30+ log::debug!() calls acquiring SERIAL and framebuffer locks. When the render thread held FRAMEBUFFER and tried to log (acquiring SERIAL), all 4 CPUs would spin with interrupts disabled, producing a system-wide hang.

Fix: Split process exit into two phases:
- Phase 1 (under PM lock): Mark terminated, extract FD entries, set SIGCHLD, get parent thread ID. No logging, no pipe wakeups, no scheduler calls.
- Phase 2 (no PM lock): Close extracted FDs, wake parent, log the exit.

Also removes all logging from close_all_fds() and cleanup_cow_frames() on both architectures, adds FdTable::take_all() for FD extraction, and adds TTBR_GONE_K diagnostics for orphaned thread debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
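The two-phase shape can be sketched with a `std::sync::Mutex` standing in for the PM lock. Everything except the name `take_all` is hypothetical (the `Process` fields, `FdKind` variants, `handle_exit`, and `close_fd`), and the sketch omits SIGCHLD and parent wakeup. The point it tries to show is that the lock guard is dropped before any per-FD work runs.

```rust
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum FdKind {
    PipeRead,
    PipeWrite,
    RegularFile,
}

#[derive(Default)]
pub struct FdTable {
    pub entries: Vec<Option<FdKind>>,
}

impl FdTable {
    /// Move every open entry out of the table, leaving it empty.
    pub fn take_all(&mut self) -> Vec<FdKind> {
        self.entries.drain(..).flatten().collect()
    }
}

pub struct Process {
    pub terminated: bool,
    pub exit_code: i32,
    pub fds: FdTable,
}

/// Placeholder for the per-kind cleanup (pipe close, wakeups, ...).
fn close_fd(_fd: FdKind) {}

/// Returns how many descriptors were closed by this call.
pub fn handle_exit(pm: &Mutex<Process>, exit_code: i32) -> usize {
    // Phase 1: under the lock, only cheap state changes and extraction.
    let extracted = {
        let mut p = pm.lock().unwrap();
        if p.terminated {
            return 0; // already exited: nothing left to clean up
        }
        p.terminated = true;
        p.exit_code = exit_code;
        p.fds.take_all()
    }; // guard dropped here

    // Phase 2: lock released, so cleanup may log or take other locks.
    debug_assert!(pm.try_lock().is_ok());
    let mut closed = 0;
    for fd in extracted {
        close_fd(fd);
        closed += 1;
    }
    closed
}
```

The early return on `terminated` doubles as a guard against a second exit doing the cleanup twice.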
    unsafe {
        let user_stat = statbuf as *mut Stat;
        core::ptr::write(user_stat, stat);
    }
Unvalidated user pointer write in newfstatat syscall
Medium Severity
sys_newfstatat writes a Stat structure to statbuf using raw core::ptr::write without verifying the address is in userspace. The only check is statbuf == 0. The kernel has copy_to_user in userptr.rs that validates addresses against the user/kernel boundary, and other syscalls like sys_fbinfo explicitly check against USER_SPACE_MAX. The comment notes Stat doesn't implement Copy so copy_to_user can't be used, but the validation itself is still missing entirely. A userspace process could pass a kernel address as statbuf, causing kernel memory corruption.
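A range check of the kind the review asks for could be sketched as below. This is not the Breenix `userptr.rs` API: the function name `validate_user_range`, the `USER_SPACE_MAX` value, and the choice of EFAULT as the error are all assumptions for illustration. The check would run before the raw `core::ptr::write`, using `size_of::<Stat>()` and `align_of::<Stat>()` as arguments.

```rust
/// Assumed upper bound (exclusive) of user virtual addresses.
pub const USER_SPACE_MAX: u64 = 0x0000_8000_0000_0000;
pub const EFAULT: i32 = 14;

/// Check that [addr, addr + len) lies entirely in user space and that
/// addr is suitably aligned. `align` must be a power of two.
pub fn validate_user_range(addr: u64, len: usize, align: u64) -> Result<(), i32> {
    if addr == 0 {
        return Err(EFAULT); // NULL pointer
    }
    if addr % align != 0 {
        return Err(EFAULT); // a raw ptr::write requires alignment
    }
    // checked_add catches ranges that wrap around the address space.
    let end = addr.checked_add(len as u64).ok_or(EFAULT)?;
    if end > USER_SPACE_MAX {
        return Err(EFAULT); // range reaches into kernel addresses
    }
    Ok(())
}
```

A bounds check alone does not prove the page is mapped or writable; it only rules out the kernel-address case described in the review comment.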
        tv_sec: i64,
        tv_nsec: i64,
    }
    let ts = unsafe { core::ptr::read(timeout_ts_ptr as *const Timespec) };
Unvalidated user pointer read in ppoll syscall
Medium Severity
sys_ppoll reads a Timespec from timeout_ts_ptr using raw core::ptr::read without verifying the address is in userspace. The kernel provides copy_from_user in userptr.rs that validates addresses against the user/kernel boundary before reading. A malicious userspace process could pass a kernel address, potentially causing a kernel crash on unmapped memory or reading sensitive kernel data to influence timeout behavior.
                _ => {} // StdIo, RegularFile, Directory, Device, etc.: no action needed
            }
        }
    }
FD close logic duplicated in three locations
Low Severity
The per-FdKind cleanup logic (pipe close, PTY refcount decrement, TCP close, FIFO cleanup, etc.) is now identically duplicated across three functions: close_extracted_fds in process_task.rs, and the x86_64 and ARM64 close_all_fds in process.rs. Adding a new FdKind variant requires updating all three. Extracting a shared helper like close_fd_entry(fd_entry: FileDescriptor) would eliminate this triplication.
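The suggested helper could take roughly the following form. The `FdKind` variants, the `CloseStats` counters, and the function bodies are invented for the example; the real cleanup would close pipes, drop PTY refcounts, and so on. Unlike the reviewed code, this sketch lists the no-op variants explicitly instead of using a `_ => {}` arm, so that adding a variant produces a compile error in exactly one place.

```rust
#[derive(Debug)]
pub enum FdKind {
    PipeRead,
    PipeWrite,
    PtyMaster,
    StdIo,
    RegularFile,
}

/// Counters standing in for the real side effects of each cleanup.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct CloseStats {
    pub pipes_closed: usize,
    pub ptys_released: usize,
    pub no_action: usize,
}

/// Single home for the per-kind cleanup logic.
pub fn close_fd_entry(kind: FdKind, stats: &mut CloseStats) {
    match kind {
        FdKind::PipeRead | FdKind::PipeWrite => stats.pipes_closed += 1,
        FdKind::PtyMaster => stats.ptys_released += 1,
        // Listed explicitly so a new variant cannot be silently ignored.
        FdKind::StdIo | FdKind::RegularFile => stats.no_action += 1,
    }
}

/// The former duplicates become thin loops over the shared helper.
pub fn close_all_fds(entries: Vec<FdKind>, stats: &mut CloseStats) {
    for kind in entries {
        close_fd_entry(kind, stats);
    }
}

pub fn close_extracted_fds(entries: Vec<FdKind>, stats: &mut CloseStats) {
    for kind in entries {
        close_fd_entry(kind, stats);
    }
}
```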
Classic lost-wakeup bug: between checking child state (not terminated) and setting BlockedOnChildExit, the child could exit on another CPU. The child's unblock_for_child_exit() found the parent NOT yet blocked, so the unblock was a no-op. Parent then entered BlockedOnChildExit forever -- no one would wake it.

Fix: After setting BlockedOnChildExit, immediately re-check child state before yielding. If the child exited during the race window, self-unblock and return. This closes the TOCTOU gap completely:
- Child exits BEFORE blocking: re-check catches it
- Child exits AFTER blocking: unblock_for_child_exit succeeds
- Child exits DURING blocking: scheduler lock serializes

Applied to all 4 blocking waitpid paths (wait.rs pid>0, wait.rs pid==-1, handlers.rs pid>0, handlers.rs pid==-1).

Before: 18/20 strict boot test, init hangs after 1-3 fork+exit iterations
After: 20/20 strict boot test, fork+exit loop progresses correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
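The re-check-after-blocking pattern can be modelled with two atomic flags. This is a simplified model, not the Breenix scheduler: `WaitState`, `begin_wait`, and the `race_window` hook are hypothetical, and sequentially consistent atomics stand in for the scheduler lock the commit relies on. The hook exists only so the problematic interleaving (child exits between the first check and the block) can be reproduced deterministically.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Default)]
pub struct WaitState {
    child_exited: AtomicBool,
    parent_blocked: AtomicBool,
}

impl WaitState {
    /// Child side: record the exit, then unblock the parent.
    /// If the parent is not blocked yet, the unblock is a no-op.
    pub fn child_exit(&self) {
        self.child_exited.store(true, Ordering::SeqCst);
        self.parent_blocked.store(false, Ordering::SeqCst);
    }

    pub fn is_blocked(&self) -> bool {
        self.parent_blocked.load(Ordering::SeqCst)
    }

    /// Parent side. Returns true if the parent should now yield and
    /// stay blocked, false if the child has already exited.
    pub fn begin_wait(&self, race_window: impl FnOnce()) -> bool {
        if self.child_exited.load(Ordering::SeqCst) {
            return false;
        }

        race_window(); // the child may exit here on another CPU

        self.parent_blocked.store(true, Ordering::SeqCst);

        // The fix: re-check after marking ourselves blocked, and
        // self-unblock if the exit slipped into the window.
        if self.child_exited.load(Ordering::SeqCst) {
            self.parent_blocked.store(false, Ordering::SeqCst);
            return false;
        }
        true
    }
}
```

Without the second check, an exit inside the window would leave `parent_blocked` set with nobody left to clear it, which is the hang described above.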
Cursor Bugbot has reviewed your changes and found 1 potential issue.
        return SyscallResult::Err(super::errno::ENOSYS as u64);
    }
    sys_open(pathname, flags, mode)
    }
*at wrappers reject absolute paths with non-AT_FDCWD dirfd
Medium Severity
All nine *at wrapper functions (sys_openat, sys_faccessat, sys_mkdirat, sys_mknodat, sys_unlinkat, sys_symlinkat, sys_linkat, sys_renameat, sys_readlinkat) return ENOSYS when dirfd != AT_FDCWD, even when the pathname is absolute. Per Linux ABI, absolute paths cause dirfd to be ignored entirely. sys_newfstatat correctly checks dirfd != AT_FDCWD && !path.starts_with('/'), demonstrating the intended behavior — these wrappers are inconsistent with it.
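The check that sys_newfstatat is said to perform can be factored into a small helper that every *at wrapper calls before delegating. The helper name `check_dirfd` is hypothetical and the path is modelled as a `&str` that has already been copied in from userspace. AT_FDCWD and ENOSYS carry their standard Linux values.

```rust
pub const AT_FDCWD: i32 = -100;
pub const ENOSYS: i32 = 38;

/// Decide whether an *at call can be served by the existing
/// cwd-relative implementation.
///
/// - Absolute path: dirfd is ignored, whatever its value.
/// - Relative path with AT_FDCWD: resolved against the cwd.
/// - Relative path with a real directory fd: not supported yet.
pub fn check_dirfd(dirfd: i32, path: &str) -> Result<(), i32> {
    if dirfd != AT_FDCWD && !path.starts_with('/') {
        return Err(ENOSYS);
    }
    Ok(())
}
```

A wrapper such as sys_openat would then call this first and fall through to the existing handler on `Ok(())`, which keeps all nine wrappers consistent with each other.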
ARM64 syscall entry masks IRQ (msr daifset, #0x2). Normal syscalls return through the assembly epilogue which handles context switching and restores interrupt state via ERET. However, sys_exit_aarch64 enters a WFI loop directly without returning to the epilogue. With IRQ masked, the timer interrupt is pending but never handled. The CPU becomes permanently stuck — unable to process deferred thread requeues or context-switch to other threads. This caused cascading failures: child processes exiting on a now-dead CPU left their parent (init) blocked forever in waitpid, making the system unresponsive. Fix: add `msr daifclr, #2` before WFI to unmask IRQ, allowing the timer interrupt to fire and context-switch away from the terminated thread. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

