Segmentation fault on CPU fallback when Vulkan prebuilt binary is incompatible (Linux x64, no GPU) #554

Issue Description

When running on a Linux x64 machine without a GPU, node-llama-cpp correctly detects that the Vulkan prebuilt binary is incompatible and falls back to CPU-only mode. However, the process then crashes with a segmentation fault during model loading/inference.

The error message indicates the fallback is happening:

[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU

But immediately after, the process crashes with SIGSEGV.

Expected Behavior

CPU-only inference should work on machines without GPU support. The fallback path should successfully load GGUF models and perform inference using CPU.

Actual Behavior

Process crashes with segmentation fault after Vulkan fallback to CPU mode.

Environment

OS: Ubuntu 24.04.3 LTS (Noble Numbat)
Kernel: Linux 6.12.67 x86_64
glibc: 2.39
node-llama-cpp: 3.15.1
Bun: 1.3.8
Node.js: v22.22.0
CPU: Intel Xeon Platinum 8259CL @ 2.50GHz (2 cores)
CPU Flags: sse42, popcnt, avx, avx2, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, avx512_vnni
GPU: None (VM without GPU passthrough)
Vulkan packages: libvulkan1 1.3.275.0, mesa-vulkan-drivers 25.2.8

Steps to Reproduce

  1. Set up a Linux x64 VM/machine without GPU
  2. Install Vulkan libraries (common on Ubuntu): libvulkan1, mesa-vulkan-drivers
  3. Install node-llama-cpp via npm or bun
  4. Attempt to load any GGUF model

Minimal reproduction:

mkdir test-repro && cd test-repro

cat > package.json << 'EOF'
{
  "name": "test-repro",
  "type": "module",
  "dependencies": {
    "node-llama-cpp": "^3.15.1"
  }
}
EOF

cat > test.mjs << 'EOF'
import { getLlama } from 'node-llama-cpp';

console.log('Initializing Llama...');
const llama = await getLlama();
console.log('GPU:', llama.gpu);

// Download and load a small test model
const model = await llama.loadModel({
  modelPath: "hf:HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF"
});
console.log('Model loaded successfully');

await model.dispose();
await llama.dispose();
EOF

npm install
node test.mjs

Crash Output

[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
============================================================
Bun v1.3.8 (b64edcb4) Linux x64
Linux Kernel v6.12.67 | glibc v2.39
CPU: sse42 popcnt avx avx2 avx512
Elapsed: 45036ms | User: 59099ms | Sys: 6506ms
RSS: 46.33MB | Peak: 5.35GB | Commit: 46.33MB | Faults: 280 | Machine: 7.76GB

panic: Segmentation fault at address 0x7FD0B4A3D840

Analysis

The crash appears to occur in the CPU inference path during GGUF model loading after the Vulkan binary detection fails. Key observations:

  1. Vulkan detection works correctly — The system properly identifies that Vulkan isn't usable
  2. Fallback triggers — The "falling back to using no GPU" message confirms fallback initiated
  3. getLlama() may succeed — The crash happens during loadModel(), not initialization
  4. CPU path crashes — The SIGSEGV occurs when llama.cpp attempts to load model tensors

The crash output shows ~45 seconds elapsed time and memory peaking at 5.35GB before the segfault, suggesting the crash occurs during model weight loading/processing, not during initial setup.

This suggests the CPU-only inference code path in the prebuilt binary has an issue on this platform/configuration, likely in the tensor loading or SIMD-accelerated computation paths.

Possible Causes

  • The prebuilt CPU binary may have been compiled with incompatible assumptions
  • GGML_CPU_REPACK might be causing issues (similar to llama.cpp#16479)
  • Memory alignment or SIMD instruction issues on this CPU variant

Workaround Attempts

  • Explicit gpu: false: Hard to test through the consuming library without modifying its code; a standalone check is sketched after this list
  • Building from source: Not yet attempted
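
To test the gpu: false hypothesis without touching the consuming library, a small standalone script along these lines should exercise the CPU-only path directly. This is an untested sketch based on my reading of the node-llama-cpp docs; passing gpu: false to getLlama() is what I understand forces CPU mode, and the modelPath below is a placeholder for any locally downloaded GGUF file:

import { getLlama } from 'node-llama-cpp';

// Skip GPU detection entirely and force the CPU-only backend.
const llama = await getLlama({ gpu: false });
console.log('GPU:', llama.gpu); // expected output: false

// Placeholder path; point this at any locally downloaded GGUF file.
const model = await llama.loadModel({
  modelPath: "./models/smollm-135M-instruct.Q8_0.gguf"
});
console.log('Model loaded successfully');

await model.dispose();
await llama.dispose();

If this also segfaults, the problem is in the CPU backend itself rather than in the Vulkan fallback logic; if it works, the fallback path becomes the more likely culprit.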

Questions

  1. Is there a known working configuration for CPU-only Linux x64?
  2. Would building from source with specific cmake options (e.g., GGML_CPU_REPACK=OFF) help? A possible invocation is sketched after this list.
  3. Are there prebuilt binaries specifically for CPU-only (no Vulkan fallback path)?
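
On question 2, my reading of the getLlama() options is that build and cmakeOptions could force a from-source build with repacking disabled. The option names below reflect my understanding of the API and are untested here; the GGML_CPU_REPACK flag comes from the upstream llama.cpp#16479 discussion:

import { getLlama } from 'node-llama-cpp';

// Force a local from-source build with the CPU weight-repacking path disabled.
// "build" and "cmakeOptions" are getLlama options as I understand them (unverified).
const llama = await getLlama({
  gpu: false,
  build: "forceRebuild",
  cmakeOptions: {
    GGML_CPU_REPACK: "OFF"
  }
});
console.log('GPU:', llama.gpu);

If a binary built this way loads the model without crashing, that would point at the repacking path and line up with the upstream report.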

Related Issues

  • llama.cpp#16479 - Segfault on q4_0 repacking (Windows AVX2, similar symptom)

Happy to provide additional debug output with NODE_LLAMA_CPP_DEBUG=true if helpful.
