Issue Description
When running on a Linux x64 machine without a GPU, node-llama-cpp correctly detects that the Vulkan prebuilt binary is incompatible and falls back to CPU-only mode. However, inference then crashes with a segmentation fault during model loading/inference.
The error message indicates the fallback is happening:
```
[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
```
But immediately after, the process crashes with SIGSEGV.
Expected Behavior
CPU-only inference should work on machines without GPU support. The fallback path should successfully load GGUF models and perform inference using CPU.
Actual Behavior
Process crashes with segmentation fault after Vulkan fallback to CPU mode.
Environment
| Component | Version |
|---|---|
| OS | Ubuntu 24.04.3 LTS (Noble Numbat) |
| Kernel | Linux 6.12.67 x86_64 |
| glibc | 2.39 |
| node-llama-cpp | 3.15.1 |
| Bun | 1.3.8 |
| Node.js | v22.22.0 |
| CPU | Intel Xeon Platinum 8259CL @ 2.50GHz (2 cores) |
| CPU Flags | sse42, popcnt, avx, avx2, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, avx512_vnni |
| GPU | None (VM without GPU passthrough) |
| Vulkan packages | libvulkan1 1.3.275.0, mesa-vulkan-drivers 25.2.8 |
Steps to Reproduce
- Set up a Linux x64 VM/machine without GPU
- Install Vulkan libraries (common on Ubuntu): `libvulkan1`, `mesa-vulkan-drivers`
- Install node-llama-cpp via npm or bun
- Attempt to load any GGUF model
Minimal reproduction:
```bash
mkdir test-repro && cd test-repro

cat > package.json << 'EOF'
{
  "name": "test-repro",
  "type": "module",
  "dependencies": {
    "node-llama-cpp": "^3.15.1"
  }
}
EOF

cat > test.mjs << 'EOF'
import { getLlama } from 'node-llama-cpp';

console.log('Initializing Llama...');
const llama = await getLlama();
console.log('GPU:', llama.gpu);

// Download and load a small test model
const model = await llama.loadModel({
    modelPath: "hf:HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF"
});
console.log('Model loaded successfully');

await model.dispose();
await llama.dispose();
EOF

npm install
node test.mjs
```
Crash Output
```
[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
============================================================
Bun v1.3.8 (b64edcb4) Linux x64
Linux Kernel v6.12.67 | glibc v2.39
CPU: sse42 popcnt avx avx2 avx512
Elapsed: 45036ms | User: 59099ms | Sys: 6506ms
RSS: 46.33MB | Peak: 5.35GB | Commit: 46.33MB | Faults: 280 | Machine: 7.76GB
panic: Segmentation fault at address 0x7FD0B4A3D840
```
Analysis
The crash appears to occur in the CPU inference path during GGUF model loading after the Vulkan binary detection fails. Key observations:
- Vulkan detection works correctly — The system properly identifies that Vulkan isn't usable
- Fallback triggers — The "falling back to using no GPU" message confirms fallback initiated
- `getLlama()` may succeed — The crash happens during `loadModel()`, not initialization
- CPU path crashes — The SIGSEGV occurs when llama.cpp attempts to load model tensors
The crash output shows ~45 seconds elapsed time and memory peaking at 5.35GB before the segfault, suggesting the crash occurs during model weight loading/processing, not during initial setup.
This suggests the CPU-only inference code path in the prebuilt binary has an issue on this platform/configuration, likely in the tensor loading or SIMD-accelerated computation paths.
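To narrow down how far tensor loading gets before the crash, I could instrument the repro roughly like the sketch below. This assumes `loadModel()` accepts an `onLoadProgress` callback (if that option name is wrong, the same information should be obtainable from `NODE_LLAMA_CPP_DEBUG=true` output); the model path is a placeholder for a locally downloaded GGUF file:
```js
// load-progress.mjs: sketch to see how far model loading gets before the segfault.
// Assumes loadModel() supports an onLoadProgress callback (not confirmed here).
import { getLlama } from 'node-llama-cpp';

const llama = await getLlama();
console.log('GPU:', llama.gpu);

const model = await llama.loadModel({
    modelPath: "./models/smollm-135M-instruct.Q8_0.gguf", // placeholder local path
    onLoadProgress(progress) {
        // the last value printed before the panic shows how far loading got
        console.log(`load progress: ${(progress * 100).toFixed(1)}%`);
    }
});

console.log('Model loaded successfully');
await model.dispose();
await llama.dispose();
```
The last progress value printed before the panic would indicate whether the segfault happens at the very start of tensor loading or partway through the weights.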
Possible Causes
- The prebuilt CPU binary may have been compiled with incompatible assumptions
- GGML_CPU_REPACK might be causing issues (similar to llama.cpp#16479)
- Memory alignment or SIMD instruction issues on this CPU variant
Workaround Attempts
- Explicit `gpu: false`: Cannot test easily without modifying consuming library code (a standalone sketch follows below)
- Building from source: Not yet attempted
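For reference, this is roughly how I would test the explicit `gpu: false` path in isolation, outside the consuming library (the model path is again a placeholder for a locally downloaded GGUF file):
```js
// cpu-only.mjs: minimal sketch forcing the CPU-only code path directly,
// skipping Vulkan detection and the fallback logic entirely.
import { getLlama } from 'node-llama-cpp';

const llama = await getLlama({ gpu: false });
console.log('GPU:', llama.gpu); // expected: false

const model = await llama.loadModel({
    modelPath: "./models/smollm-135M-instruct.Q8_0.gguf" // placeholder local path
});
console.log('Model loaded successfully');

await model.dispose();
await llama.dispose();
```
If this still segfaults, that would suggest the problem is in the CPU backend itself rather than in the Vulkan-to-CPU fallback handling.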
Questions
- Is there a known working configuration for CPU-only Linux x64?
- Would building from source with specific cmake options (e.g., `GGML_CPU_REPACK=OFF`) help? (a tentative sketch follows below)
- Are there prebuilt binaries specifically for CPU-only (no Vulkan fallback path)?
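For the second question, something along these lines is what I had in mind. This is only a sketch and assumes `getLlama()` accepts `build: "forceRebuild"` and a `cmakeOptions` pass-through; if those option names are wrong, the `npx node-llama-cpp source` CLI flow would presumably be the alternative:
```js
// rebuild-test.mjs: sketch only; the build/cmakeOptions names are my assumption
// about the build-from-source API, not confirmed against the docs.
import { getLlama } from 'node-llama-cpp';

const llama = await getLlama({
    gpu: false,
    build: "forceRebuild",       // compile llama.cpp locally instead of using the prebuilt binary
    cmakeOptions: {
        GGML_CPU_REPACK: "OFF"   // rule out the repacking path suspected in llama.cpp#16479
    }
});
console.log('GPU:', llama.gpu);
```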
Related Issues
- llama.cpp#16479 - Segfault on q4_0 repacking (Windows AVX2, similar symptom)
Happy to provide additional debug output with NODE_LLAMA_CPP_DEBUG=true if helpful.