Segmentation fault on CPU fallback when Vulkan prebuilt binary is incompatible (Linux x64, no GPU) #554

Issue Description

When running on a Linux x64 machine without a GPU, node-llama-cpp correctly detects that the Vulkan prebuilt binary is incompatible and falls back to CPU-only mode. However, the process then crashes with a segmentation fault during model loading/inference.

The error message indicates the fallback is happening:

[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU

But immediately after, the process crashes with SIGSEGV.

Expected Behavior

CPU-only inference should work on machines without GPU support. The fallback path should successfully load GGUF models and perform inference using CPU.

Actual Behavior

Process crashes with segmentation fault after Vulkan fallback to CPU mode.

Environment

OS: Ubuntu 24.04.3 LTS (Noble Numbat)
Kernel: Linux 6.12.67 x86_64
glibc: 2.39
node-llama-cpp: 3.15.1
Bun: 1.3.8
Node.js: v22.22.0
CPU: Intel Xeon Platinum 8259CL @ 2.50GHz (2 cores)
CPU Flags: sse42, popcnt, avx, avx2, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, avx512_vnni
GPU: None (VM without GPU passthrough)
Vulkan packages: libvulkan1 1.3.275.0, mesa-vulkan-drivers 25.2.8

Steps to Reproduce

  1. Set up a Linux x64 VM/machine without GPU
  2. Install Vulkan libraries (common on Ubuntu): libvulkan1, mesa-vulkan-drivers
  3. Install node-llama-cpp via npm or bun
  4. Attempt to load any GGUF model

Minimal reproduction:

mkdir test-repro && cd test-repro

cat > package.json << 'EOF'
{
  "name": "test-repro",
  "type": "module",
  "dependencies": {
    "node-llama-cpp": "^3.15.1"
  }
}
EOF

cat > test.mjs << 'EOF'
import { getLlama } from 'node-llama-cpp';

console.log('Initializing Llama...');
const llama = await getLlama();
console.log('GPU:', llama.gpu);

// Download and load a small test model
const model = await llama.loadModel({
  modelPath: "hf:HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF"
});
console.log('Model loaded successfully');

await model.dispose();
await llama.dispose();
EOF

npm install
node test.mjs

Crash Output

[node-llama-cpp] The prebuilt binary for platform "linux" "x64" with Vulkan support is not compatible with the current system, falling back to using no GPU
============================================================
Bun v1.3.8 (b64edcb4) Linux x64
Linux Kernel v6.12.67 | glibc v2.39
CPU: sse42 popcnt avx avx2 avx512
Elapsed: 45036ms | User: 59099ms | Sys: 6506ms
RSS: 46.33MB | Peak: 5.35GB | Commit: 46.33MB | Faults: 280 | Machine: 7.76GB

panic: Segmentation fault at address 0x7FD0B4A3D840

Analysis

The crash appears to occur in the CPU inference path during GGUF model loading after the Vulkan binary detection fails. Key observations:

  1. Vulkan detection works correctly — The system properly identifies that Vulkan isn't usable
  2. Fallback triggers — The "falling back to using no GPU" message confirms fallback initiated
  3. getLlama() may succeed — The crash happens during loadModel(), not initialization
  4. CPU path crashes — The SIGSEGV occurs when llama.cpp attempts to load model tensors

The crash output shows ~45 seconds elapsed time and memory peaking at 5.35GB before the segfault, suggesting the crash occurs during model weight loading/processing, not during initial setup.

This suggests the CPU-only inference code path in the prebuilt binary has an issue on this platform/configuration, likely in the tensor loading or SIMD-accelerated computation paths.

Possible Causes

  • The prebuilt CPU binary may have been compiled with incompatible assumptions
  • GGML_CPU_REPACK might be causing issues (similar to llama.cpp#16479)
  • Memory alignment or SIMD instruction issues on this CPU variant

Workaround Attempts

  • Explicit gpu: false: Hard to test through the consuming library without modifying its code; a standalone check is sketched after this list
  • Building from source: Not yet attempted
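
To test the gpu: false hypothesis without touching the consuming library, a small standalone script along these lines should exercise the CPU-only path directly. This is an untested sketch based on my reading of the node-llama-cpp docs; passing gpu: false to getLlama() is what I understand forces CPU mode, and the modelPath below is a placeholder for any locally downloaded GGUF file:

import { getLlama } from 'node-llama-cpp';

// Skip GPU detection entirely and force the CPU-only backend.
const llama = await getLlama({ gpu: false });
console.log('GPU:', llama.gpu); // expected output: false

// Placeholder path; point this at any locally downloaded GGUF file.
const model = await llama.loadModel({
  modelPath: "./models/smollm-135M-instruct.Q8_0.gguf"
});
console.log('Model loaded successfully');

await model.dispose();
await llama.dispose();

If this also segfaults, the problem is in the CPU backend itself rather than in the Vulkan fallback logic; if it works, the fallback path becomes the more likely culprit.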

Questions

  1. Is there a known working configuration for CPU-only Linux x64?
  2. Would building from source with specific cmake options (e.g., GGML_CPU_REPACK=OFF) help? A possible invocation is sketched after this list.
  3. Are there prebuilt binaries specifically for CPU-only (no Vulkan fallback path)?
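
On question 2, my reading of the getLlama() options is that build and cmakeOptions could force a from-source build with repacking disabled. The option names below reflect my understanding of the API and are untested here; the GGML_CPU_REPACK flag comes from the upstream llama.cpp#16479 discussion:

import { getLlama } from 'node-llama-cpp';

// Force a local from-source build with the CPU weight-repacking path disabled.
// "build" and "cmakeOptions" are getLlama options as I understand them (unverified).
const llama = await getLlama({
  gpu: false,
  build: "forceRebuild",
  cmakeOptions: {
    GGML_CPU_REPACK: "OFF"
  }
});
console.log('GPU:', llama.gpu);

If a binary built this way loads the model without crashing, that would point at the repacking path and line up with the upstream report.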

Related Issues

  • llama.cpp#16479 - Segfault on q4_0 repacking (Windows AVX2, similar symptom)

Happy to provide additional debug output with NODE_LLAMA_CPP_DEBUG=true if helpful.
