Fix CPU performance on Whisper (index_put)

Whisper on CPU (through Optimum) decodes very slowly: less than 1 token/s for the medium model and larger. This is because it's using functional index_put, which is very slow. We should use the KV cache update logic instead.
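
For context, here is a toy sketch of why a functional (out-of-place) update costs more than an in-place one. It is not the actual Whisper or Optimum code: plain Python lists stand in for the cache tensors, and `functional_update`, `in_place_update`, and `max_len` are hypothetical names used only for illustration. The functional version returns a new buffer on every decode step, so it copies the whole cache each time, while the in-place version writes a single slot.

```python
# Toy model of a per-layer KV cache: a preallocated buffer with max_len slots.
# Assumption: plain Python lists stand in for tensors; names are hypothetical.

def functional_update(cache, pos, value):
    """Out-of-place update: build a new buffer and leave `cache` untouched.

    Every call copies all slots, so one decode step costs O(max_len).
    """
    new_cache = list(cache)  # full copy of the buffer
    new_cache[pos] = value
    return new_cache


def in_place_update(cache, pos, value):
    """In-place update: write one slot of the existing buffer, O(1) per step."""
    cache[pos] = value
    return cache


max_len = 8
cache_a = [0.0] * max_len  # updated functionally
cache_b = [0.0] * max_len  # updated in place

for step in range(3):
    cache_a = functional_update(cache_a, step, float(step + 1))
    in_place_update(cache_b, step, float(step + 1))
```

Both paths end with the same contents; the difference is only how much memory is touched per step, which grows with the cache size in the functional case.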