
Precompute compressed lengths of the training data #2

Open
rdno wants to merge 1 commit into tsoding:main from rdno:precompute_compressed_lengths_of_training_data



rdno commented Jan 14, 2024

Hi,

I came across this implementation. I had an idea to speed up the computations. I don't expect you to merge it.

By precomputing and storing the compressed lengths of the training data, one deflate call can be avoided in the `ncd` function. I've observed a ~33% performance increase.

Great project.

Thanks.

By precomputing and storing the compressed lengths, one deflate call
can be avoided in the `ncd` function.
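A minimal C sketch of the idea, assuming `ncd` implements the standard normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). Each training length C(x) is computed once up front, and `nearest` compresses each test sample once, so every pair costs a single compression of the concatenation. The names `clen`, `ncd_cached`, `precompute_lengths` and `nearest` are hypothetical, not the project's API. `clen` is a toy run-length length that stands in for the deflate call only so the sketch stays self-contained.

```c
#include <float.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the deflate call: the length of a naive
 * run-length encoding, one (byte, count) pair per run of up to 255
 * equal bytes. Swap in a real compressor (e.g. zlib) in practice. */
static size_t clen(const char *s, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i < n;) {
        size_t run = 1;
        while (i + run < n && s[i + run] == s[i] && run < 255)
            run++;
        pairs++;
        i += run;
    }
    return 2 * pairs;
}

/* NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
 * cx and cy are precomputed by the caller; only C(xy) is compressed. */
static double ncd_cached(const char *x, size_t nx, size_t cx,
                         const char *y, size_t ny, size_t cy)
{
    size_t mx = cx > cy ? cx : cy;
    size_t mn = cx < cy ? cx : cy;
    if (mx == 0)
        return 0.0; /* both inputs are empty */
    char *xy = malloc(nx + ny);
    if (xy == NULL)
        return -1.0; /* allocation failure; callers treat < 0 as an error */
    memcpy(xy, x, nx);
    memcpy(xy + nx, y, ny);
    size_t cxy = clen(xy, nx + ny);
    free(xy);
    return ((double)cxy - (double)mn) / (double)mx;
}

/* Compress every training sample once, before any test sample is seen. */
static size_t *precompute_lengths(const char **train, size_t count)
{
    size_t *lens = malloc((count + 1) * sizeof *lens);
    if (lens == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        lens[i] = clen(train[i], strlen(train[i]));
    return lens;
}

/* Nearest training sample by NCD. C(test) is computed once per call and
 * C(train[i]) comes from the table, so each pair costs one compression. */
static size_t nearest(const char *test, const char **train,
                      const size_t *train_lens, size_t count)
{
    size_t nt = strlen(test);
    size_t ct = clen(test, nt);
    size_t best = 0;
    double best_d = DBL_MAX;
    for (size_t i = 0; i < count; i++) {
        double d = ncd_cached(test, nt, ct, train[i], strlen(train[i]),
                              train_lens[i]);
        if (d >= 0.0 && d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

With this toy compressor and the training set {"abab", "aaaa", "zzzz"}, `nearest("aaaa", ...)` returns index 1 and `nearest("zzzz", ...)` returns index 2. The saving comes entirely from moving `clen(train[i], ...)` out of the per-pair path; the distance values themselves are unchanged.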

gyreas commented Jan 28, 2024

What about also precomputing the compressed lengths of the test data, while keeping the original text of both the test and training data around? (Possibly throwing some threads at that.) That way, the only computation left per pair is compressing the combined (concatenated) text. I'm not too familiar with C, but I used a similar, albeit naive, approach in Kotlin, which is pathetically slow.

[edit]
I threw actual threads at it and got ~5 s per test sample with my suggestion (still slow for me). I'll try SIMD next.
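A rough C sketch of this suggestion, assuming `ncd` is the standard normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). Both length tables are filled once up front, then POSIX threads split the test samples so each worker only compresses test/training concatenations. `classify_parallel`, `worker`, `Slice` and the run-length `clen` are hypothetical names; `clen` is a toy stand-in for the deflate call. For simplicity the precomputation itself runs sequentially, and only the pairwise loop is parallelized.

```c
#include <float.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for deflate: length of a naive run-length
 * encoding, one (byte, count) pair per run of up to 255 equal bytes. */
static size_t clen(const char *s, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i < n;) {
        size_t run = 1;
        while (i + run < n && s[i + run] == s[i] && run < 255)
            run++;
        pairs++;
        i += run;
    }
    return 2 * pairs;
}

typedef struct {
    const char **train, **test;
    const size_t *train_lens, *test_lens; /* precomputed C(x) tables */
    size_t n_train;
    size_t *result;    /* result[j]: nearest training index for test[j] */
    size_t begin, end; /* this thread's half-open range of test indices */
} Slice;

/* Each worker compresses only the concatenations; both single-input
 * lengths are table lookups. Workers write disjoint result entries. */
static void *worker(void *arg)
{
    const Slice *s = arg;
    for (size_t j = s->begin; j < s->end; j++) {
        size_t nt = strlen(s->test[j]), ct = s->test_lens[j];
        size_t best = 0;
        double best_d = DBL_MAX;
        for (size_t i = 0; i < s->n_train; i++) {
            size_t nr = strlen(s->train[i]), cr = s->train_lens[i];
            size_t mx = ct > cr ? ct : cr, mn = ct < cr ? ct : cr;
            char *xy = malloc(nt + nr + 1);
            if (xy == NULL)
                continue; /* skip this pair on allocation failure */
            memcpy(xy, s->test[j], nt);
            memcpy(xy + nt, s->train[i], nr);
            size_t cxy = clen(xy, nt + nr);
            free(xy);
            double d = mx ? ((double)cxy - (double)mn) / (double)mx : 0.0;
            if (d < best_d) {
                best_d = d;
                best = i;
            }
        }
        s->result[j] = best;
    }
    return NULL;
}

/* Returns 0 on success, -1 on failure. */
static int classify_parallel(const char **train, size_t n_train,
                             const char **test, size_t n_test,
                             size_t *result, size_t n_threads)
{
    if (n_threads == 0)
        return -1;
    size_t *train_lens = malloc((n_train + 1) * sizeof *train_lens);
    size_t *test_lens = malloc((n_test + 1) * sizeof *test_lens);
    pthread_t *tids = malloc(n_threads * sizeof *tids);
    Slice *slices = malloc(n_threads * sizeof *slices);
    int rc = -1;
    if (train_lens && test_lens && tids && slices) {
        /* Every single-input length is compressed exactly once. */
        for (size_t i = 0; i < n_train; i++)
            train_lens[i] = clen(train[i], strlen(train[i]));
        for (size_t j = 0; j < n_test; j++)
            test_lens[j] = clen(test[j], strlen(test[j]));
        size_t chunk = (n_test + n_threads - 1) / n_threads, started = 0;
        rc = 0;
        for (size_t t = 0; t < n_threads && rc == 0; t++) {
            size_t b = t * chunk < n_test ? t * chunk : n_test;
            size_t e = b + chunk < n_test ? b + chunk : n_test;
            slices[t] = (Slice){train, test, train_lens, test_lens,
                                n_train, result, b, e};
            if (pthread_create(&tids[t], NULL, worker, &slices[t]) != 0)
                rc = -1;
            else
                started++;
        }
        for (size_t t = 0; t < started; t++)
            pthread_join(tids[t], NULL);
    }
    free(train_lens);
    free(test_lens);
    free(tids);
    free(slices);
    return rc;
}
```

With the toy compressor, training set {"abab", "aaaa", "zzzz"} and test set {"aaaa", "zzzz", "aaa"}, running on two threads fills `result` with {1, 2, 1}. Splitting by test sample keeps the threads independent: each one reads the shared, read-only length tables and writes only its own slice of `result`, so no locking is needed.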
