Precompute compressed lengths of the training data by rdno · Pull Request #2 · tsoding/data-mining-in-c

rdno · 2024-01-14T21:06:30Z

Hi,

I came across this implementation. I had an idea to speed up the computations. I don't expect you to merge it.

By precomputing and storing the compressed lengths of the training data, one deflate call can be avoided in the `ncd` function. I've observed a ~33% performance increase.

Great project.

Thanks.
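A minimal sketch of the idea, not the code from this PR: each training sample stores its compressed length once, so `ncd` only has to compress the test text and the concatenation. The names `Sample`, `sample_make`, `concat_clen` and `compressed_len` are hypothetical, and `compressed_len` is a toy run-length stand-in for the real deflate call so that the example builds without any library. The distance uses the standard NCD formula, assumed here to match the project's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for deflate: size of a run-length encoding
 * (one count byte plus one value byte per run of up to 255 bytes). */
static size_t compressed_len(const char *data, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && data[i + run] == data[i] && run < 255) run++;
        out += 2;
        i += run;
    }
    return out;
}

/* A training sample that carries its precomputed compressed length. */
typedef struct {
    const char *text;
    size_t len;
    size_t clen; /* computed once in sample_make */
} Sample;

static Sample sample_make(const char *text)
{
    Sample s = { text, strlen(text), 0 };
    s.clen = compressed_len(s.text, s.len);
    return s;
}

/* Compressed length of the concatenation a+b. */
static size_t concat_clen(const char *a, size_t an, const char *b, size_t bn)
{
    size_t n = an + bn;
    char *buf = malloc(n ? n : 1);
    assert(buf != NULL);
    memcpy(buf, a, an);
    memcpy(buf + an, b, bn);
    size_t c = compressed_len(buf, n);
    free(buf);
    return c;
}

/* NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)).
 * Two compressor calls per invocation instead of three. */
static double ncd(const char *test, size_t test_len, const Sample *train)
{
    size_t ca = compressed_len(test, test_len); /* still computed per call */
    size_t cb = train->clen;                    /* cached, no compressor call */
    size_t cab = concat_clen(test, test_len, train->text, train->len);
    size_t mn = ca < cb ? ca : cb;
    size_t mx = ca < cb ? cb : ca;
    return mx == 0 ? 0.0 : ((double)cab - (double)mn) / (double)mx;
}
```

With three compressor calls per pair in the naive version, dropping one is consistent with the roughly one-third speedup reported above, assuming compression dominates the run time.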


gyreas · 2024-01-28T12:02:02Z

What about also precomputing the compressed lengths of the test data, while keeping the original text around (same as for the training data)? Possibly throwing some threads at it as well. Then the only remaining computation is on the combined text. I'm not too familiar with C, but I used a similar, albeit naive, approach in Kotlin, which is painfully slow.

[edit]
I added actual threads and got ~5 s per test sample with my suggestion (still slow for me). Will try SIMD next.
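A rough sketch of this suggestion in C, under the same assumptions as any toy example: `Sample`, `sample_make`, `concat_clen`, `nearest` and the run-length `compressed_len` are hypothetical stand-ins, not code from the repository. The test sample's compressed length is computed once before the loop, so each training sample costs a single compressor call on the combined text. Threads are not shown, but the loop iterations are independent of each other, so the range could be split across workers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for deflate: size of a run-length encoding. */
static size_t compressed_len(const char *data, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && data[i + run] == data[i] && run < 255) run++;
        out += 2; /* count byte + value byte */
        i += run;
    }
    return out;
}

typedef struct {
    const char *text;
    size_t len;
    size_t clen; /* precomputed compressed length */
} Sample;

static Sample sample_make(const char *text)
{
    Sample s = { text, strlen(text), 0 };
    s.clen = compressed_len(s.text, s.len);
    return s;
}

/* Compressed length of the concatenation a+b. */
static size_t concat_clen(const Sample *a, const Sample *b)
{
    size_t n = a->len + b->len;
    char *buf = malloc(n ? n : 1);
    assert(buf != NULL);
    memcpy(buf, a->text, a->len);
    memcpy(buf + a->len, b->text, b->len);
    size_t c = compressed_len(buf, n);
    free(buf);
    return c;
}

/* Index of the training sample with the smallest NCD to `test`. */
static size_t nearest(const char *test, const Sample *train, size_t count)
{
    assert(count > 0);
    Sample t = sample_make(test); /* C(test) computed once, outside the loop */
    size_t best = 0;
    double best_d = 0.0;
    for (size_t i = 0; i < count; i++) {
        size_t cab = concat_clen(&t, &train[i]); /* only compressor call here */
        size_t mn = t.clen < train[i].clen ? t.clen : train[i].clen;
        size_t mx = t.clen < train[i].clen ? train[i].clen : t.clen;
        double d = mx == 0 ? 0.0 : ((double)cab - (double)mn) / (double)mx;
        if (i == 0 || d < best_d) {
            best = i;
            best_d = d;
        }
    }
    return best;
}
```

Calling `nearest("bbbb", train, 2)` on an array built with `sample_make("aaaa")` and `sample_make("bbbb")` picks the second entry, since the combined text compresses best against the matching sample.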

Precompute compressed lengths of the training data.

fb3504e

By precomputing and storing the compressed lengths, one deflate call can be avoided in the `ncd` function.

rdno wants to merge 1 commit into tsoding:main from rdno:precompute_compressed_lengths_of_training_data
