Consider parallelizing xval #523

@mulhod

Cross-validation runs serially (grid search cross-validation, however, does make use of threads). This is a considerable bottleneck for large datasets and large feature spaces. For example, in recent experiments with 15k samples and perhaps up to 100k features, 10-fold cross-validation can take upwards of two weeks. It would be a good idea to consider parallelizing at the cross-validation fold level, if possible. For example, each fold could be gridmapped individually, or the folds could be run in threads (however, as mentioned, grid search cross-validation already spawns 3 threads, so that would have to be kept in mind).
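To make the fold-level idea concrete, here is a minimal sketch using only the Python standard library. It is not the project's code: `make_folds`, `evaluate_fold`, and `cross_validate_parallel` are hypothetical names, and `evaluate_fold` is a placeholder for the real train-and-score step. It uses a thread pool for simplicity; for CPU-bound training, a process pool or one gridmap job per fold would probably be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def make_folds(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds (train, test) pairs."""
    indices = list(range(n_samples))
    folds = []
    for k in range(n_folds):
        # Test set: every n_folds-th sample, starting at offset k.
        test = indices[k::n_folds]
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        folds.append((train, test))
    return folds


def evaluate_fold(fold):
    """Hypothetical stand-in for training and scoring a single fold."""
    train, test = fold
    # A real implementation would fit a learner on `train` (possibly with
    # its own grid search) and return a score computed on `test`.
    return len(test) / (len(train) + len(test))


def cross_validate_parallel(n_samples, n_folds, max_workers=None):
    """Evaluate all folds concurrently and return scores in fold order."""
    folds = make_folds(n_samples, n_folds)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(evaluate_fold, folds))
```

If each fold also runs a threaded grid search, total concurrency is roughly the number of fold workers times the grid search threads (3, per the above), so `max_workers` would need to be capped accordingly.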
