Open
Description
Cross-validation runs serially (grid search cross-validation, however, does make use of threads). This is a considerable bottleneck for large datasets and large feature spaces. For example, in recent experiments with 15k samples and perhaps up to 100k features, 10-fold cross-validation took upwards of two weeks. It would be a good idea to consider parallelizing at the cross-validation fold level, if possible. For example, each fold could be gridmapped individually, or the folds could be run in threads. As mentioned, though, grid search cross-validation already spawns 3 threads, so any fold-level parallelism would have to account for that to avoid oversubscribing the machine.
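To make the proposal concrete, here is a minimal sketch of fold-level parallelism using only the Python standard library. It is not the project's existing API: `kfold_indices`, `evaluate_fold`, `cross_validate_parallel`, and the `threads_per_fold` parameter are hypothetical names, and the "model" is a toy majority-label predictor standing in for real training. The worker cap simply assumes each fold may itself use 3 threads for grid search, as noted above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) index lists for contiguous k-fold splits."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


def evaluate_fold(task):
    """Hypothetical stand-in for training and scoring a single fold."""
    labels, train_idx, test_idx = task
    # Toy "model": always predict the majority label of the training split.
    train_labels = [labels[i] for i in train_idx]
    majority = max(sorted(set(train_labels)), key=train_labels.count)
    correct = sum(1 for i in test_idx if labels[i] == majority)
    return correct / len(test_idx)


def cross_validate_parallel(labels, n_folds=10, threads_per_fold=3,
                            executor_cls=ThreadPoolExecutor):
    """Run every fold concurrently and return per-fold scores in fold order."""
    # Assumption: each fold may use `threads_per_fold` threads internally
    # (e.g. for grid search), so cap concurrent folds to avoid oversubscription.
    cpus = os.cpu_count() or 1
    max_workers = max(1, min(n_folds, cpus // threads_per_fold))
    tasks = [(labels, train, test)
             for train, test in kfold_indices(len(labels), n_folds)]
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns results in the same order as the submitted tasks.
        return list(pool.map(evaluate_fold, tasks))
```

For CPU-bound training in pure Python, passing `executor_cls=concurrent.futures.ProcessPoolExecutor` would likely be the better choice, since threads are limited by the interpreter lock; that requires the fold function and its arguments to be picklable. Distributing folds across a cluster (the "gridmap each fold" option) would follow the same shape, with one job per fold.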