Update Rust crate burn to 0.19.0 #35
Open
This PR contains the following updates:
burn: `0.18.0` -> `0.19.0`

Release Notes
tracel-ai/burn (burn)
v0.19.1 (Compare Source)
Bug Fixes & Improvements
- `cudarc`, auto-detect CUDA version and fix some 12.8 features (CubeCL #1008) @wingertge
- `doc_cfg` to fix docs.rs builds (#3979) @laggui
- `*_like` preserves dtype (#3953) @crutcher
- `RotaryEncoding` sum dimension for 3D input (#3954) @laggui
- `squeeze` check for output rank > 0 (#3946) @laggui
- `Linear` for input/output rank 1 (#3966) @lucasmdjl

v0.19.0 (Compare Source)
Summary
This release brings major improvements to enable efficient distributed training, quantization, and CPU support in Burn.
To achieve true multi-GPU parallelism, we had to rethink several core systems: we implemented multi-stream execution to keep all GPUs busy, optimized device transfers to avoid unnecessary synchronization, and redesigned our locking strategies to eliminate bottlenecks in autotuning, fusion, and autodiff. We also introduced burn-collective for gradient synchronization and refactored our training loop to support different distributed training strategies.
Additionally, we added comprehensive quantization support, allowing models to use significantly less memory while maintaining performance through fused dequantization and optimized quantized operations.
Finally, we introduced a new CPU backend powered by MLIR and LLVM, bringing the same JIT compilation, autotuning, and fusion capabilities from our GPU backends to CPU execution.
As with previous releases, this version includes various bug fixes, further optimizations and enhanced documentation. Support for ONNX models has also been expanded, with additional operators and bug fixes for better operator coverage.
For more details, check out the release post on our website.
Changelog
Breaking
We've introduced a couple of breaking API changes with this release. The affected interfaces are detailed in the sections below.
Learning Strategy
We refactored the `Learner` to support better distributed training strategies. Instead of registering a list of devices, you now specify a training strategy.

```diff
  let learner = LearnerBuilder::new(artifact_dir)
      .metric_train_numeric(AccuracyMetric::new())
      .metric_valid_numeric(AccuracyMetric::new())
      .metric_train_numeric(LossMetric::new())
      .metric_valid_numeric(LossMetric::new())
      .with_file_checkpointer(CompactRecorder::new())
-     .devices(vec![device.clone()])
+     .learning_strategy(LearningStrategy::SingleDevice(device.clone()))
      .num_epochs(config.num_epochs)
      .summary()
      .build(
          config.model.init::<B>(&device),
          config.optimizer.init(),
          config.learning_rate,
      );
```
Learner Training Result

The `Learner` previously lacked an evaluation loop. We extended its return type to include all training states in a `TrainingResult`, which includes the trained model and a metrics renderer.

This enables the renderer to be reused by the new evaluator, so that training and evaluation metrics appear together in the TUI dashboard.
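For illustration only, a minimal sketch of consuming the new return type, continuing the builder snippet above; the `model` and `renderer` field names are assumptions, not the verified 0.19 API:

```rust
// Sketch only: field names on `TrainingResult` are assumed, not verified.
let result = learner.fit(dataloader_train, dataloader_valid);

// Trained model, ready for inference or saving (assumed field name).
let model = result.model;

// Metrics renderer that the new evaluator can reuse, so evaluation
// metrics land in the same TUI dashboard as training metrics (assumed field name).
let renderer = result.renderer;
```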
Interface Changes
Config

The `Config` trait now requires `Debug`.

BatchNorm

`BatchNorm` no longer requires the spatial dimension generic:

```diff
  #[derive(Module, Debug)]
  pub struct ConvBlock<B: Backend> {
      conv: nn::conv::Conv2d<B>,
-     norm: BatchNorm<B, 2>,
+     norm: BatchNorm<B>,
      pool: Option<MaxPool2d>,
      activation: nn::Relu,
  }
```

Backend::seed

Seeding is now device-specific.
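As a rough sketch of per-device seeding (assuming burn's ndarray feature is enabled; the argument order of the new `seed` signature is an assumption, not verified against the 0.19 API):

```rust
use burn::backend::NdArray;
use burn::tensor::backend::Backend;

fn main() {
    // Default device for the ndarray backend (CPU).
    let device = <NdArray as Backend>::Device::default();

    // 0.18: NdArray::seed(42);
    // 0.19: seeding is tied to a device (argument order assumed).
    NdArray::seed(&device, 42);
}
```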
Tensor

For consistency with other methods like `unsqueeze()` / `unsqueeze_dim(dim)`, `squeeze(dim)` was renamed to `squeeze_dim(dim)`. We've also added a `tensor.squeeze()` method which squeezes all singleton dimensions.

Finally, we removed the `tensor ^ T` syntax, which was clunky. `tensor.t()` is also a simple alias for `tensor.transpose()`.
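A small sketch of the renamed API, assuming burn's ndarray feature is enabled and that the per-dimension variant is named `squeeze_dim` (inferred from the consistency argument above, not verified):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let x = Tensor::<NdArray, 3>::ones([2, 1, 4], &device);

    // Previously `x.squeeze(1)`; the per-dimension method is assumed
    // to be `squeeze_dim(1)` in 0.19, mirroring `unsqueeze_dim`.
    let y: Tensor<NdArray, 2> = x.clone().squeeze_dim(1);

    // New in 0.19: squeeze all singleton dimensions at once.
    let z: Tensor<NdArray, 2> = x.squeeze();

    println!("{:?} {:?}", y.dims(), z.dims()); // [2, 4] [2, 4]
}
```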
Module & Tensor

- `tensor ^ T` magic transpose marker removed in favor of `tensor.t()`. (#3452) @crutcher
- `Distribution::Default` is now the `Default::default()`. (#3582) @crutcher
- `BatchNorm<B, D>` spatial dimension generic removed. (#3625) @crutcher
- `NormLayer` abstraction for unified normalization layers. (#3630) @crutcher
- `slice` operations (#3748) @antimora
- `bool_xor` operation for boolean tensors (#3785) @crutcher
- `tensor.cumsum(dim)` first implementation (#3806) @antimora
- `tensor.square` and fast-path int-power exponents. (#3847) @crutcher

Datasets & Training
- `SamplerDataset` distribution fix; constructors and builder. (#3490) @crutcher
- `AgNewsDataset` (#3698) @buttfa
- `cautious_weight_decay` to AdamW optimizer. (#3869) @crutcher

Backends
- `Trunc` op (#3860) @mooori

Bug Fixes
- `mask_where` broadcasted line size (#3823) @laggui
- `cubecl::random::seed(seed)` (#3878) @laggui
- `into_contiguous` (#3903) @wingertge

Documentation & Examples
- `HuggingfaceDatasetLoader` (#3484) @laggui

Fixes
ONNX Support
- `auto_pad` and `ceil_mode` attrs handling (#3542) @laggui
- `try_cast_vec` with fallback in proto conversion (#3546) @laggui
- `prettyplease` to format burn-import output rust files (#3578) @n1ght-hunter
- `trunc`, `fmod` and `Mod` ONNX ops (#3767) @antimora
- `bool_and` (#3829) @mooori
- `bool_or`, `bool_xor` (#3839) @mooori

Enhancements
Refactoring
- `nn.activation` module (#3627) @crutcher
- `QuantizedEncoding` type and unused candle/tch impl (#3645) @laggui
- `ScalarIr` to represent scalars generically (#3706) @laggui
- `burn-nn` (#3740) @laggui
- `burn-optim` (#3773) @laggui
- `Shape` manipulations (#3845) @laggui

Miscellaneous
- `HuggingfaceDatasetLoader` automatically check for pip (#3479) @Puranjay-del-Mishra
- `run-checks` (#3512) @crutcher
- `From` implementations for `ActivationConfig` and cleanup tests (#3631) @crutcher
- `default-features = false` (#3675) @dcrewi
- `burn-no-std-tests` and warning clean up (#3671) @antimora
- `burn-store` crate for model storage with safetensors support (#3666) @antimora
- `rayon` but ndarray doesn't (#3848) @wingertge

Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.