Philipp Moritz, Tyler Griggs, and the SkyRL Team

🗓️ Posted: October 21, 2025

<aside>

We are happy to announce SkyRL tx 0.0.3! This is the first release with full MoE support and much improved checkpointing. It also includes an initial, but still slow, implementation of sampling.

</aside>

Updates

We are now almost ready to support full reinforcement learning runs; the remaining work is tracked in this issue. If you want to help with this effort or extend the functionality, join us!

Roadmap

Some items from the roadmap we published with the 0.0.2 release are still in progress. We are working towards a 0.1 release, which will focus on documentation, making it easier to run SkyRL tx as a service, improving performance, and implementing missing features. You can find a list of the concrete issues we are working on here. We welcome contributions!

Going forward, we are also planning to port the model definitions to PyTorch. For the time being, the existing engine runs on Jax with torchax, since Jax is great for prototyping scale-out and model sharding, but we plan to eventually run the engine fully under PyTorch (and we welcome contributions!). Ideally we will find a common engine API that abstracts away the differences between PyTorch and Jax around model sharding, CUDA graphs, communication, and kernels, but the details still need to be worked out.

Running MoE training

You can now train MoE models out of the box, e.g. Qwen/Qwen3-Coder-30B-A3B-Instruct on 8 x H100 or 8 x L40S GPUs (the parameters below are conservative, so if you have enough memory you can increase the maximum number of LoRA adapters, the LoRA rank, or the micro batch size):

# Start the SkyRL tx server (in https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-tx)
uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-Coder-30B-A3B-Instruct --max-lora-adapters 1 --max-lora-rank 1 --tensor-parallel-size 8 --micro-batch-size 1 --no-shard-attention-heads

# Launch the training run (in https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/recipes)
export TINKER_API_KEY="nil"
export WANDB_API_KEY="<your key>"
uv run --with wandb --with tinker sl_loop.py base_url=http://localhost:8000 lora_rank=1 max_length=512 batch_size=8
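
If your GPUs have memory to spare, you can for example raise these limits when starting the server and pass a correspondingly larger lora_rank (up to the server's --max-lora-rank) to the training script. The values below are only an illustration, not tuned recommendations:

# Start the server with more LoRA capacity and a larger micro batch size (illustrative values)
uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-Coder-30B-A3B-Instruct --max-lora-adapters 4 --max-lora-rank 32 --tensor-parallel-size 8 --micro-batch-size 2 --no-shard-attention-heads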

Running training and sampling