Philipp Moritz, Tyler Griggs, and the SkyRL Team

🗓️ Posted: October 21, 2025

<aside>

We are happy to announce SkyRL tx 0.0.3! This is the first release with full MoE support and much improved checkpointing. It also includes an initial, but still slow, implementation of sampling.

</aside>

Updates

We are now almost ready to support full reinforcement learning runs; the remaining work is tracked in this issue. If you want to help with this effort or extend the functionality, join us!

Roadmap

Some items from the roadmap we published with the 0.0.2 release are still in progress. We are working towards a 0.1 release, which will focus on documentation, making it easier to run SkyRL tx as a service, improving performance, and implementing missing features. You can find a list of the concrete issues we are working on here. We welcome contributions!

Going forward, we are also planning to port the model definitions to PyTorch. For the time being, the existing engine runs on Jax with torchax, since Jax is great for prototyping scale-out and model sharding, but we plan to eventually run the engine fully under PyTorch (and we welcome contributions!). Ideally we will find a common engine API that abstracts away the differences between PyTorch and Jax around model sharding, CUDA graphs, communication, and kernels, but the details still need to be worked out.

Running MoE training

You can now train MoE models out of the box, e.g. Qwen/Qwen3-Coder-30B-A3B-Instruct on 8 x H100 or 8 x L40S GPUs (the parameters below are conservative, so if you have enough memory you can increase the maximum number of LoRA adapters, the LoRA rank, or the micro batch size):

# Start the SkyRL tx server (in https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-tx)
uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-Coder-30B-A3B-Instruct --max-lora-adapters 1 --max-lora-rank 1 --tensor-parallel-size 8 --micro-batch-size 1 --no-shard-attention-heads

# Launch the training run (in https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/recipes)
export TINKER_API_KEY="nil"
export WANDB_API_KEY="<your key>"
uv run --with wandb --with tinker sl_loop.py base_url=http://localhost:8000 lora_rank=1 max_length=512 batch_size=8
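
If your GPUs have memory to spare, you can for example raise these limits when starting the server and pass a correspondingly larger lora_rank (up to the server's --max-lora-rank) to the training script. The values below are only an illustration, not tuned recommendations:

# Start the server with more LoRA capacity and a larger micro batch size (illustrative values)
uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-Coder-30B-A3B-Instruct --max-lora-adapters 4 --max-lora-rank 32 --tensor-parallel-size 8 --micro-batch-size 2 --no-shard-attention-heads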

Running training and sampling