Philipp Moritz, Tyler Griggs, and the SkyRL Team
Posted: November 3, 2025
<aside>
We are happy to announce SkyRL tx v0.1.0!
SkyRL tx is a unified training and inference engine that implements the Tinker API, letting anyone run their own Tinker-like service on their own hardware (a minimal client sketch follows below).
This is the first version that supports RL end-to-end, and sampling is now significantly faster.
</aside>
<aside>
We are giving a talk, "SkyRL tx: A unified training and inference engine", on November 4 at 4pm at this year's Ray Summit. If you are around, come say hi!
</aside>
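To make "implements the Tinker API" concrete, here is a minimal sketch of pointing the tinker Python SDK at a self-hosted SkyRL tx server instead of the hosted Tinker service. It assumes that ServiceClient accepts a base_url and that create_lora_training_client takes a rank keyword, mirroring the arguments the cookbook recipe at the end of this post passes:

```python
import os
import tinker

# The self-hosted server presumably does not validate the key;
# the recipe below uses a dummy value as well.
os.environ["TINKER_API_KEY"] = "dummy"

# base_url is an assumption mirroring the recipe's base_url flag.
service_client = tinker.ServiceClient(base_url="http://localhost:8000")

# base_model must match the --base-model the engine was started with,
# and rank must not exceed its --max-lora-rank.
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-4B",
    rank=1,
)
```

From here, the Tinker primitives (forward_backward, optim_step, sample, save_state) are served from your own hardware, to the extent SkyRL tx implements them so far.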
There are a number of PRs currently in flight that will be part of the next release, including support for an external inference engine (#568), LoRA support for the embeddings (#511), support for prompt logprobs (#577), full parameter optimization (#611), and more sampling scenarios (#613). Thanks Guido, Lucas, and Atem for the contributions!
As always, we welcome more contributions! We are currently falling a little behind on the documentation, so if you are excited about contributing there, that would be particularly welcome. Implementing more of the Tinker API, performance optimizations, and any of the currently open tasks here (or really anything you would like to see implemented) are similarly welcome.
Since this is the first release that supports RL end-to-end, we conclude this post with instructions for running it on 8xH100 GPUs. First clone https://github.com/NovaSky-AI/SkyRL and, from the skyrl-tx folder, start the engine with
uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-4B --max-lora-adapters 3 --max-lora-rank 1 --tensor-parallel-size 8 --train-micro-batch-size 8 > out.log
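The engine writes its logs to out.log, so you can follow startup with tail -f out.log. If you prefer to script the wait before launching the training recipe, here is a small self-contained sketch (not from the SkyRL repo) that polls the port the API listens on; port 8000 is inferred from the base_url used below:

```python
import socket
import time

# Poll until the Tinker API server accepts TCP connections on port 8000
# (loading the base model can take a while).
for _ in range(120):
    try:
        socket.create_connection(("localhost", 8000), timeout=1).close()
        print("SkyRL tx engine is up")
        break
    except OSError:
        time.sleep(1)
else:
    raise SystemExit("engine did not come up; check out.log")
```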
Then, clone https://github.com/thinking-machines-lab/tinker-cookbook and in the tinker_cookbook/recipes folder, run
export TINKER_API_KEY=dummy
export WANDB_API_KEY=<your key>
uv run --with wandb --with tinker rl_loop.py base_url=http://localhost:8000 model_name="Qwen/Qwen3-4B" lora_rank=1 max_length=1024 save_every=100
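Two notes on this invocation: TINKER_API_KEY has to be set because the cookbook client expects a key, but since you are talking to your own server the value is presumably not checked (hence dummy). And model_name and lora_rank must be compatible with the --base-model and --max-lora-rank flags the engine was started with above.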
You should get a reward curve like the following: