Philipp Moritz, Tyler Griggs, and the SkyRL Team

๐Ÿ—“๏ธ Posted: November 3, 2025

<aside>

We are happy to announce SkyRL tx v0.1.0!

SkyRL tx is a unified training and inference engine that implements the Tinker API and allows people to set up their own Tinker-like service running on their own hardware.

This is the first version that supports RL end-to-end, and sampling is now significantly faster.

</aside>
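Concretely, implementing the Tinker API means an unmodified Tinker client can talk to a SkyRL tx server. Below is a minimal sketch of that connection, assuming the tinker package's ServiceClient and create_lora_training_client interface as used in the tinker-cookbook; the URL and model mirror the setup at the end of this post, and exact signatures may vary between versions.

```python
import os

import tinker

# Point the Tinker client at a self-hosted SkyRL tx server instead of the
# hosted service. The dummy API key and base_url mirror the RL recipe below.
os.environ["TINKER_API_KEY"] = "dummy"
service_client = tinker.ServiceClient(base_url="http://localhost:8000")

# Create a LoRA training client for the base model the server was started
# with (parameter names follow the tinker-cookbook and may differ by version).
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-4B", rank=1
)
```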

<aside> 👋

We are giving a talk, "SkyRL tx: A unified training and inference engine", on November 4 at 4pm at this year's Ray Summit. If you are around, come say hi!

</aside>

Updates

There are a number of PRs currently in flight that will be part of the next release, including support for an external inference engine (#568), LoRA support for embeddings (#511), support for prompt logprobs (#577), full-parameter optimization (#611), and coverage of more sampling scenarios (#613). Thanks to Guido, Lucas, and Atem for the contributions!

As always, we welcome more contributions! We are currently falling a little behind on extending the documentation, so if you are excited about contributing there, that would be particularly welcome. Also welcome: implementing more functionality of the Tinker API, performance optimizations, any of the currently open tasks here, or really anything you would like to see implemented.

Running RL end-to-end

Since this is the first release that supports RL end-to-end, we conclude the blog post with instructions for running it on 8xH100 GPUs. First clone https://github.com/NovaSky-AI/SkyRL and, in the skyrl-tx folder, start the engine with

```bash
uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-4B --max-lora-adapters 3 --max-lora-rank 1 --tensor-parallel-size 8 --train-micro-batch-size 8 > out.log
```
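The engine redirects its logs to out.log, which you can follow to watch the weights load and requests come in. If you are scripting the setup, here is a small optional sketch that blocks until the server accepts connections; the port matches the base_url used below, and this only checks the TCP socket, not whether the model has finished loading.

```python
import socket
import time

# Poll until the SkyRL tx engine started above accepts TCP connections on
# port 8000 (the port that base_url below points at). This does not verify
# that the model weights have finished loading; check out.log for that.
while True:
    try:
        socket.create_connection(("localhost", 8000), timeout=1).close()
        break
    except OSError:
        time.sleep(2)
print("SkyRL tx server is accepting connections")
```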

Then, clone https://github.com/thinking-machines-lab/tinker-cookbook and, in the tinker_cookbook/recipes folder, run

```bash
export TINKER_API_KEY=dummy
export WANDB_API_KEY=<your key>
uv run --with wandb --with tinker rl_loop.py base_url=http://localhost:8000 model_name="Qwen/Qwen3-4B" lora_rank=1 max_length=1024 save_every=100
```
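Note that the client arguments have to line up with the server flags above: model_name should match --base-model, and lora_rank presumably cannot exceed the server's --max-lora-rank.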

You should get a reward curve like the following: