Philipp Moritz, Tyler Griggs, and the SkyRL Team
🗓️ Posted: December 8, 2025
<aside>
We are happy to announce SkyRL tx v0.2!
SkyRL tx is a unified training and inference engine that implements the Tinker API and allows people to set up a Tinker-like service running on their own hardware.
This release brings several performance improvements and adds support for an external inference engine.
</aside>
<aside> 📢
We gave a talk, “SkyRL tx: A unified training and inference engine,” at this year’s Ray Summit; check out the recording and slides.
</aside>
We now support using an external inference engine; see #568. Thanks Guido for the contribution! Currently only vLLM is supported as the external engine, but we would love to also support SGLang or hosted inference providers (if they support inference with custom LoRA models) and welcome contributions. An external inference engine can be an attractive option for users who depend on performance optimizations or custom kernels that are not part of the engine built into tx. We include comparisons of runs between the internal and external inference engines at the end of this post.
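For a sense of what the external engine needs to provide, vLLM can serve custom LoRA adapters through its OpenAI-compatible server. The sketch below uses placeholder values for the model name, adapter name, and adapter path; it is an illustration of vLLM's LoRA serving flags, not a tx configuration:

```shell
# Sketch: launch vLLM as an OpenAI-compatible server with LoRA enabled.
# The model, adapter name, and adapter path are placeholders.
vllm serve Qwen/Qwen3-8B \
  --enable-lora \
  --lora-modules my-adapter=/path/to/lora/adapter \
  --max-lora-rank 32
```

Requests can then target the adapter by passing `my-adapter` as the model name, which is the shape of integration an external engine needs for serving per-run LoRA weights.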
LoRA support for all the different layers is now complete (the remaining layer was embeddings added in #511, and we now support configuring which layers use LoRA in #664). Thanks Guido and Henry for the contributions!
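As a refresher on what per-layer LoRA support means, here is a generic sketch (not tx's actual implementation) of a LoRA-adapted linear layer: the frozen weight is augmented with a low-rank update, so the output is `W x + (alpha / r) * B (A x)`:

```python
# Generic LoRA sketch (not tx's actual code): a frozen linear layer plus a
# low-rank update. A has shape (r, d_in), B has shape (d_out, r).

def matvec(m, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

def lora_linear(x, W, A, B, alpha):
    r = len(A)                        # LoRA rank
    base = matvec(W, x)               # frozen path: W x
    update = matvec(B, matvec(A, x))  # low-rank path: B (A x)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy example: d_in = d_out = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # identity base weight
A = [[1.0, 1.0]]               # 1 x 2
B = [[0.5], [0.5]]             # 2 x 1
print(lora_linear([1.0, 2.0], W, A, B, alpha=1.0))  # → [2.5, 3.5]
```

Configuring which layers use LoRA (as in #664) amounts to choosing which linear layers get the `A`/`B` pair; the rest keep only the frozen path.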
There is a large number of performance improvements:
Thanks Ago and Henry for the contributions. While a few things can still be optimized (contributions welcome!), we think the single-node implementation is now quite performant and doesn’t leave much performance on the table.
Code cleanups and bug fixes: #636, #666, #701, including an important fix for a bug affecting accuracy (#747). As part of fixing that last bug, we also validated the numerics much more extensively against Hugging Face and vLLM, and we now have much higher confidence that no other important accuracy-affecting bugs remain. Thanks Henry for the contribution!
There are a number of typing fixes (#681, #686, #682). Thanks Taro for the contributions! We welcome more contributions in this area; see #673.
We better aligned the backend with changes to the official Tinker API and client. In particular, we added session API endpoints (#661) and the weight_info endpoint (#736). Thanks Benji and Ago for the contributions!
We now support logprobs for prompt tokens (#577). Thanks Guido for the contribution!
We now support loading local models from disk (#738). Thanks Daocheng for the contribution!
There are a number of PRs currently in flight that will be part of the next release, including support for more sampling parameters (#742, #680), the training_runs endpoint (#720), custom loss functions (#698), FSDP support (#674), Llama3 support (#657), full parameter optimization (#611), and database migrations (#580). Thanks Sriran, Ago, Thejas, Benji, Taro, Atem and Lukas for the contributions!
As always, we welcome more contributions!
We are currently falling a little behind on extending the documentation, so contributions there would be particularly welcome. Also welcome: implementing more of the Tinker API, performance optimizations, and any of the currently open tasks here, or really anything you would like to see implemented. One bigger item we plan to work on next is multi-node support; if you are interested in collaborating, let us know!
We have recently implemented a number of performance improvements, especially for the sampling code path. In this section, we show some performance comparisons of SkyRL tx’s native inference support against using vLLM as an external inference engine.