Philipp Moritz, Hao Chen, Tyler Griggs, and the SkyRL Team
🗓️ Posted: February 8, 2026
<aside>
We are happy to announce SkyRL tx v0.3.0!
SkyRL tx is a LoRA-native training and inference engine that implements the Tinker API and allows people to run a Tinker-like service on their own hardware.
In this release, we add expert-parallel support, DeepSeekV3 model support (e.g. for the GLM 4.7 Flash model), a number of optimizations for long sequence lengths, and several smaller features, performance optimizations, and bug fixes.
</aside>
<aside> 📢
We gave a talk on SkyRL tx as a unified training and inference engine at this year’s Ray Summit; check out the recording and slides.
</aside>
We now support expert-parallel (EP) sharding, which brings large performance improvements for MoE models: weights are sharded by expert, so each shard only processes the tokens routed to its experts, which reduces communication overhead. Each shard also holds larger individual matrices, since they no longer need to be split along the tensor-parallel dimension (though TP and EP sharding can be combined if desired); this makes the matrix multiplications more efficient. EP sharding was implemented in a series of PRs:
We implemented a naive version of `ragged_dot` that supports a `group_offset` parameter (see the upstream `jax.lax.ragged_dot` documentation). This was necessary since the parameter is not implemented upstream (see this issue). The `group_offset` parameter is used to evaluate only the subset of tokens assigned to the experts on each shard. While this naive version does extra work, integrating it already gives a good speedup: for the Qwen3-30B-A3B model, it reduced the step time from 110s with TP=8 to 40s with EP=8. This can be further reduced to 20s by using optimized kernels.

We added support for DeepSeekV3 models (#889); thanks Tanmay for the contribution! This is exciting because many modern models like GLM 4.7 Flash are based on the DeepSeekV3 architecture and can therefore be used now (#1023). Further down in this post you can find instructions on how to train the GLM 4.7 Flash model.
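To illustrate the `group_offset` workaround described above, here is a minimal NumPy sketch (hypothetical function name, not the engine’s actual code) that emulates a grouped matmul restricted to a shard’s local experts: the local expert weights are zero-padded to the full expert count, so tokens routed to remote experts simply hit zero weights. This does extra work, matching the naive version described in the post.

```python
import numpy as np

def ragged_dot_group_offset(lhs, rhs_local, group_sizes, group_offset):
    """Naive emulation of ragged_dot with a group_offset parameter.

    lhs:          [m, k]        tokens, grouped (sorted) by expert id
    rhs_local:    [g_loc, k, n] weights for this shard's local experts,
                                i.e. global experts
                                [group_offset, group_offset + g_loc)
    group_sizes:  [g_tot]       number of tokens per global expert
    """
    g_tot = group_sizes.shape[0]
    g_loc, k, n = rhs_local.shape
    # Zero-pad the local expert weights to the full expert count, so no
    # changes to the grouped-matmul kernel itself are needed.
    rhs_full = np.zeros((g_tot, k, n), dtype=rhs_local.dtype)
    rhs_full[group_offset:group_offset + g_loc] = rhs_local

    # Ordinary grouped (ragged) matmul over ALL tokens: extra work for
    # tokens belonging to remote experts, whose rows come out as zeros.
    out = np.zeros((lhs.shape[0], n), dtype=lhs.dtype)
    start = 0
    for g in range(g_tot):
        end = start + int(group_sizes[g])
        out[start:end] = lhs[start:end] @ rhs_full[g]
        start = end
    return out
```

An optimized kernel would instead skip the remote experts’ tokens entirely, which is where the further speedup mentioned above comes from.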
We added a number of significant optimizations for long sequence support:
The loss computation is chunked along the sequence dimension via `loss_chunk_size`, which is a configurable engine parameter.

Going forward, we are also planning to implement sequence/context parallelism to support even longer contexts (#1056).
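The idea behind chunking the loss can be sketched as follows (hypothetical function and parameter names other than `loss_chunk_size`; not the engine’s actual implementation): computing the logits chunk by chunk bounds peak memory at `[loss_chunk_size, vocab]` instead of `[seq, vocab]`, which matters for long sequences with large vocabularies.

```python
import numpy as np

def chunked_ce_loss(hidden, unembed, targets, loss_chunk_size):
    """Cross-entropy loss computed in chunks along the sequence, so the
    full [seq, vocab] logits matrix is never materialized at once.

    hidden:  [seq, d]    final hidden states
    unembed: [d, vocab]  output projection
    targets: [seq]       target token ids
    """
    seq = hidden.shape[0]
    total = 0.0
    for start in range(0, seq, loss_chunk_size):
        h = hidden[start:start + loss_chunk_size]   # [c, d]
        t = targets[start:start + loss_chunk_size]  # [c]
        logits = h @ unembed                        # [c, vocab]
        # numerically stable log-softmax
        z = logits - logits.max(axis=-1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
        total += -logp[np.arange(len(t)), t].sum()
    return total / seq
```

The result is identical to the unchunked computation; only the peak memory changes.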
We implemented model unloading support (#844), which ensures models are properly cleaned up when there are no more heartbeats from the clients that instantiated them.
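As a rough illustration of the heartbeat mechanism (a hypothetical sketch, not the actual implementation in #844): each client periodically pings for its models, and a periodic sweep evicts models whose last heartbeat is older than a timeout.

```python
import time

class ModelRegistry:
    """Hypothetical heartbeat-based model eviction sketch."""

    def __init__(self, timeout_s=300.0):
        self.timeout_s = timeout_s
        self._last_beat = {}  # model_id -> last heartbeat timestamp

    def heartbeat(self, model_id):
        """Record that a client of this model is still alive."""
        self._last_beat[model_id] = time.monotonic()

    def sweep(self):
        """Unload models with no recent heartbeat; return evicted ids."""
        now = time.monotonic()
        stale = [m for m, t in self._last_beat.items()
                 if now - t > self.timeout_s]
        for m in stale:
            # A real engine would free the model weights here.
            del self._last_beat[m]
        return stale
```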
Improved Tinker API compatibility: we implemented top_p sampling (#830); thanks John for the contribution! We also updated the API to support the latest SDK (#837).
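For reference, top_p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative mass exceeds `top_p` and samples from that set. A minimal NumPy sketch of the idea (hypothetical function name; not the code from #830):

```python
import numpy as np

def top_p_sample(logits, top_p, rng):
    """Sample a token id using nucleus (top_p) filtering."""
    # softmax over the vocabulary (numerically stable)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # sort tokens by probability, descending
    order = np.argsort(probs)[::-1]
    csum = np.cumsum(probs[order])
    # keep tokens until cumulative mass first exceeds top_p
    cutoff = np.searchsorted(csum, top_p) + 1
    keep = order[:cutoff]
    # renormalize over the kept set and sample
    p = probs[keep] / probs[keep].sum()
    return int(keep[rng.choice(len(keep), p=p)])
```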
There are a number of PRs that make it possible to use the SkyRL Train backend with the Tinker API server (#871, #978, #1010, #999, #1046, #1047). This is part of the ongoing SkyRL Tinkerification effort, and we will soon release a fully functional version of this backend, supporting both Megatron and FSDP2. Going forward, we are planning to integrate the API server more natively with SkyRL, possibly with a common PyPI package for all SkyRL Tinker backends (SkyRL Train for PyTorch and tx for JAX). The goal of this restructuring is to expose one common UX and documentation for users (e.g. skyrl tinker --backend="jax" --backend-config="..."), while keeping the code as compact as possible so that developers only need to be aware of the part of the project they are interested in. As part of this effort, we will also invest much more in documentation going forward!
We also fixed a number of bugs:

- num_samples for the sampling endpoint was not passed through correctly; this is now fixed in #1015. Thanks Chirag for reporting and fixing this!
- We fixed sampling from the base model using the external inference engine (#1039).
- We fixed an issue affecting num_samples > 1 (#1042).
- We fixed 'database is locked' errors.
- We fixed an issue that occurred when calling the sample endpoint directly through curl rather than the Tinker SDK. Thanks Jared for fixing this!

There are a number of exciting in-flight PRs, like support for the full GLM 4.7 model (#989), support for mHC (#1008), support for running on a Ray cluster (#955), and support for the Olmo 3 model (#1043). Thanks to Tanmay, Han-Ju Chen, and Jiang for the contributions!
As always, we welcome more contributions!