by Eric Tang, Shu Liu, Tyler Griggs and the NovaSky Team
🗓️ Posted: July 10, 2025
<aside> 🔍
In this report, we detail how to train your own multi-turn search agent with SkyRL, based on SearchR1!
To dig into even more detail, check out the following:
📝 Documentation
📈 WandB report for our training runs (~28 hours for 331 steps on 8xH100)
🤗 Model Checkpoints: 2 turns, 3 turns, 4 turns
Try it out at SkyRL on GitHub:
</aside>
In this section, we’ll walk step by step through building a multi-turn search environment using SkyRL-Gym. One of the main goals of SkyRL is to make this process lightweight and low-burden for developers—you might be surprised by how little code is needed to get a fully functioning environment.
We’ll cover three parts:
For more details, you can read a walkthrough of the SkyRL-Gym interface and a guide on creating your own environment in the SkyRL docs.
We start by creating the search query logic as a tool. In SkyRL-Gym, a tool is simply a function marked with the `@tool` decorator. Tools are grouped into a `ToolGroup`, which lets related tools share state, such as database connections or caches.
The main benefit of this design is reusability—a tool can be implemented once and then used in any environment. For example, the same search tool can serve both deep research and software engineering workflows.
In this example, we define the `SearchToolGroup`, which contains a single `@tool` called `search`. We reuse Search-R1's search query logic, decorate it with `@tool`, and it's ready to go:
```python
class SearchToolGroup(ToolGroup):
    @tool
    def search(self, query: str) -> str:
        # Search-R1's search implementation here...
        ...
```
Once defined, this tool can be called in any SkyRL-Gym environment without further changes.
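To make the shared-state idea concrete, here is a minimal, self-contained sketch of the pattern. The `tool` decorator and `ToolGroup` base class below are simplified stand-ins for what SkyRL-Gym provides (not its actual implementation), and the in-memory cache is a hypothetical example of state shared across a group's tools:

```python
# Simplified stand-ins for SkyRL-Gym's @tool decorator and ToolGroup base class.
def tool(fn):
    fn._is_tool = True  # mark the method so the framework can discover it
    return fn

class ToolGroup:
    pass

class SearchToolGroup(ToolGroup):
    """Tool group whose tools share one in-memory cache (hypothetical)."""

    def __init__(self):
        self.cache = {}  # state shared by all tools in this group

    @tool
    def search(self, query: str) -> str:
        if query in self.cache:
            return self.cache[query]          # reuse a previous result
        result = f"results for: {query}"      # placeholder for a real retrieval call
        self.cache[query] = result
        return result
```

Because the shared state lives on the group instance rather than on any single tool, adding a second tool (say, a `fetch_page` that reuses the same cache or connection pool) requires no extra plumbing.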
Next, we define the search environment itself: `SearchEnv`. In SkyRL-Gym, an environment is simply a class that describes the task for the agent to solve, and it can integrate one or more tool groups for the agent to use during training.
The environment interface is intentionally lightweight and follows the familiar `step()`-based pattern seen in Gymnasium-style APIs. In most cases, implementing a custom environment only requires defining a `step()` method:
```python
class SearchEnv(BaseTextEnv):
    # step() definition pseudocode
    def step(self, action: str) -> tuple[str, float, bool]:
        # returns (observation, reward, done)
        ...
```
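To show how this signature typically gets filled in, here is a self-contained sketch. `BaseTextEnv` is a stand-in for SkyRL-Gym's base class, and the `<search>`/`<answer>` tags, exact-match reward, and `max_turns` cutoff are illustrative assumptions in the spirit of Search-R1, not SkyRL's exact implementation:

```python
import re

class BaseTextEnv:  # stand-in for SkyRL-Gym's base environment class
    pass

class SearchEnv(BaseTextEnv):
    """Sketch of a step() loop: parse the action, run the tool, score the answer."""

    def __init__(self, search_tool, ground_truth: str, max_turns: int = 2):
        self.search_tool = search_tool    # callable, e.g. SearchToolGroup's search
        self.ground_truth = ground_truth  # gold answer for this training example
        self.max_turns = max_turns
        self.turns = 0

    def step(self, action: str) -> tuple[str, float, bool]:
        self.turns += 1
        # If the model emitted a <search> query and turns remain, run the tool
        # and return its output as the next observation (no reward yet).
        query = re.search(r"<search>(.*?)</search>", action, re.DOTALL)
        if query and self.turns < self.max_turns:
            observation = self.search_tool(query.group(1).strip())
            return observation, 0.0, False
        # Otherwise treat the action as a final answer and score it.
        answer = re.search(r"<answer>(.*?)</answer>", action, re.DOTALL)
        reward = 1.0 if answer and answer.group(1).strip() == self.ground_truth else 0.0
        return "", reward, True
```

The environment itself stays thin: all the retrieval logic lives in the tool group, and `step()` only routes the model's output to the right tool and decides when the episode ends.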