by Eric Tang, Shu Liu, Tyler Griggs and the NovaSky Team
🗓️ Posted: July 10, 2025
<aside> 🔍
In this report, we detail how to train your own multi-turn search agent with SkyRL, based on SearchR1!
To dig into even more detail, check out the following:
📝 Documentation
📈 WandB report for our training runs (~28 hours for 331 steps on 8xH100)
🤗 Model Checkpoints: 2 turns, 3 turns, 4 turns
Try it out at SkyRL on GitHub:
</aside>
In this section, we’ll walk step by step through building a multi-turn search environment using SkyRL-Gym. One of the main goals of SkyRL is to make this process lightweight and low-burden for developers—you might be surprised by how little code is needed to get a fully functioning environment.
We’ll cover three parts:
For more details, you can read a walkthrough of the SkyRL-Gym interface and a guide on creating your own environment in the SkyRL docs.
We start by creating the search query logic as a tool. In SkyRL-Gym, a tool is simply a function marked with the `@tool` decorator. Tools are grouped into a `ToolGroup`, which lets related tools share state, such as database connections or caches.
The main benefit of this design is reusability—a tool can be implemented once and then used in any environment. For example, the same search tool can serve both deep research and software engineering workflows.
In this example, we define the `SearchToolGroup`, which contains a single `@tool` called `search`. We reuse Search-R1's search query logic, decorate it with `@tool`, and it's ready to go:
```python
class SearchToolGroup(ToolGroup):
    @tool
    def search(self, query: str) -> str:
        # Search-R1's search implementation here...
        ...
```
Once defined, this tool can be called in any SkyRL-Gym environment without further changes.
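To make the shared-state idea concrete, here is a minimal, self-contained sketch of the pattern. The `tool` decorator and `ToolGroup` base class below are simplified stand-ins for what SkyRL-Gym provides (not its actual implementation), and the in-memory cache is a hypothetical example of state shared across a group's tools:

```python
# Simplified stand-ins for SkyRL-Gym's @tool decorator and ToolGroup base class.
def tool(fn):
    fn._is_tool = True  # mark the method so the framework can discover it
    return fn

class ToolGroup:
    pass

class SearchToolGroup(ToolGroup):
    """Tool group whose tools share one in-memory cache (hypothetical)."""

    def __init__(self):
        self.cache = {}  # state shared by all tools in this group

    @tool
    def search(self, query: str) -> str:
        if query in self.cache:
            return self.cache[query]          # reuse a previous result
        result = f"results for: {query}"      # placeholder for a real retrieval call
        self.cache[query] = result
        return result
```

Because the shared state lives on the group instance rather than on any single tool, adding a second tool (say, a `fetch_page` that reuses the same cache or connection pool) requires no extra plumbing.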
Next, we define the search environment itself: `SearchEnv`. In SkyRL-Gym, an environment is simply a class that describes the task for the agent to solve, and it can integrate one or more tool groups for the agent to use during training.
The environment interface is intentionally lightweight and follows the familiar `step()`-based pattern seen in Gymnasium-style APIs. In most cases, implementing a custom environment only requires defining a `step()` method:
```python
class SearchEnv(BaseTextEnv):
    # step() definition pseudocode
    def step(self, action: str) -> tuple[str, float, bool]:
        # returns (observation, reward, done)
        ...
```
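To show how this signature typically gets filled in, here is a self-contained sketch. `BaseTextEnv` is a stand-in for SkyRL-Gym's base class, and the `<search>`/`<answer>` tags, exact-match reward, and `max_turns` cutoff are illustrative assumptions in the spirit of Search-R1, not SkyRL's exact implementation:

```python
import re

class BaseTextEnv:  # stand-in for SkyRL-Gym's base environment class
    pass

class SearchEnv(BaseTextEnv):
    """Sketch of a step() loop: parse the action, run the tool, score the answer."""

    def __init__(self, search_tool, ground_truth: str, max_turns: int = 2):
        self.search_tool = search_tool    # callable, e.g. SearchToolGroup's search
        self.ground_truth = ground_truth  # gold answer for this training example
        self.max_turns = max_turns
        self.turns = 0

    def step(self, action: str) -> tuple[str, float, bool]:
        self.turns += 1
        # If the model emitted a <search> query and turns remain, run the tool
        # and return its output as the next observation (no reward yet).
        query = re.search(r"<search>(.*?)</search>", action, re.DOTALL)
        if query and self.turns < self.max_turns:
            observation = self.search_tool(query.group(1).strip())
            return observation, 0.0, False
        # Otherwise treat the action as a final answer and score it.
        answer = re.search(r"<answer>(.*?)</answer>", action, re.DOTALL)
        reward = 1.0 if answer and answer.group(1).strip() == self.ground_truth else 0.0
        return "", reward, True
```

The environment itself stays thin: all the retrieval logic lives in the tool group, and `step()` only routes the model's output to the right tool and decides when the episode ends.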