OpenEnv documentation

FinQA Environment

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.4.1).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

FinQA Environment

A financial question-answering environment for RL training. Evaluates LLMs on their ability to answer complex financial questions using tool calls on SEC 10-K filing data.

Based on FinQABenchmark from Snorkel AI.

Overview

FinQA tests an agent’s ability to:

  • Explore available financial tables for a company
  • Query table metadata and execute SQL queries
  • Perform calculations on extracted data
  • Submit final answers to financial questions

Dataset: 290 questions from SEC 10-K filings across multiple companies (Alphabet, Amazon, Apple, AT&T, etc.)

Reward: Binary (1.0 for correct answer, 0.0 for incorrect) using fuzzy numerical matching with 1% tolerance.

Note: This dataset is for evaluation only. Do not train on it.

Quick Start

Using Docker

# Build the image (from OpenEnv repo root)
docker build -t finqa-env:latest -f envs/finqa_env/server/Dockerfile .

# Run the server
docker run -p 8000:8000 finqa-env:latest

# To run evaluation script (example model gpt-5)
API_BASE_URL=https://api.openai.com/v1 API_KEY=$OPENAI_API_KEY MODEL=gpt-5 python examples/finqa_inference.py

Local Development

# Install dependencies
uv pip install pandas

# Download data from HuggingFace
cd envs/finqa_env
./download_data.sh

Using the Client

The client uses the MCP protocol and is async by default:

import asyncio
from envs.finqa_env import FinQAEnv, CallToolAction

async def main():
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        # Reset to get a question
        obs = await env.reset()
        question = obs.metadata["question"]
        company = obs.metadata["company"]
        print(f"Question: {question}")
        print(f"Company: {company}")

        # Discover available tools
        tools = await env.list_tools()
        print([t.name for t in tools])

        # Use tools via call_tool (convenience method)
        result = await env.call_tool("get_descriptions", company_name=company)
        print(f"Available tables: {result}")

        # Or use step() with CallToolAction for full observation access
        step_result = await env.step(CallToolAction(
            tool_name="sql_query",
            arguments={
                "company_name": "alphabet",
                "table_name": "us_gaap_ScheduleOfIncomeBeforeIncomeTaxDomesticAndForeignTableTextBlock",
                "query": "SELECT * FROM data WHERE year = '2022'"
            }
        ))
        print(f"Done: {step_result.done}, Reward: {step_result.reward}")

        # Submit answer
        result = await env.call_tool("submit_answer", answer="6.118")

asyncio.run(main())

Available Tools

Tools are auto-discovered via MCP. Use await env.list_tools() to see all available tools at runtime.

Tool Description Arguments
get_descriptions Get list of available table names for a company company_name: str
get_table_info Get table metadata (columns, dtypes, unique values) company_name: str, table_name: str
sql_query Execute SQL query on a table (requires filters) company_name: str, table_name: str, query: str
submit_answer Submit final answer (ends episode) answer: str

Tool Constraints

  • sql_query: Must include filters (WHERE, HAVING, etc.). SELECT * is not allowed.

Environment Variables

Variable Default Description
FINQA_DATA_PATH /app/env/data Path to data directory
FINQA_MAX_STEPS 50 Maximum tool calls per episode
FINQA_TASK finqa Task name

Reward Computation

Rewards use fuzzy numerical matching:

  • Extracts numbers from \boxed{...} format
  • Handles percentages, fractions, and decimals
  • 1% relative tolerance or 0.01 absolute tolerance
  • Returns 1.0 for correct, 0.0 for incorrect

Local Development

# From OpenEnv repo root
cd envs/finqa_env

# Run server locally
FINQA_DATA_PATH=./data uvicorn server.app:app --reload --port 8000

# Test with curl
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset

Integration with RL Frameworks

TRL (GRPO)

import asyncio
from trl import GRPOTrainer
from envs.finqa_env import FinQAEnv

async def rollout_func(prompts, trainer):
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        # Your agent logic here using await env.call_tool(...)
        return {"reward": obs.reward, "completion": completion}

trainer = GRPOTrainer(
    model=model,
    rollout_func=rollout_func,
    ...
)

Project Structure

finqa_env/
β”œβ”€β”€ __init__.py           # Exports FinQAEnv, CallToolAction, ListToolsAction
β”œβ”€β”€ models.py             # FinQAState and tool name constants
β”œβ”€β”€ client.py             # MCP client (subclasses MCPToolClient)
β”œβ”€β”€ pyproject.toml        # Dependencies
β”œβ”€β”€ README.md             # This file
β”œβ”€β”€ data/                 # Benchmark data (run download_data.sh)
β”‚   β”œβ”€β”€ benchmark_questions/
β”‚   β”‚   └── finqa.csv
β”‚   └── input_companies/
β”‚       └── [company folders]
β”œβ”€β”€ download_data.sh      # Downloads data from HuggingFace
└── server/
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ finqa_environment.py  # MCPEnvironment subclass with @mcp.tool decorators
    β”œβ”€β”€ tools.py              # Tool implementations
    β”œβ”€β”€ rewards.py            # Reward computation
    β”œβ”€β”€ app.py                # FastAPI server
    └── Dockerfile

References

Update on GitHub