FinQA Environment

A financial question-answering environment for RL training. Evaluates LLMs on their ability to answer complex financial questions using tool calls on SEC 10-K filing data.

Based on FinQABenchmark from Snorkel AI.

Overview

FinQA tests an agent’s ability to:

Explore available financial tables for a company
Query table metadata and execute SQL queries
Perform calculations on extracted data
Submit final answers to financial questions

Dataset: 290 questions from SEC 10-K filings across multiple companies (Alphabet, Amazon, Apple, AT&T, etc.)

Reward: Binary (1.0 for correct answer, 0.0 for incorrect) using fuzzy numerical matching with 1% tolerance.

Note: This dataset is for evaluation only. Do not train on it.

Quick Start

Using Docker

# Build the image (from OpenEnv repo root)
docker build -t finqa-env:latest -f envs/finqa_env/server/Dockerfile .

# Run the server
docker run -p 8000:8000 finqa-env:latest

# To run evaluation script (example model gpt-5)
API_BASE_URL=https://api.openai.com/v1 API_KEY=$OPENAI_API_KEY MODEL=gpt-5 python examples/finqa_inference.py

Local Development

# Install dependencies
uv pip install pandas

# Download data from HuggingFace
cd envs/finqa_env
./download_data.sh

Using the Client

The client uses the MCP protocol and is async by default:

import asyncio
from envs.finqa_env import FinQAEnv, CallToolAction

async def main():
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        # Reset to get a question
        obs = await env.reset()
        question = obs.metadata["question"]
        company = obs.metadata["company"]
        print(f"Question: {question}")
        print(f"Company: {company}")

        # Discover available tools
        tools = await env.list_tools()
        print([t.name for t in tools])

        # Use tools via call_tool (convenience method)
        result = await env.call_tool("get_descriptions", company_name=company)
        print(f"Available tables: {result}")

        # Or use step() with CallToolAction for full observation access
        step_result = await env.step(CallToolAction(
            tool_name="sql_query",
            arguments={
                "company_name": "alphabet",
                "table_name": "us_gaap_ScheduleOfIncomeBeforeIncomeTaxDomesticAndForeignTableTextBlock",
                "query": "SELECT * FROM data WHERE year = '2022'"
            }
        ))
        print(f"Done: {step_result.done}, Reward: {step_result.reward}")

        # Submit answer
        result = await env.call_tool("submit_answer", answer="6.118")

asyncio.run(main())

Available Tools

Tools are auto-discovered via MCP. Use await env.list_tools() to see all available tools at runtime.

Tool	Description	Arguments
`get_descriptions`	Get list of available table names for a company	`company_name: str`
`get_table_info`	Get table metadata (columns, dtypes, unique values)	`company_name: str, table_name: str`
`sql_query`	Execute SQL query on a table (requires filters)	`company_name: str, table_name: str, query: str`
`submit_answer`	Submit final answer (ends episode)	`answer: str`

Tool Constraints

sql_query: Must include filters (WHERE, HAVING, etc.). SELECT * is not allowed.

Environment Variables

Variable	Default	Description
`FINQA_DATA_PATH`	`/app/env/data`	Path to data directory
`FINQA_MAX_STEPS`	`50`	Maximum tool calls per episode
`FINQA_TASK`	`finqa`	Task name

Reward Computation

Rewards use fuzzy numerical matching:

Extracts numbers from \boxed{...} format
Handles percentages, fractions, and decimals
1% relative tolerance or 0.01 absolute tolerance
Returns 1.0 for correct, 0.0 for incorrect

Local Development

# From OpenEnv repo root
cd envs/finqa_env

# Run server locally
FINQA_DATA_PATH=./data uvicorn server.app:app --reload --port 8000

# Test with curl
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset

Integration with RL Frameworks

TRL (GRPO)

import asyncio
from trl import GRPOTrainer
from envs.finqa_env import FinQAEnv

async def rollout_func(prompts, trainer):
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        # Your agent logic here using await env.call_tool(...)
        return {"reward": obs.reward, "completion": completion}

trainer = GRPOTrainer(
    model=model,
    rollout_func=rollout_func,
    ...
)

Project Structure

finqa_env/
├── __init__.py           # Exports FinQAEnv, CallToolAction, ListToolsAction
├── models.py             # FinQAState and tool name constants
├── client.py             # MCP client (subclasses MCPToolClient)
├── pyproject.toml        # Dependencies
├── README.md             # This file
├── data/                 # Benchmark data (run download_data.sh)
│   ├── benchmark_questions/
│   │   └── finqa.csv
│   └── input_companies/
│       └── [company folders]
├── download_data.sh      # Downloads data from HuggingFace
└── server/
    ├── __init__.py
    ├── finqa_environment.py  # MCPEnvironment subclass with @mcp.tool decorators
    ├── tools.py              # Tool implementations
    ├── rewards.py            # Reward computation
    ├── app.py                # FastAPI server
    └── Dockerfile

References

Update on GitHub