Machine Learning AI Agent Platform
Storm2
⚡ Staff ML Engineer, AI Agents – Wealth Management Infrastructure
🌍 San Francisco Bay Area (Hybrid)
💲 $250,000 – $300,000 base + 20% bonus + equity
The Company
Storm2’s client is a Series A-stage company building the AI infrastructure layer for institutional wealth management. They’re not selling a chatbot or a demo. Their systems run live in regulated environments, embedded into how some of the world’s largest financial institutions serve clients day to day.
The Role
A lot of “AI engineer” roles right now are about wrapping APIs and writing prompts. This one is not.
You’d be working at the level where agent systems are actually built: designing evaluation frameworks that determine whether a model is safe and reliable enough to operate in a live financial environment, building the orchestration and tooling that makes agents work at scale, and translating the latest LLM capabilities into production systems that hold up under enterprise constraints. The evals piece is central, not an afterthought. If an agent is advising on client suitability or supporting portfolio decisions, you need rigorous ways to know whether it’s working and when it’s failing.
The context is demanding. Enterprise deployment means security requirements, latency constraints, multi-tenant architecture, and partners who need stable APIs rather than moving targets. You’ll be building for that reality from day one, not as a later phase.
What makes this role interesting is the combination: enough research exposure to stay close to what LLMs are becoming capable of, paired with the engineering discipline to make those capabilities reliable in a high-stakes domain. If you’ve mostly lived on one side of that line, this will push you.
What you’ll be working on:
- Designing and building the AI Agent Platform: tool use, planning, memory, orchestration
- Building evaluation and benchmarking frameworks to assess agent quality, reliability, and safety in production
- LLM orchestration, prompt management, and workflow execution infrastructure
- APIs and platform abstractions for enterprise and external partners
- Self-hosted and multi-tenant deployment infrastructure with real enterprise constraints
- Bridging new LLM capabilities into stable, production-grade financial workflows
- Improving observability, failure handling, and reliability across agent systems
What you’ll bring:
- 7+ years building production ML or backend systems for ML-powered products
- Hands-on experience with LLMs, agent frameworks, or applied ML systems in production
- Experience building evaluation or benchmarking systems for LLMs or ML
- Strong Python skills and fluency with modern ML tooling
- Systems thinking at the level of latency, failure modes, and reliability tradeoffs
- Based in or willing to relocate to the Bay Area
Strong plus:
- Experience with self-hosted models or enterprise AI deployments
- Background in distributed systems or data infrastructure
- Prior exposure to financial or other high-stakes regulated domains
📧 Click ‘Easy Apply’ or email thomas.hill@storm2.com
⚡ Storm2 is a specialist FinTech recruitment firm with clients across Europe, APAC, and North America. Visit storm2.com or follow us on LinkedIn for the latest roles and intel.


