Software
AI Engineer, Testing & Moderation
NYC, SF, or Remote
Era powers AI across physical devices for major consumer brands, from premium headphones to lifestyle wearables to home objects. When a device "powered by Era" interacts with a user, the experience needs to be meticulous, seamless, and unforgettable. We just raised our seed and are launching with our first partner in Q2 2026.
The Role
Every interaction with an Era-powered device is a moment that either earns trust or loses it. The voice that comes through a pair of headphones, the personality of a wearable: each one needs to feel intentional, polished, and alive. You're the person who makes sure every one of those moments is frame-perfect. You're someone for whom good enough isn't.
You'll sit at the intersection of building and perfecting. You'll work directly with our AI and product engineers to shape experiences as they're developed. You'll bring the external lens that turns good into great and build out the evaluation systems, testing infrastructure, and safety frameworks that ensure everything we ship is nothing short of spectacular.
What You'll Do
- Be the quality bar for every Era experience. Review, evaluate, and refine AI interactions across partner products and Era's own devices before they reach users
- Extend Era's AI evaluation and scoring frameworks - automated measurement of response quality, brand voice fidelity, conversational naturalness, and factual accuracy across different model providers and prompt architectures
- Test the full pipeline end-to-end. Every link in the chain is an opportunity for the experience to degrade, and you'll catch it
- Shape experiences during development. Work alongside AI engineers and designers to refine how experiences feel before they ship
- Build safety and content guardrails - systems that protect users and partners across different brand contexts and demographics
- Adversarial-test our AI systems - prompt injection, jailbreaking, edge cases that emerge when real users interact with physical devices in unpredictable environments
- Build real-time quality monitoring for deployed fleets - systems that track experience quality, detect model degradation, and surface regressions across thousands of active devices
- Automate quality into the pipeline. Integrate evaluation frameworks into CI/CD so every deployment is validated against quality, safety, and performance baselines
What We're Looking For
- You have an obsessive eye for craft. You're the kind of person who notices when a voice response is 200ms too slow or when a conversational turn feels slightly off-brand
- Experience building evaluation and testing frameworks for AI/ML systems - scoring LLM outputs, measuring response quality, catching regressions in nondeterministic systems
- Strong programming skills across Python and TypeScript
- You've shipped consumer-facing products where quality was non-negotiable
- Understanding of AI safety and responsible deployment - content guardrails, demographic-aware moderation, compliance considerations
- You bring an external perspective to internal work. You think like the user, the partner, the person picking up the device for the first time
Nice to Have
- Experience with LLM evaluation tooling - prompt testing, model comparison frameworks, response quality benchmarking
- Background in trust & safety at a consumer-facing platform
- Experience evaluating real-time audio/voice systems
- Familiarity with compliance frameworks relevant to consumer devices and AI (COPPA, GDPR, EU AI Act)
- Red-teaming or adversarial ML experience
NYC, SF, or remote. We have teams in SF and Australia and are building out our NYC presence. Competitive salary, healthcare, and meaningful equity.