Software
AI Engineer, Testing & Moderation
NYC, SF, or Remote
Era powers AI across physical devices for major consumer brands, from premium headphones to lifestyle wearables to home objects. When a device "powered by Era" interacts with a user, the experience needs to be meticulous, seamless, and unforgettable. We just raised our seed and are launching with our first partner in Q2 2026.
The Role
Every interaction with an Era-powered device is a moment that either earns trust or loses it. The voice that comes through a pair of headphones, the personality of a wearable: each one needs to feel intentional, polished, and alive. You're the person who makes sure every one of those moments is frame-perfect. You're someone for whom good enough isn't.
You'll sit at the intersection of building and perfecting. You'll work directly with our AI and product engineers to shape experiences as they're developed. You'll bring the external lens that turns good into great and build out the evaluation systems, testing infrastructure, and safety frameworks that ensure everything we ship is nothing short of spectacular.
What You'll Do
- Be the quality bar for every Era experience. Review, evaluate, and refine AI interactions across partner products and Era's own devices before they reach users
- Extend Era's AI evaluation and scoring frameworks - automated measurement of response quality, brand voice fidelity, conversational naturalness, and factual accuracy across different model providers and prompt architectures
- Test the full pipeline end-to-end. Every link in the chain is an opportunity for the experience to degrade, and you'll catch it
- Shape experiences during development. Work alongside AI engineers and designers to refine how experiences feel before they ship
- Build safety and content guardrails - systems that protect users and partners across different brand contexts and demographics
- Adversarial-test our AI systems - prompt injection, jailbreaking, edge cases that emerge when real users interact with physical devices in unpredictable environments
- Build real-time quality monitoring for deployed fleets - systems that track experience quality, detect model degradation, and surface regressions across thousands of active devices
- Automate quality into the pipeline. Integrate evaluation frameworks into CI/CD so every deployment is validated against quality, safety, and performance baselines
What We're Looking For
- You have an obsessive eye for craft. You're the kind of person who notices when a voice response is 200ms too slow or when a conversational turn feels slightly off-brand
- Experience building evaluation and testing frameworks for AI/ML systems - scoring LLM outputs, measuring response quality, catching regressions in nondeterministic systems
- Strong programming skills across Python and TypeScript
- You've shipped consumer-facing products where quality was non-negotiable
- Understanding of AI safety and responsible deployment - content guardrails, demographic-aware moderation, compliance considerations
- You bring an external perspective to internal work. You think like the user, the partner, the person picking up the device for the first time
Nice to Have
- Experience with LLM evaluation tooling - prompt testing, model comparison frameworks, response quality benchmarking
- Background in trust & safety at a consumer-facing platform
- Experience evaluating real-time audio/voice systems
- Familiarity with compliance frameworks relevant to consumer devices and AI (COPPA, GDPR, EU AI Act)
- Red-teaming or adversarial ML experience
NYC, SF, or remote. We have teams in SF and Australia and are building out our NYC presence. Competitive salary, healthcare, and meaningful equity.