Senior Multimodal Engineer - Harnham : Job Details

Senior Multimodal Engineer

Harnham

Job Location : New York, NY, USA

Posted on : 2025-08-05T07:36:05Z

Job Description :

Senior Multimodal Engineer (Visual Language Model & Object Detection)

REMOTE - US$200,000 - $250,000 + Equity + Benefits

Are you passionate about building cutting-edge AI solutions that solve real-world challenges? A rapidly scaling Series A startup is looking for a Senior Computer Vision Engineer to join their elite engineering team and transform one of the world's oldest and most foundational industries!

Backed by top-tier VCs and already working with Fortune 500 clients, this is a rare opportunity to join a technically ambitious and mission-driven company at a critical stage of growth. With a team of experts, they're combining advanced ML, computer vision, and multi-modal AI to automate some of the most complex and manual workflows in the construction world.

THE COMPANY

This AI startup is pioneering intelligent automation across the construction lifecycle, from blueprint analysis to jobsite monitoring, leveraging computer vision to bring previously untapped data to life. With $30M in funding and a growing team of 50+ employees, including 20+ engineers, they're launching a suite of AI agents that tackle challenges in:

Image understanding
Document intelligence
Visual Q&A
Multi-modal reasoning

They're reimagining how visual and spatial data is used on-site and across operations, using AI to drive accuracy, speed, and cost efficiency.

THE ROLE

As a Senior Computer Vision Engineer, you'll take ownership of designing, building, and deploying CV models that process and interpret a wide range of visual data. You'll be instrumental in shaping the company's AI roadmap and embedding sophisticated visual understanding into production-ready tools used by field and office teams alike. You will:

Develop, train, and deploy CV models for object detection, image classification, segmentation, and more
Design scalable ML pipelines to support model training, evaluation, and integration
Work with proprietary datasets and define efficient labeling strategies to improve model performance
Collaborate cross-functionally with ML, backend, and product teams to build user-centric solutions
Apply state-of-the-art CV research, including visual transformers and multi-modal learning techniques, to production use cases
Stay up to date with innovations in LLMs, RAG pipelines, and vision-language models, contributing where appropriate

YOUR SKILLS & EXPERIENCE

We're looking for a CV engineer who brings both depth in technical knowledge and a builder's mindset. Must-haves:

MS or PhD in Computer Science, Computer Vision, Robotics, or a related field
5+ years of experience building and deploying computer vision systems in production
Proficiency in Python and frameworks such as PyTorch, TensorFlow, and OpenCV
Deep understanding of image processing, feature extraction, and model evaluation techniques
Hands-on experience with transformer-based CV models (e.g., ViT, DINO, SAM) and/or multi-modal architectures

Bonus skills:

Exposure to LLM-based systems or Retrieval-Augmented Generation (RAG) pipelines
Experience working with blueprints, schematics, or construction-specific datasets

BENEFITS & CULTURE

Highly competitive salary ($200K-$250K) + meaningful equity package
Fully remote flexibility across the US, with optional hybrid setups in NYC
Comprehensive health benefits: medical, dental, and vision (100% covered plan available)
Unlimited PTO to support work-life balance
Annual learning & development budget to fuel your growth
Frequent team offsites, meetups, and daily in-office lunches (NYC)
Work with an exceptionally talented and collaborative team solving challenging technical problems

READY TO APPLY?

If you're excited about shaping the future of AI-driven automation in construction, we want to hear from you.
