Senior Multimodal Engineer - Harnham : Job Details

Senior Multimodal Engineer

Harnham

Job Location : New York, NY, USA

Posted on : 2025-08-05T07:36:05Z

Job Description :
Senior Multimodal Engineer (Visual Language Model & Object Detection)
REMOTE - US$200,000 - $250,000 + Equity + Benefits
Are you passionate about building cutting-edge AI solutions that solve real-world challenges? A rapidly scaling Series A startup is looking for a Senior Computer Vision Engineer to join their elite engineering team and transform one of the world's oldest and most foundational industries!
Backed by top-tier VCs and already working with Fortune 500 clients, this is a rare opportunity to join a technically ambitious and mission-driven company at a critical stage of growth. With a team of experts, they're combining advanced ML, computer vision, and multi-modal AI to automate some of the most complex and manual workflows in the construction world.
THE COMPANY
This AI startup is pioneering intelligent automation across the construction lifecycle, from blueprint analysis to jobsite monitoring, leveraging computer vision to bring previously untapped data to life. With $30M in funding and a growing team of 50+ employees, including 20+ engineers, they're launching a suite of AI agents that tackle challenges in:
  • Image understanding
  • Document intelligence
  • Visual Q&A
  • Multi-modal reasoning
They're reimagining how visual and spatial data is used on-site and across operations, using AI to drive accuracy, speed, and cost efficiency.
THE ROLE
As a Senior Computer Vision Engineer, you'll take ownership of designing, building, and deploying CV models that process and interpret a wide range of visual data. You'll be instrumental in shaping the company's AI roadmap and embedding sophisticated visual understanding into production-ready tools used by field and office teams alike. You will:
  • Develop, train, and deploy CV models for object detection, image classification, segmentation, and more
  • Design scalable ML pipelines to support model training, evaluation, and integration
  • Work with proprietary datasets and define efficient labeling strategies to improve model performance
  • Collaborate cross-functionally with ML, backend, and product teams to build user-centric solutions
  • Apply state-of-the-art CV research, including vision transformers and multi-modal learning techniques, to production use cases
  • Stay up to date with innovations in LLMs, RAG pipelines, and vision-language models, contributing where appropriate
YOUR SKILLS & EXPERIENCE
We're looking for a CV engineer who brings both deep technical knowledge and a builder's mindset.
Must-haves:
  • MS or PhD in Computer Science, Computer Vision, Robotics, or a related field
  • 5+ years of experience building and deploying computer vision systems in production
  • Proficiency in Python and frameworks such as PyTorch, TensorFlow, and OpenCV
  • Deep understanding of image processing, feature extraction, and model evaluation techniques
  • Hands-on experience with transformer-based CV models (e.g., ViT, DINO, SAM) and/or multi-modal architectures
Bonus skills:
  • Exposure to LLM-based systems or Retrieval-Augmented Generation (RAG) pipelines
  • Experience working with blueprints, schematics, or construction-specific datasets
BENEFITS & CULTURE
  • Highly competitive salary ($200K-$250K) + meaningful equity package
  • Fully remote flexibility across the US, with optional hybrid setups in NYC
  • Comprehensive health benefits: medical, dental, and vision (100% covered plan available)
  • Unlimited PTO to support work-life balance
  • Annual learning & development budget to fuel your growth
  • Frequent team offsites, meetups, and daily in-office lunches (NYC)
  • Work with an exceptionally talented and collaborative team solving challenging technical problems
READY TO APPLY?
If you're excited about shaping the future of AI-driven automation in construction, we want to hear from you.