Student Researcher [Seed LLM Post Training – Reward Modeling] - 2026 Start (PhD)

ByteDance

Job Location: San Jose, CA, USA

Posted on: 2025-10-07T01:02:34Z

Job Description:
Overview

ByteDance is hiring a PhD Student Researcher for Seed LLM Post Training (Reward Modeling), starting in 2026. PhD internships give students the opportunity to contribute to products and research through hands-on learning, community-building, development events, and collaboration with industry experts. Applications are reviewed on a rolling basis; please state your availability (start date and end date) in your resume.

Responsibilities
  • Design and train reward models that reflect nuanced human preferences in LLM outputs.
  • Develop and evaluate components of a Reward Model System that integrates model predictions, verifier feedback, tool usage, and agent signals to produce reliable, generalizable reward estimates.
  • Develop reward models to enhance controllability and instruction-following performance, especially for complex, multi-part user requests.
  • Contribute to data selection and synthesis pipelines that improve post-training data quality, leveraging reward signals to expand the model's capabilities.
  • Research scalable methods for learning from pairwise comparisons, rankings, or human demonstrations across diverse tasks.
Qualifications
  • Currently pursuing a PhD in Computer Science, Machine Learning, or a related technical field.
  • First-author publications in top-tier venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP).
  • Research experience in reward modeling, human preference learning, or LLM post-training.
  • Proficient in Python and deep learning frameworks such as PyTorch or JAX.
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
Preferred Qualifications
  • Experience with RLHF, DPO, rejection sampling, or ranking-based supervision methods.
  • Familiarity with model-based reward composition, verifier integration, or synthetic data pipelines.
  • Understanding of how reward models interact with large-scale RL and agent systems.
Job Information
  • Employment type: Internship
  • Seniority level: Internship
  • Location: Palo Alto, CA
  • Compensation: $65 per hour
About ByteDance

The ByteDance Doubao (Seed) team focuses on advanced AI foundation models and post-training technologies for unified multimodal large models, including supervised fine-tuning (SFT), reward modeling (RM), reinforcement learning (RL), and self-learning.

Inclusion and Accommodation
  • Diversity & Inclusion: ByteDance seeks to celebrate diverse voices and create an inclusive environment.
  • Reasonable Accommodation: We provide accommodations in our recruitment processes for candidates with disabilities or for other legally protected reasons. If you need assistance, contact us at the provided accommodation request link.
Notes

For Los Angeles County candidates, the company complies with applicable laws including the Los Angeles County Fair Chance Ordinance and the California Fair Chance Act. This job description does not create a contract of employment or imply guaranteed employment terms.
