Machine Learning & Big Data Intern

Job Title: Software Engineering Intern - Machine Learning & Big Data

Location: Remote

Duration: 6-12 months

Positions: Two

About the Project: This project focuses on analyzing the participation of underrepresented students and women in STEM fields. The goal is to identify systemic barriers and develop data-driven strategies to increase diversity, retention, and success. The study uses advanced machine learning and big data technologies to extract actionable insights from large datasets on education, demographics, and career pathways to drive meaningful change in STEM diversity.

Role Overview: You will work in a multidisciplinary team consisting of a product owner, a data scientist, and an engineer. You'll apply machine learning models, build scalable data pipelines, and work with complex datasets to uncover patterns that affect underrepresented groups in STEM fields. This is a unique opportunity to apply cutting-edge technology to a socially impactful project that addresses one of the most critical challenges in STEM education.

Responsibilities:

• Collaborate with the research and engineering team to understand project objectives and technical requirements.

• Design, develop, and optimize data pipelines to process and analyze large-scale educational and demographic data.

• Implement and fine-tune machine learning models to identify trends, predict outcomes, and analyze factors affecting underrepresented groups in STEM.

• Prepare large datasets for machine learning analysis by performing data wrangling, feature extraction, and data cleaning.

• Develop data visualizations and reports to communicate findings to non-technical stakeholders and educational policymakers.

• Document workflows, processes, and code to ensure reproducibility and scalability.

• Participate in technical meetings and brainstorming sessions, offering insights on applying ML techniques to the research goals.

Qualifications:

• Currently pursuing or have obtained a Bachelor's degree in Computer Science, Computer Engineering, Big Data, Engineering, or a related field.

• Proficiency in Python (or Java), with experience in ML libraries such as TensorFlow, PyTorch, or scikit-learn.

• Familiarity with machine learning techniques, including supervised and unsupervised learning, classification, regression, and clustering.

• Understanding of big data processing frameworks such as Hadoop, Apache Spark, or similar.

• Experience with data manipulation libraries such as pandas and NumPy.

• Passion for using technology to address social and educational challenges, particularly diversity and inclusion.

Preferred Skills:

• Knowledge of cloud platforms (e.g., AWS, Google Cloud, or Azure) for running machine learning models and processing large datasets.

• Experience with visualization tools such as Tableau, Matplotlib, or Seaborn for presenting research findings.

• Understanding of statistical analysis and hypothesis testing.

• Familiarity with natural language processing (NLP) for text analysis and sentiment evaluation.

How to Apply:

Submit your resume, a brief cover letter outlining your interest in the role, and any relevant project work that demonstrates experience in machine learning or data analytics to stemuplift@gmail.com.

This role offers the chance to make a real-world impact while gaining valuable experience in both software engineering and machine learning in a research context.