08. Creative Projects Strategy for Zara Okonkwo

Zara, your academic profile (GPA 3.94, SAT 1530) positions you strongly for Data Science and Statistics programs at UC Berkeley, Carnegie Mellon, and Georgia Tech. The committee emphasized that your portfolio should convey not only technical competence but also clear methodology and social impact through data. This section translates those expectations into concrete, achievable creative projects and portfolio strategies you can complete before application deadlines.


🎯 Core Objectives

  • Publish a “Data for Good” project with transparent methods, code, and public data sources.
  • Develop one additional technical project that demonstrates reproducible data analysis or machine learning methodology.
  • Document your use of Python, R, SQL, and visualization libraries to highlight technical breadth.
  • Build a concise, polished portfolio summarizing your applied data projects and their outcomes.

💡 Project 1: Data for Good — Public Impact Analysis

This project anchors your portfolio with a socially relevant theme. Admissions officers at Berkeley and Carnegie Mellon often respond well to students who apply data science toward community or policy insights. You have not provided specific community activities yet, so the project should rely only on publicly available datasets—no external partnerships required.

  • Concept: Analyze publicly available Georgia datasets (for example, education, health, or transportation) to uncover a measurable insight—such as disparities in internet access or school resource allocation.
  • Tech Stack:
    • Python: Data cleaning and analysis using Pandas and NumPy.
    • R: Statistical modeling and visualization using ggplot2.
    • SQL: Query and organize structured datasets efficiently.
    • Visualization: Create interactive dashboards using Plotly or Tableau Public.
  • Deliverables:
    • Jupyter Notebook or R Markdown file with complete, reproducible workflow.
    • GitHub repository with README explaining the problem, methods, and findings.
    • Short description (under 150 characters, the Common App limit per activity) for your Common App activities section.
  • Admissions Edge: Demonstrates ethical data use and technical independence—traits valued by Berkeley’s Data Science program and CMU’s Statistics & Machine Learning track.
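
As a rough illustration of the SQL and Python steps listed above, the sketch below loads a few rows into an in-memory SQLite database and ranks counties by broadband access rate. The table name, column names, county labels, and every number are hypothetical placeholders, not real Georgia figures; in the actual project they would come from a cited public dataset.

```python
import sqlite3

# Hypothetical placeholder rows: (county, households, households_with_broadband).
# These are NOT real figures; replace them with values from a cited public dataset.
SAMPLE_ROWS = [
    ("County A", 1000, 900),
    ("County B", 2000, 1200),
    ("County C", 500, 450),
]


def broadband_rates(rows):
    """Return [(county, access_rate)] sorted from lowest to highest rate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE access (county TEXT, households INTEGER, connected INTEGER)"
        )
        conn.executemany("INSERT INTO access VALUES (?, ?, ?)", rows)
        query = """
            SELECT county, ROUND(1.0 * connected / households, 3) AS rate
            FROM access
            ORDER BY rate ASC, county ASC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def access_gap(rates):
    """Difference between the highest and lowest access rates."""
    values = [rate for _, rate in rates]
    return round(max(values) - min(values), 3)


if __name__ == "__main__":
    rates = broadband_rates(SAMPLE_ROWS)
    for county, rate in rates:
        print(f"{county}: {rate:.1%}")
    print("Gap between highest and lowest:", access_gap(rates))
```

Keeping the query in one function and the summary statistic in another makes each step easy to explain in the README and easy to rerun when the data changes.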

⚙️ Project 2: Reproducible Machine Learning Pipeline

The committee urged you to include a second technical project emphasizing reproducibility and methodological clarity. This can be a compact but rigorous demonstration of predictive modeling or clustering analysis using open data.

  • Concept: Build a small-scale machine learning pipeline that predicts or classifies outcomes—such as housing prices, student performance, or air quality metrics—using a clean, public dataset.
  • Tech Stack:
    • Python: scikit-learn for model training and evaluation.
    • Data Versioning: DVC or Git LFS to track dataset changes.
    • Visualization: seaborn and matplotlib for model performance plots.
    • Documentation: Use Jupyter Notebook markdown cells to explain each step.
  • Deliverables:
    • Clean, commented code published on GitHub with reproducible setup instructions.
    • Short write-up describing your modeling approach and lessons learned.
    • Optional: Deploy results via Streamlit or Flask for a simple web demo.
  • Admissions Edge: CMU and Georgia Tech value applicants who can explain model logic clearly—this project provides that evidence.
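
One possible shape for such a pipeline is sketched below, assuming scikit-learn is installed. It uses synthetic data from `make_regression` as a stand-in for a real public dataset, and the choice of Ridge regression, the 75/25 split, and the seed value are illustrative assumptions rather than recommendations from the committee. Fixing the seed everywhere is what makes the reported metrics repeatable.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_pipeline(seed=42):
    """Train and evaluate a small regression pipeline with a fixed random seed."""
    # Synthetic stand-in data; swap in a cleaned public dataset for the real project.
    X, y = make_regression(
        n_samples=200, n_features=5, noise=10.0, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # Scaling and the model live in one Pipeline so that the scaler is fit
    # on the training split only, which avoids leaking test information.
    model = Pipeline([
        ("scale", StandardScaler()),
        ("ridge", Ridge(alpha=1.0)),
    ])
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return {
        "r2": round(float(r2_score(y_test, predictions)), 4),
        "mae": round(float(mean_absolute_error(y_test, predictions)), 4),
    }


if __name__ == "__main__":
    print(run_pipeline())
```

Calling `run_pipeline()` twice with the same seed should return identical metrics, which is a quick reproducibility check worth mentioning in the write-up.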

🗂️ Portfolio Assembly & Presentation

Once both projects are complete, assemble a concise portfolio that integrates technical documentation and visual summaries. Keep the tone analytical and professional—avoid overly casual phrasing or unverified claims.

  • Platform: GitHub (primary), optionally link to a personal site built with GitHub Pages.
  • Structure:
    • README.md: Overview of your Data Science focus, tools used, and project summaries.
    • Project Folders: Separate folders for each project with data, notebooks, and results.
    • Documentation: Include a “requirements.txt” file and clear installation instructions.
  • Portfolio Summary (for applications): Create a one-page PDF highlighting project names, data sources, tools, and outcomes. Upload this as a supplemental document if allowed by CMU or Berkeley.
  • GitHub Strategy:
    • Use consistent naming conventions and commit messages to show professionalism.
    • Pin your top two repositories so they appear first on your profile.
    • Add a short bio mentioning “Data Science & Statistics | Georgia | Class of 2025.”
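
The repository structure described above could be scaffolded with a short script like the following. The project folder names (`data-for-good`, `ml-pipeline`) and the `.gitkeep` placeholder files are assumptions made for illustration; rename them to match the actual projects.

```python
from pathlib import Path

# Hypothetical project folder names; rename to match your actual projects.
PROJECTS = ["data-for-good", "ml-pipeline"]
SUBFOLDERS = ["data", "notebooks", "results"]


def scaffold(root, projects=PROJECTS):
    """Create the portfolio layout under `root` without overwriting existing files."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    paths = []
    # Top-level files: start empty, fill in by hand later.
    for name in ("README.md", "requirements.txt"):
        path = root / name
        if not path.exists():
            path.write_text("", encoding="utf-8")
        paths.append(path)
    for project in projects:
        for sub in SUBFOLDERS:
            folder = root / project / sub
            folder.mkdir(parents=True, exist_ok=True)
            # Git does not track empty folders, so add a placeholder file.
            (folder / ".gitkeep").touch()
            paths.append(folder)
    return paths


if __name__ == "__main__":
    for path in scaffold("portfolio"):
        print(path)
```

Running the script a second time is safe: existing README or requirements files are left untouched.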

📅 Monthly Action Plan

September
  • Key Actions:
    • Select datasets for both projects (education, health, or housing data).
    • Set up GitHub account and create repository templates.
    • Confirm technical environment (Python, R, SQL installed).
  • Target Outcome: Project scaffolding complete; GitHub ready for uploads.

October
  • Key Actions:
    • Complete “Data for Good” analysis and initial visualization.
    • Write clear README and methodology notes.
    • Begin machine learning pipeline setup.
  • Target Outcome: First project finalized and published; second project underway.

November
  • Key Actions:
    • Finish ML pipeline and test reproducibility.
    • Refine documentation and code comments.
    • Compile one-page portfolio summary PDF.
  • Target Outcome: Both projects complete with polished documentation.

December
  • Key Actions:
    • Integrate portfolio links into applications.
    • Cross-check with essay narrative (see §06 Essay Strategy).
    • Submit Early Action or Regular Decision applications.
  • Target Outcome: Portfolio fully embedded in submissions; ready for review.

🔍 Final Checks Before Submission

  • Ensure all datasets are public and properly cited.
  • Run notebooks from start to finish to confirm reproducibility.
  • Proofread README files for clarity and professional tone.
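  • Open each repository and portfolio link in a private (logged-out) browser window to confirm it is publicly accessible. GitHub’s “Insights → Traffic” page reports views and clones; it does not verify links.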
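
The reproducibility check above can be partly automated. A notebook file (`.ipynb`) is JSON, and each code cell stores an `execution_count`; if those counts read 1, 2, 3, ... from top to bottom, the saved notebook was most likely run in order from a fresh kernel. The helper names below are hypothetical, and the check is only a hint: the stronger test is still to restart the kernel and run all cells, or to execute the notebook with `jupyter nbconvert --to notebook --execute`.

```python
import json


def executed_in_order(notebook):
    """True if all non-empty code cells have execution counts 1, 2, 3, ... in order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        # Skip markdown cells and empty code cells, which never get a count.
        if cell.get("cell_type") == "code"
        and "".join(cell.get("source", "")).strip()
    ]
    return counts == list(range(1, len(counts) + 1))


def check_notebook_file(path):
    """Load an .ipynb file from disk and apply the ordering check."""
    with open(path, encoding="utf-8") as handle:
        return executed_in_order(json.load(handle))
```

A result of `False` usually means some cells were run out of order or never run, so the saved outputs may not match what a reader gets when rerunning the notebook.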

📈 Strategic Impact

By completing these two data-driven projects and publishing them with transparent methodology, Zara, you will present a portfolio that aligns precisely with the committee’s recommendations. It will demonstrate your command of data analysis tools, your capacity for independent inquiry, and your commitment to ethical, socially relevant applications of Data Science. This portfolio will strengthen your candidacy for UC Berkeley’s Data Science major, Carnegie Mellon’s Statistics & Machine Learning program, and Georgia Tech’s Data Analytics track.