Advanced Data Science (Honors)

  Grades 11-12

  Credits awarded on transcript  

  Prerequisite: Precalculus completed with a B- or better

  UC A-G approved for [D] Science credits

  90 minutes per class

  8-10 students per class

  Twice per week over 36 weeks

  $1,249 per student, per semester

  Self-paced, instructor-guided

  Online community

  Office hours on-demand

  $879 per student, per semester

  4 hours per day (summer)  

  8-10 students per class

  5 days per week for 2, 4, or 6 weeks

  $489 per student, per week

Data is everywhere around us. We generate more data every 40 minutes than all of the data generated since the dawn of civilization until 2003. The ability to work with data, understand what it tells us, and use it in your communication has become an essential life and career skill.

90% of all the world's data was created in just the last two years.

A smartphone has 1000x the computing power of a 1970s mainframe computer.

11% of all U.S. high school students complete any statistics coursework.

Decisions that used to be straightforward are increasingly complex and driven by data. Individuals in every field need to constantly separate fact from fiction. The need to analyze and interpret data has permeated every discipline, including engineering, business, finance, the social sciences, the humanities, and even journalism. Several leading academics now agree that the mathematics taught in high school is rooted in the 1950s space race and needs to be updated to reflect the realities of today's digital information age.

2Sigma School takes an interactive approach to data exploration rather than a lecture-based one. Our classes are hands-on and use several tools employed by leading data scientists and universities, as illustrated by the following video clip of a live session with a small cohort.

Advanced Data Science is equivalent to a one-semester college-level course, adapted for high school. It combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data to understand the phenomenon and draw conclusions? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It also delves into social issues surrounding data analysis, such as privacy and design.

Data science is more than just a combination of programming and statistics. Effective data science requires understanding problem domains and correctly interpreting domain-specific approaches. The examples in this course are largely drawn from real-world data sets, and one of the main goals of this course is to develop the ability to apply analysis and prediction techniques to real-world scenarios.

This is an advanced course meant for students who have experience with Python programming or who have previously taken the AP Computer Science A class. At the end of the course, students will have a portfolio of their data science work to showcase their newly developed knowledge and understanding.

In order to maximize our time together during the live sessions, we use a flipped classroom model that includes pre-work for every class. This allows students to program with the support of an instructor during the class. The pre-work includes pre-recorded videos, online reading, and some programming practice.


  University of California A-G approved for recommended third year of [D] Science credits.

Course Outline

  1. Introduction to Python for Statistics
    Students learn the fundamentals of data science using Python and statistics. They explore sample data and recreate visualizations that demonstrate the power of programmatic data processing with libraries such as NumPy, matplotlib, and pandas.
  2. Visualizations
    Students create data visualizations with various Python libraries and tools, ranging from simple bar, line, and pie charts to more advanced visualizations with the plotly package. They analyze case studies such as Minard’s Map and other historical data visualizations, and explore the process of finding reliable data, organizing and cleaning it, then visualizing it and building a narrative.
  3. Iteration and Sampling
    Students use Python programming tools to process datasets efficiently. They learn how to iterate through data, write control structures, and apply sampling techniques. Students will understand the strengths and limitations of using randomness in sampling and are introduced to more advanced statistics for collecting and organizing data.
  4. Assessing Models
    Students will explore how models encode assumptions about the chance processes that generate data, using examples such as jury selection and other areas where better models can influence public policy. They will learn tools for assessing models, including error probabilities, A/B testing, and causality.
  5. Sample Means
    Students will learn why the mean is such an important part of statistics and data science, and why the empirical distribution of the sample mean is roughly bell-shaped. They will learn about the normal distribution, the central limit theorem, and confidence intervals, and will evaluate the validity of their sampling decisions.
  6. Regression
    Students will learn how to build a regression model that will allow them to make predictions for a new data point that is not part of the original sample. They will work with multiple large datasets and will create regression models to predict future outcomes for different domains.
  7. Conditional Probability
    Students explore machine learning in more depth and learn about classification techniques such as nearest neighbors. Machine learning is a class of techniques for automatically finding patterns in data and using them to draw inferences or make predictions. Students will implement a classifier, measure its accuracy, and iteratively improve the model.
  8. Capstone Project
    Students will explore multiple health case studies and then choose an area of health or medicine in which to apply their data science and machine learning skills. They will identify a problem within the health or medical field, find a reliable dataset, create a model that uses conditional probability, and build classifiers to make detailed predictions. Students will create a presentation that shows their data source, their analysis process, and how their predictions could impact the medical system.
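
For a sense of what the classification work in unit 7 involves, the following is a minimal sketch of a nearest-neighbors classifier with an accuracy check, written in plain Python. It is not taken from the course materials: the function names (`distance`, `knn_predict`, `accuracy`), the choice of Euclidean distance with a majority vote over k = 3 neighbors, and the toy data points are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest training examples.

    `train` is a list of (features, label) pairs (a hypothetical format).
    """
    nearest = sorted(train, key=lambda row: distance(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, features, k) == label for features, label in test)
    return correct / len(test)


# Hypothetical toy data made up for illustration: two well-separated classes.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.0), "B"),
]
# Held-out examples that are not part of the training set.
test = [((1.2, 1.5), "A"), ((6.8, 6.5), "B")]

print(knn_predict(train, (1.2, 1.5)))  # A
print(accuracy(train, test))           # 1.0
```

Choosing an odd k avoids tied votes when there are only two classes. Accuracy is measured on held-out examples, as with the toy `test` list here, because scoring a classifier on its own training data overstates how well it will do on new data.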

Summer of Code

Our technology requirements are similar to those of most online classes.

A desktop or laptop computer running Windows (PC), Mac OS (Mac), or Chrome OS (Chromebook).
Students must be able to run the Zoom client.
A working microphone, speaker, and webcam.
A high-speed internet connection with a download speed of at least 10 Mbps.

Students must have a quiet place to study and participate in the class for the duration of the class. Some students may prefer a headset to isolate any background noise and help them focus in class.

Most course lectures and content may be viewed on mobile devices but programming assignments and certain quizzes require a desktop or laptop computer.

This course includes several timed tests in which you will complete a given number of questions within a 1-3 hour time limit. These tests are designed to keep you competitively prepared, and you can take them as often as you like. We do not proctor these exams, nor do we require you to install a special lockdown browser.

In today's environment, where students have access to multiple devices, most attempts to prevent cheating in online exams are symbolic. Our exams are meant to encourage you to learn and push yourself under an honor system.

We do assign a grade at the end of the year based on several criteria, including class participation, completion of assignments, and performance on the tests. We do not reveal the exact formula, to minimize students' incentive to optimize for a higher grade.

We believe that your grade in the course should reflect how well you have learned the skills, and a couple of timed tests, while traditional, aren't the best way to evaluate your learning.
