Fall 2021 Syllabus (subject to change)

A course unlike any other data science course...?

Course Name: Applied Data Science with Venture Applications: Data-X

Course Number: 

  • INDENG 135 (undergraduate students)
  • INDENG 235 (graduate students)

Units: 3

Semester: Fall 2021

Instructors and GSIs:

Role Name and Email Office Hours
Instructor Ikhlaq Sidhu, sidhu@berkeley.edu TBD
Instructor Derek S. Chan, derekschan@berkeley.edu Fridays, for 30 mins after class ends; Tuesdays, 4:30-5:00pm, via Zoom

GSI Mahan Tajrobehkar, mahan_tajrobehkar@berkeley.edu  Wednesdays, 4-5pm PST, via Zoom (Password: 800392) 
GSI Ruiqi Guo, ruiqiguo@berkeley.edu   Mondays, 2-3 PM PST, via Zoom

*At office hours and on Slack Data-X (INDENG 135 / 235), for specific questions on details about algorithms, coding, and math, please ask the GSIs, Mahan and Ruiqi. As students, please also ask and help each other.

Meeting Day/Time: Fridays, 2:00-5:00pm (usually 2.0 to 2.5 hours) from 8/27 to 12/3/2021. No class Friday 11/26 due to the Thanksgiving holiday 11/25-11/26.

Meeting Location: Room 0105, North Gate

Course Website: You are at the course website, https://datax.berkeley.edu/berkeley-course/ 🙂

Course Prerequisites: Due to the technical nature of this class, students are advised to be able to write code in Python and to have taken a probability or statistics course. At the same time, students from all majors are welcome.

  • A good way to apply Python further is to partner on a team to build real-world projects through a proven framework like Innovation Engineering. Data-X incorporates that framework for AI, data, and systems projects, for example, with students' storytelling, customer validation through low-tech designs / demos, and executing while learning technically.
  • Though stronger programming is correlated with stronger team projects, student diversity adds value. In addition, the course not only includes Python coding but is also scheduled to introduce additional “low code” (limited coding) tech tools for building real-world projects.

Course Description: Today, the world is literally reinventing itself with Data and AI. However, learning a set of ‘related theories’ and being able to ‘make it work’ are not the same. And, in areas as important as Artificial Intelligence, Data Science, and Machine Learning, if we collectively cannot actually implement and create, then we'll reduce our competitive advantage, economic strength, and even national/global security.

The Data-X framework is designed to bridge the gap between theory and practice as well as academia and industry, by exposing students to state-of-the-art implementation techniques and mindsets.

This highly applied course surveys a variety of key concepts and tools that are useful for designing and building data science, AI, and Machine Learning applications and systems. The course introduces modern, open source computer programming tools, libraries, and code samples that can be used to implement data applications. The mathematical concepts highlighted in this course include filtering, prediction, classification, decision-making, LTI systems, spectral analysis, and frameworks for learning from data. Each math concept is linked to implementation using Python libraries such as NumPy for array math, Pandas for table manipulation, scikit-learn for machine learning modeling, and TensorFlow and Keras for deep learning, along with topics related to NLP, Neural Networks, Recommender Systems, etc.

Almost weekly, the course intends to cover not 1-2 but 3 tracks, each aimed at guiding student teams' project-based work: 1) broader insight from inductive learning games and industry guests; 2) code and theory; and 3) teams and projects (e.g., live team demos and feedback). The course also helps you with the Innovation Engineering framework, which includes story development, execution while learning, innovation behaviors, and leadership.

Course Objectives

You will learn

  • To define and execute what, why, and how to build real world AI, data, and systems applications for users – working on a project in a team
  • Computer science tools for data science
  • Relevant theory, critical thinking, and insights on AI, data, and systems

Textbook/Resources

Course Communication

Announcements will be made via Slack Data-X (INDENG 135 / 235). As students, much of your learning will be from each other. Slack can facilitate class conversations and team building, and is used in industry. Slack will also be an option for you to ask questions during live class, in case you prefer to ask via text rather than aloud.

Attendance/Participation Policy

Because the course is a project team course, attendance and participation are important. If a student has noticeable absences from team activities in and out of class, please connect with the instructors about the rationale. Students may be dropped from the class due to absences. All classes are scheduled to be automatically recorded (e.g., to support infrequent exception cases of absence) and can be found at bCourses -> Fall 2021 Data-X (INDENG 135 / 235) -> Media Gallery.

In class, students must adhere to current campus directives related to COVID-19 and refusal to do so may result in the student being asked to leave.

Weekly Schedule and Assignments (subject to change)

The weekly schedule and assignments are meant to provide an outline of the course material and structure. However, it is not set in stone and may be modified as the semester unfolds. If substantive updates occur on the syllabus, instructors will communicate via Slack Data-X (INDENG 135 / 235) and in class so you are aware this webpage is updated.

Classes run 2.0-2.5 hours, including a 10-minute break partway through.

Acronyms below: IS=Ikhlaq Sidhu, DSC=Derek S. Chan

Week Date Broader insight (mostly) Code and theory Teams and projects Due by subsequent class Recommended by subsequent class
1 8/27 (30 mins) IS & DSC: Why the Data X course is important for students, set course expectations and grading, and cover the history of Data-X. (30-45 mins) IS & DSC Lecture: Intro to ML insights (45 mins) IS & DSC: Cover project definition and your questions (reference module 020) * Read Innovation Engineering, Chapter 4: A Step-by-Step Guide for Innovative Projects (23 pages). For the encryption password, please see Slack Data-X (INDENG 135 / 235), pinned in the #general channel.

* Sheet: Everyone brainstorms, writes down 3 ideas for a project (also quantify potential problem or impact), and a one-paragraph introduction of who you are and what skills and background you bring to the class.

Set up your Jupyter environment (modules 030A, 030B). Use these modules if needed. 030: Installation Instructions. Review basic Python code from cookbook module 030C.

Search and evaluate potential project datasets

2 9/3 Venture track

* (30 mins) Guest Lecture 2:10-2:40pm PST: "Common pitfalls of entrepreneurship" by Shuo Chen (General Partner at IOVC | Faculty at UC Berkeley | CEO at Shinect)

(30 mins) IS: Lecture: Review of Python Data Handling Tools.

Recommended: Review code from cookbook Numpy and Pandas modules 110 and 120 on your own. Be able to run the notebooks on your own computer. If not familiar with these tools, view code videos.

(30 mins) DSC Interactive Session: Diversity, quantity, and quality of project datasets

(60 mins)

* IS: Explain how teams are formed

* DSC: Review of collaborators/industry experts background and project interests via LinkedIn profiles

* 5-10 students pitch project ideas for 1 min each for 1 extra participation point.

* Note: Industry experts do not decide or direct projects, but can act as sounding boards.

* Everyone completes a survey about their interests, behaviors, and skills. Everyone fills out their top industry expert choices. Due Wednesday, 9/8, 11:59pm PST.

* Read Innovation Engineering, Chapter 6: Common Strategic Errors and Story Narrative Mistakes (13 pages). For the encryption password, please see Slack Data-X (INDENG 135 / 235), pinned in the #general channel.

Review code from cookbook Numpy and Pandas modules 110 and 120 on your own. Be able to run the notebooks on your own computer. If not familiar with these tools, view code videos.

Search and evaluate potential project datasets (continued)

3 9/10 Non-Venture track

* (30 mins) Guest Lecture 2:10-2:40pm PST: "AI for Social Good Projects" by Ruth Alcantara (Program Manager, AI for Social Good at Google)

Data science application

* (30-45 mins) IS Lecture: A System's View of Data Science with Prediction

* (30 mins) DSC Interactive Session: Code and then application of classification, feature importances, precision vs. recall for defining success

(30 mins) Team assignments announced. Project topics not specified yet.

* IS: Demonstrate Navigator.

* DSC: Review Dataset Slide.

* Select preliminary project idea as a team and fill out best elevator pitch of your team project in NABC (Need, Approach, Benefit, Competition) and Dataset slides 2-3 and 7

* List links / locations of potential dataset(s) for your idea

* Start background research on what is available to you

4 9/17 Low-tech demo

* (40 mins) DSC Inductive Learning Game: Customer validation (incentives for winning teams). Each project team is advised to bring at least 1 laptop.

* (15 mins) IS review Low Tech Demo slides due for next week

(30+ mins) IS low-code and high-code tools - including Anvil and licenses.

* Introduce tools, and demo Berkeley Innovation Index (BII) application code

(45 mins)

* IS: Student team checklist review

* Master Class Format: 3-5 project pitches in NABC format

* Continue to identify, select, and refine your project idea as a team in NABC format

* Read Innovation Engineering, specific section withheld on purpose until week 4

* Submit first version of low-tech demo

5 9/24 * (30 mins) DSC Inductive Learning Game: To evaluate whether companies use or don’t use AI (incentives for top team)

* (30 mins) DSC Interactive Session: H2O.ai (low-code automated ML option on Titanic dataset)

(30 mins) DSC Titanic Notebook walkthrough (Modules 160A-160D plus a new module), and compare results vs. H2O.ai (60 mins) 3-5 sample presentations of low-tech demos and Q&A * Submit team self-review slide(s) on whether you addressed "Common Strategic Errors and Story Narrative Mistakes"

* Potential for more background research on history, concepts, approaches.

Modules 160A-160D
6 10/1 (20 mins) IS: Review tech strategy tools on the Innovation Engineering site and slide decks you will create, and review technology strategy template due future week (40 mins) DSC Lecture: Deep Learning and/or NLP part 1 Sprint 1: Starting Agile Implementation. * Submit Technology Strategy template. Each student picks one component or platform tool. * Module 170 ML Algorithm Overview video to cover the meaning of the classifications. Explain in your own words the difference between Logistic Regression, Trees and Neural Networks, and turn in 1 page.
7 10/8 (30 mins) DSC Inductive Learning Game: AI bias and marginalization (40 mins) IS Lecture: System's view of correlation, plus example of correlation vs. causation 1-3 students demo technology strategy presentations

Sprint 2: Improve your first Minimal Viable Demo

Every team selects a time and meets at ad-hoc office hours with its assigned instructor group for 30 minutes

Everyone starts to identify at least 3 modules which are closest to their project to learn on their own, and submit 1 module. GSIs, when grading, can give feedback on whether that module does make most sense for your project (1-5 rating list on how related to submission)

How To Stop Artificial Intelligence From Marginalizing Communities?, by Timnit Gebru 

How I'm fighting bias in algorithms, by Joy Buolamwini 

Artificial Intelligence needs all of us, by Rachel Thomas 

Bias in Data and A.I., by Ruha Benjamin 

8 10/15 (45 mins) Guest Lecture Speaker on Technical Topic (Details To Be Confirmed) (30 mins) DSC Lecture: Some AI best practices to use in industry and for projects

* Module 180 Cross-Validation and Regularization plus random search and ensemble methods

* Review systematic high-code that might outperform H2O.ai low-code results

Sprint 3: Improve your first Minimal Viable Demo

A few teams demo, hold Q&A, and receive feedback

9 10/22 (30 mins) DSC Inductive Learning Game: Generating trust (most important element within team), including around AI, data, and systems (60 mins) IS Theory Lecture: Spectral Information in Data (similar to Module 250), plus potential use in your projects/systems Sprint 4: Improve your first Minimal Viable Demo

A few teams demo, hold Q&A, and receive feedback

Submit 3-minute demo recording of your project
10 10/29 (30 mins) DSC Interactive Game: What technical strategy aspect will give you an unfair advantage as you build something? (40 mins) Survey of DevOps (Development & Operations) tools not covered yet. Cover directly or have companies represent tools for 10-20 mins each. Sprint 5: Improve your first Minimal Viable Demo

A few teams demo, hold Q&A, and receive feedback

Submit another module. GSIs, when grading, can give feedback on whether that module does make most sense for your project (1-5 rating list on how related to submission)
11 11/5 (45 mins) Guest Lecture 3:00-3:45pm PST: “10 commandments of startup creation” with stories of success/failure by Oren Etzioni (CEO at AI2; Professor Emeritus at University of Washington; Venture Partner at the Madrona Venture Group) (45 mins) IS Theory Lecture: How Information Theory Relates to Data and Signals, plus potential use in your projects/systems Sprint 6: Improve your first Minimal Viable Demo

A few teams demo, hold Q&A, and receive feedback

Every team selects a time and meets at ad-hoc office hours with its assigned instructor group for 30 minutes
12 11/12 (30 mins) DSC Inductive Learning Game: AI business impact To Be Determined based on student need Preparation for Final Demo, Turn in Google Form with Project News Story, Image, Team, Code links, etc.

A few teams demo, hold Q&A, and receive feedback

13 11/19 (60 mins) Work opportunities

Guest recruiter panel: AI / data roles

Continue preparation for Final Demo, Turn in Google Form with Project News Story, Image, Team, Code links, etc. We allow for updates in the week of the final demo. (Dry runs with at least 2-3 teams, but can allow more teams for who wants feedback now as if in-class office hours) Preparation for Final Demo, Turn in Google Form with Project News Story, Image, Team, Code links, etc.

A few teams demo, hold Q&A, and receive feedback

* Submit final projects

* Submit team reviews: Write examples on each team member's contributions, and your own contributions. Submit your Stackshare.io data on what tools you most used.

* Everyone re-completes a survey about their interests, behaviors, and skills.

14 11/26 Thanksgiving holiday 11/25-11/26
15 12/3 Final Demo (20 minutes per team for 9 of 18 teams, but all 18 teams expected to attend to learn and support). Additional 3 hours for the remaining 9 of 18 teams, but all 18 teams expected to attend to learn and support.

 

Grading

Grade Adjusted Range*
A 86-110%
B 70-85%
C 66-75%
D 56-65%

*Adjusted Range: The top 25 students’ average (e.g., 90%) will be adjusted to 100%. An example illustration is below.

Top 25 students (average) = 90%, so 90% / 90% = 100% (Adjusted)
Student 1 = 99%, so 99% / 90% = 110% (Adjusted)
Student 25 = 87%, so 87% / 90% =   97% (Adjusted)
Student X = 70%, so 70% / 90% =   78% (Adjusted)
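
The adjustment illustrated above can be sketched in Python. This is a minimal illustration under stated assumptions, not the instructors' actual grading script: the function names `top_n_average` and `adjusted_percent` are hypothetical, and rounding to the nearest whole percent is an assumption.

```python
def top_n_average(raw_percents, n=25):
    """Average of the n highest raw scores (hypothetical helper)."""
    top = sorted(raw_percents, reverse=True)[:n]
    return sum(top) / len(top)


def adjusted_percent(raw_percent, top_average):
    """Scale a raw score so that the top-student average maps to 100%.

    Rounding to the nearest whole percent is an assumption.
    """
    return round(100 * raw_percent / top_average)


# Example values from the illustration: the top-25 average is 90%.
top_average = 90
for label, raw in [("Student 1", 99), ("Student 25", 87), ("Student X", 70)]:
    adjusted = adjusted_percent(raw, top_average)
    print(f"{label}: {raw}% raw, {adjusted}% adjusted")
```

In practice, `top_average` would come from something like `top_n_average(all_raw_percents)` over the whole class, so a student scoring exactly the top-25 average lands at 100% adjusted.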

At week 11, each student can confidentially find out where their interim grade stands.

 

Week(s) Deliverable % of final grade
4 Team selection of product idea and submission of initial slides 5%
Team low-tech demos 5%
6 Team self-review on "Common Strategic Errors and Story Narrative Mistakes" --
7 Individual tech strategy templates (one template per person) 5%
8 Team’s 1st 30-min meeting (office hours) with instructor group for guidance --
9 Individual project work for team related to a relevant self-selected module 5%
10 Team submits 3-minute project video to storytell value and progress 10%
11 Individual project work for team related to a relevant self-selected module 5%
12 Team’s 2nd 30-minute meeting (office hours) with instructor group for guidance --
15 Team final project 35%
Individual peer reviews 10%
2, 15 Complete brief surveys on your interests, behaviors, and skills 5%
All Participation in live class activities (e.g., team games, presentations) 15%
Semester total 100%

 

Final team project: What is graded
Live presentation, including live demo
Final slides
Final code / tools
News story
Originality and Impact: Project is likely to provide a highly usable tool or insight to users
Completeness: Project is 'ready for market' and offers a complete, working set of features or results
Justified tech decision-making: Team demonstrates methods used properly address the intended problem space
Presentation: Team explains clearly, concisely, and persuasively

 

Course Evaluations

At the middle and end of the term, students will be asked to fill out an evaluation to give feedback about the course. SCET values and appreciates student responses, which are used to better understand and improve our courses. Students are strongly encouraged to submit the evaluations. 

Focus Groups

Two optional focus group events (perhaps with pizza served) will be held during the semester outside class hours for your feedback aloud on the course.

Scheduling Conflicts

Please notify us in writing as soon as possible about any known or potential extracurricular conflicts. We will try our best to help you with making accommodations, but cannot guarantee them in all cases.

Student Code of Conduct & Academic Integrity

Berkeley honor code: Everyone in this class is expected to adhere to this code: “As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others.”

Student Conduct: Ethical conduct is of utmost importance in your education and career. The instructors, the College of Engineering, and U.C. Berkeley are responsible for supporting you by enforcing all students’ compliance with the Code of Student Conduct and the policies listed in the CoE Student Guide. The Center for Student Conduct is set up to support you when you have been affected by actions that may violate these community rules. This includes an organized and transparent process, student participation in the process, mechanisms for appeals, and other mechanisms to protect fairness (https://sa.berkeley.edu/conduct).

Academic Integrity: Any assignment submitted by you and that bears your name is presumed to be your own original work that has not previously been submitted for credit in another course unless you obtain prior written approval to do so from your instructor. In all of your assignments, you may use words or ideas written by other individuals, but only with proper attribution. To copy text or ideas from another source without appropriate reference is plagiarism and will result in a failing grade for your assignment and usually further disciplinary action. For additional information on plagiarism, self-plagiarism, and how to avoid it, see the Berkeley Library website.

If you are not clear about the expectations for completing an assignment or taking a test or examination, be sure to seek clarification from your instructor beforehand. Anyone caught committing academic misconduct will be reported to the University Office of Student Conduct. Potential consequences of cheating and academic dishonesty may include a formal discipline file, probation, dismissal from the University, or other disciplinary actions. 

Inclusion: We are committed to creating a learning environment welcoming of all students. To do so, we intend to support a diversity of perspectives and experiences and respect each other's identities and backgrounds (including race/ethnicity, nationality, gender identity, socioeconomic class, sexual orientation, language, religion, ability, etc.). To help accomplish this:

  • If you have a name and/or set of pronouns that differ from your legal name, designate a preferred name for use in the classroom at: https://registrar.berkeley.edu/academic-records/your-name-records-rosters.
  • If you feel like your performance in the class is being impacted by your experiences outside of class (e.g., family matters, current events), please don’t hesitate to come and talk with the instructor(s).  We want to be resources for you.
  • We are all in the process of learning how to respect and include diverse perspectives and identities. Please take care of yourself and those around you as we work through the challenging but important learning process.
  • As a participant in this class, recognize that you can be proactive about making other students feel included and respected.  

Student Accommodations

We honor and respect the different learning needs of our students, and are committed to ensuring you have the resources you need to succeed in our class. If you need accommodations for any reason (e.g., religious observance, health concerns, insufficient resources, etc.), please discuss with your instructor or academic advisor how to best support you. We will respect your privacy under state and federal laws, and you will not be asked to share more than you are comfortable sharing. The Disabled Students' Program (DSP) is a related resource, listed below. UC Berkeley is committed to creating a learning environment that meets the needs of its diverse student body. If you anticipate or experience any barriers to learning in this course, please feel welcome to discuss your concerns with me.

If you have a disability, or think you may have a disability, you can work with the Disabled Students' Program (DSP) to request an official accommodation. The Disabled Students' Program (DSP) is the campus office responsible for authorizing disability-related academic accommodations, in cooperation with the students themselves and their instructors. You can find more information about DSP, including contact information and the application process here: dsp.berkeley.edu. If you have already been approved for accommodations through DSP, please meet with me so we can develop an implementation plan together.

Students who need academic accommodations or have questions about their accommodations should contact DSP, located at 260 César Chávez Student Center. Students may call 642-0518 (voice), 642-6376 (TTY), or e-mail dsp@berkeley.edu.

Prevention of Harassment and Discrimination

The University is committed to creating and maintaining a community dedicated to the advancement, application and transmission of knowledge and creative endeavors through academic excellence, where all individuals who participate in University programs and activities can work and learn together in an atmosphere free of discrimination, harassment, exploitation, or intimidation. For more information on related policies, resources and how to report an incident, see the Office for the Prevention of Harassment and Discrimination (OPHD) website.

Safety and Emergency Preparedness/Evacuation Procedures

As class activities may keep you on campus at night, check out Cal’s Night Safety Services website for details on the University’s comprehensive free night safety services. See the Office of Emergency Management website for details on Emergency Preparedness/Evacuation Procedures. The UC Berkeley Police Department website also has information regarding safety on campus. Dial 510-642-3333 or use a Blue Light emergency phone if you need help.

Grievances

If you have a problem with this class, you should seek to resolve the grievance concerning a grade or academic practice by speaking first with the instructor. Then, if necessary, take your case to the SCET Chief Learning Officer, SCET Faculty Director, IEOR Department Chair, and to the College of Engineering Dean, in that order. Additional resources can be found on the Student Advocate’s Office website and the Ombuds Office for Students website.

SCET Certificate in Entrepreneurship & Technology

This class can be used towards requirements to earn the SCET Certificate in Entrepreneurship & Technology. For details on the certificate requirements and other opportunities to engage with the Center, see the SCET website.

Support during Remote Learning: 

We understand that your specific situation may present challenges to class participation. Please contact the instructors if you would like to discuss these and co-develop strategies for engaging with the course. 

The Student Technology Equity Program (STEP) is available to help access a laptop, Wi-Fi hotspot, and other peripherals (https://technology.berkeley.edu/STEP).

Additional Resources

See the Student Affairs website for more information on campus and community resources.

Center for Access to Engineering Excellence (CAEE)                           

The Center for Access to Engineering Excellence (227 Bechtel Engineering Center; https://engineering.berkeley.edu/student-services/academic-support) is an inclusive center that offers study spaces, nutritious snacks, and tutoring in >50 courses for Berkeley engineers and other majors across campus. The Center also offers a wide range of professional development, leadership, and wellness programs, and loans iClickers, laptops, and professional attire for interviews.

Counseling and Psychological Services       

University Health Services Counseling and Psychological Services staff are available to you at the Tang Center (http://uhs.berkeley.edu; 2222 Bancroft Way; 510-642-9494) and in the College of Engineering (https://engineering.berkeley.edu/students/advising-counseling/counseling/; 241 Bechtel Engineering Center), and provide confidential assistance to students managing problems that can emerge from illness such as financial, academic, legal, family concerns, and more. Long wait times at the Tang Center in the past led to a significant expansion to include a 24/7 counseling line at (855) 817-5667.  This line will connect you with help in a very short time-frame.  Short-term help is also available from the Alameda County Crisis hotline: 800-309-2131.  If you or someone you know is experiencing an emergency that puts their health at risk, please call 911.  

The Care Line (PATH to Care Center)

The Care Line (510-643-2005; https://care.berkeley.edu/care-line/) is a 24/7, confidential, free, campus-based resource for urgent support around sexual assault, sexual harassment, interpersonal violence, stalking, and invasion of sexual privacy. The Care Line will connect you with a confidential advocate for trauma-informed crisis support including time-sensitive information, securing urgent safety resources, and accompaniment to medical care or reporting.

Ombudsperson for Students                                            

The Ombudsperson for Students (102 Sproul Hall; 642-5754; http://students.berkeley.edu/Ombuds)  provides a confidential service for students involved in a University-related problem (academic or administrative), acting as a neutral complaint resolver and not as an advocate for any of the parties involved in a dispute. The Ombudsman can provide information on policies and procedures affecting students, facilitate students' contact with services able to assist in resolving the problem, and assist students in complaints concerning improper application of University policies or procedures. All matters referred to this office are held in strict confidence. The only exceptions, at the sole discretion of the Ombudsman, are cases where there appears to be imminent threat of serious harm.

UC Berkeley Food Pantry

The UC Berkeley Food Pantry (#68 Martin Luther King Student Union; https://pantry.berkeley.edu) aims to reduce food insecurity among students and staff at UC Berkeley, especially the lack of nutritious food. Students and staff can visit the pantry as many times as they need and take as much as they need while being mindful that it is a shared resource. The pantry operates on a self-assessed need basis; there are no eligibility requirements.  The pantry is not for students and staff who need supplemental snacking food, but rather, core food support.

Disclaimer: Syllabus/Schedule are subject to change.

References

Innovation Engineering Textbook. Data-X students can access required sections for free. Encryption password is at Slack Data-X (INDENG 135 / 235), pinned in the #general channel.

Navigator Tool template at Innovation Engineering website -> Google Slides to reinforce inductive learning

Low Tech Demo template at Innovation Engineering website -> Google Slides

 

Example dataset source Link
Kaggle https://www.kaggle.com/datasets

https://www.kaggle.com/datasets?sort=votes&datasetsOnly=true

AWS (e.g., Data Exchange) https://aws.amazon.com/data-exchange/

https://registry.opendata.aws/ 

Google Dataset Search https://datasetsearch.research.google.com/ 
Towards AI (article of dataset links) https://pub.towardsai.net/best-datasets-for-machine-learning-data-science-computer-vision-nlp-ai-c9541058cf4f 
Ubuntu Pit (article of dataset links) https://www.ubuntupit.com/best-machine-learning-datasets-for-practicing-applied-ml/

 

At the end of the semester, one Data-X team and project can qualify to compete for the Collider Cup, SCET's all-star showcase.

Directory of Advisors and Industry Experts for Data-X

The Data-X course and project bring together students, technical experts, start-up companies, and executives. Each brings a different perspective to data, algorithms, and scale. Please refer to the People webpage.