An introduction to foundational concepts in data science, including: information retrieval and storage, preprocessing, visualization, exploratory data analysis, applied machine learning, research methods, and experimental design. Students will develop solutions to computational problems spanning a variety of disciplines using state-of-the-art scientific programming tools and techniques, with an emphasis on the interpretation and presentation of experimental results.
Students will understand:
This course consists of lectures and hands-on programming and data visualization exercises. Assignments will be carried out in the Python programming language. Some instruction in the use of this language and its supporting packages will be provided during lecture; however, I expect that you will consult additional resources to supplement your knowledge.
The course will include regular homework and/or programming assignments. No credit will be given for late assignments without an excused absence, so turn in as much as you can. Unless otherwise specified, no handwritten work will be accepted.
Reading should be completed before the lecture covering the material per the provided schedule. Not all reading material will be covered in the lectures, but you will be responsible for the material on homework and exams. Quizzes over the assigned reading may be given at any time.
See the GFU CS/IS/Cyber policies on collaboration and academic integrity. Most students would be surprised at how easy it is to detect collaboration or other academic integrity violations, such as plagiarism, in programming. Please do not test us! Remember: you always have willing and legal collaborators in the faculty. We encourage you to visit office hours, ask questions in class, and use the class mailing list for assistance.
Unless otherwise specified (e.g., for a group assignment or project), you are expected to do your own work. This also applies to the use of online resources (e.g., StackOverflow, ChatGPT). Put simply: if you are representing someone else's work as your own, you are being dishonest. Any suspected incidents of academic integrity violations will be investigated and reported to the Academic Affairs Office as they arise.
Almost all of life is filled with collaboration (i.e., people working together). Yet in our academic system, we artificially limit collaboration. These limits are designed to force you to learn fundamental principles and build specific skills. It is very artificial, and you'll find that collaboration is a valuable skill in the working world. While some of you may be tempted to collaborate too much, others will collaborate too little. When appropriate, it's a good idea to make use of others—the purpose here is to learn. Be sure to make the most of this opportunity but do it earnestly and with integrity.
If you have specific physical, psychiatric, or learning disabilities and require accommodations, please contact Disability & Accessibility Services as early as possible so that your learning needs can be appropriately met. For more information, go to georgefox.edu/das or contact das@georgefox.edu.
My desire as a professor is for this course to be welcoming to, accessible to, and usable by everyone, including students who are English-language learners, have a variety of learning preferences, have disabilities, or are new to online learning systems. Be sure to let me know immediately if you encounter a required element or resource in the course that is not accessible to you. Also, let me know of changes I can make to the course so that it is more welcoming to, accessible to, or usable by students who take this course in the future.
The Academic Resource Center (ARC) on the Newberg campus provides all undergraduate students with free writing consultation, academic coaching, and learning strategy review (e.g., techniques to improve reading, note-taking, study, time management). The ARC offers in-person appointments; if necessary, Zoom appointments can be arranged by request. The ARC, located on the first floor of the Murdock Library, is open from 1:00–10:00 p.m., Monday through Thursday, and 12:00–4:00 p.m. on Friday. To schedule an appointment, go to the online schedule at web.penjiapp.com/schools/george-fox, call 503-554-2327, email the_arc@georgefox.edu, or stop by the ARC. Visit arc.georgefox.edu for information about ARC Consultants' areas of study, instructions for scheduling an appointment, learning tips, and a list of other tutoring options on campus.
Please review the entirety of the university's official COVID-19 web page for the most up-to-date community guidance.
The final course grade will be based on:
Graded course activities will be posted to Canvas. Read the specifications carefully and proceed as directed. Failure to pay attention to detail will often result in few or no points being awarded on a given activity.
Grades will be updated as often as possible; you are encouraged to use the "What-If" functionality to calculate your total grade by entering hypothetical scores for various items. Note that some graded activities in this course will be submitted via GitLab.
Week 1 · 1/10 · Introduction; Environment Setup
Week 1 · 1/12 · Filesystem-Based Data
References: Filesystem, I/O, CSV
Week 2 · 1/17 · Python Lists, Tuples, Sets, and Dictionaries
References: Python structures
Week 2 · 1/19 · NumPy Arrays
References: numpy.ndarray, numpy.genfromtxt
Week 3 · 1/24 · Exploratory Data Analysis and Visualization
References: scipy.stats, matplotlib.pyplot
Week 3 · 1/26 · Plot Layout and Formatting; Plot Types
References: matplotlib guide, samples
Week 4 · 1/31 · Outliers and Missing Values
References: numpy.genfromtxt, sklearn.impute
Week 4 · 2/2 · Transforming and Encoding Data
References: sklearn.preprocessing
Week 5 · 2/7, 2/9 · Data Exploration presentations
Week 6 · 2/14 · Pandas DataFrame and Series
References: Pandas overview, structures, I/O
Week 6 · 2/16 · Additional Data Formats and Tools
References: numpy, scipy.io, json, sqlite3, skimage, skvideo
Week 7 · 2/21 · Hypothesis Formulation and Testing
References: Statistical testing, scipy.stats
Week 7 · 2/23 · Statistical Assumptions
References: scipy.stats, matplotlib.pyplot.hist
Week 8 · 2/28 · Hypothesis presentations
Week 8 · 3/2 · Midterm exam
Week 9 · 3/7 · Clustering
References: sklearn.cluster
Week 9 · 3/9 · Regression
References: sklearn.linear_model, sklearn.svm
Week 10 · 3/14 · Classification
References: sklearn.svm
Week 10 · 3/16 · Evaluation Metrics
References: sklearn.metrics
Week 11 · 3/21 · Visualizing Results
References: sklearn.metrics.plot_confusion_matrix
Week 11 · 3/23 · Cross-Validation and Hyper-Parameter Tuning
References: sklearn.model_selection (CV), sklearn.model_selection (grid search)
Week 12 · 3/27–3/31 · Spring break (no classes)
Week 13 · 4/4 · Domains, Libraries, and Visualization Tools
References: NumPy ecosystem, PyViz
Week 13 · 4/6 · Case Studies
References: NumPy case studies
Week 14 · 4/11, 4/13 · Selected Topics
Week 15 · 4/18, 4/20 · Project presentations
Week 16 · TBD · Final exam
This page was last modified on 2023-01-19 at 09:30:25.
George Fox University · 414 N Meridian St · Newberg, Oregon 97132 · 503-538-8383
Copyright © 2015–2023 George Fox University. All rights reserved.