Phil’s Data Science Curriculum #
Note that learning these skills is not a trivial undertaking. It can take over a hundred hours to learn the basics, and hundreds more to learn the skills well. I am primarily self-taught, so naturally I advise others to take the route of self-teaching, but you may find it helpful to take an online course to learn these skills.
General Principles #
- When implementing a difficult program, break it up into more manageable parts.
- When debugging a hard problem, bisect the problem space until you find the issue.
- If you haven’t implemented it yourself, you don’t understand it.
- Spending hours struggling through a problem is GOOD and NECESSARY. It is through this process that you learn information auxiliary to your current task and thus gain a broader foundation of knowledge. See James 1:2-3.
- Do not waste time by trying the same thing over and over.
Curriculum #
Basic Data Analysis #
- Read and understand all of w3schools' Python Tutorial.
- Understand what an Integrated Development Environment (IDE) is.
- Install Visual Studio Code or a similar IDE and use it to run a Python program.
- Understand the difference between running a Python program as a script and running Python in a Jupyter Notebook.
- If you are a student or faculty member at W&M, you should have or be able to gain access to the school’s JupyterHub server, so you don’t have to install Jupyter locally.
- Understand what a Python module is.
- NumPy is a Python module for numerical computing. It is useful for manipulating arrays of data and computing aggregates like sums and means.
- Pandas is a Python module that gives you spreadsheet-like tables (DataFrames) you can manipulate from Python. It is very useful for organizing data.
- I haven’t vetted any tutorials for quality, so just Google tutorials for NumPy and Pandas until you feel you have a solid understanding of what they are and why their most basic features are useful.
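To get a feel for why these two modules are useful, here is a minimal sketch (the numbers and column names are invented for illustration):

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic over a whole array at once, no loop needed
temps_f = np.array([68.0, 71.5, 80.2, 77.3])
temps_c = (temps_f - 32) * 5 / 9   # element-wise conversion
print(temps_c.mean())

# Pandas: labeled, spreadsheet-like tables (DataFrames)
df = pd.DataFrame({
    "city": ["Norfolk", "Richmond", "Norfolk", "Richmond"],
    "sales": [120, 95, 140, 110],
})
# group rows by city and total the sales column
print(df.groupby("city")["sales"].sum())
```

The point of both libraries is the same: you describe an operation over a whole table or array, and the library applies it to every element for you.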
Easy Web Scraping #
Learn HTML basics from w3schools. The w3schools CSS and JavaScript tutorials would also be good to read, but are not required.
Learn Selenium.
Web Scraping for Hackers #
This section is for people who want to scale up their web scraping efforts and can’t use Selenium due to computational constraints. This is the more elegant form of web scraping.
- Understand what HTTP is. Understand the different parts of an HTTP request (the headers and the payload) and of the response. Know what cookies and URL parameters are.
- Learn cURL and the Python requests module. Use both to query any web page.
- Learn to use the Chrome Developer Tools, especially the Network tab. Understand how to replicate web requests from the client to the server programmatically.
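The steps above can be sketched with the requests module. This builds the same kind of request a browser would send, with headers, URL parameters, and cookies, without actually sending it (the endpoint, header, and cookie values are placeholders):

```python
import requests

# build a browser-like request: custom header, URL parameters, and a cookie
req = requests.Request(
    "GET",
    "https://httpbin.org/get",              # placeholder endpoint
    params={"q": "web scraping"},            # becomes ?q=web+scraping
    headers={"User-Agent": "my-scraper/0.1"},
    cookies={"session": "example"},
)
prepared = req.prepare()
print(prepared.url)      # URL parameters are encoded into the URL
print(prepared.headers)  # includes the User-Agent and Cookie headers

# to actually send it:
# with requests.Session() as s:
#     resp = s.send(prepared, timeout=10)
#     print(resp.status_code, resp.text[:200])
```

The workflow is: watch the real request in the Network tab, then reproduce its method, URL parameters, headers, and cookies in code like the above until the server responds the same way it does for the browser.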