2 What is Data Analysis?
I mean my definition is data science is like data analysis by programming. Which of course begs the question of what data analysis is, and so I think of data analysis as really any activity where the input is data and the output is understanding or knowledge or insights. So I think of that pretty broadly. And then to do data science you’re not doing it by pointing and clicking. You’re doing it by writing some code in a programming language.
-Hadley Wickham (Eremenko 2020)
In its simplest form, data analysis is the process of searching for meaning in data, with the ultimate goal of drawing insight from that meaning.
2.1 The Process of Data Analysis
The process of data analysis can be generally described in five steps:
1. Gathering Requirements - Before one embarks on an analysis, it’s important to make sure the requirements are understood. Requirements include the questions your stakeholders are hoping to answer as well as the technical requirements of how you are going to perform your analysis.
2. Data Acquisition - As you might imagine, you must acquire your data before conducting an analysis. This may be done through methods such as manual creation of datasets, importing pre-constructed data, or leveraging APIs.
3. Data Preparation - Most data will not be received in the precise format you need to begin your analysis. The process of data preparation involves structuring and adding features to your data.
4. Developing Insights - Once your data is prepared, you can begin to make sense of it and develop insights about its meaning.
5. Reporting - Finally, it’s important to report on your data in such a way that the information can be digested by the people who need to see it when they need to see it.
Other sources may include additional steps, such as “acting on the analysis.” While acting on an analysis is critical for organizations to capture the full value of their data, I would argue that it occurs outside of the analysis process.
This book will focus on the technical skills required to conduct an analysis. Because of this, we will be covering steps two through five and omitting step one.
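To make steps two through five concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the inline sales data, the column names (`region`, `units`, `unit_price`), and the function names are all hypothetical, and a real analysis would typically read from a file, database, or API rather than a string.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data, standing in for a file or an API response.
RAW_CSV = """region,units,unit_price
North,10,2.50
South,4,3.00
North,6,2.50
South,8,3.00
"""


def acquire(raw_text):
    """Step 2, data acquisition: read pre-constructed CSV text into rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def prepare(rows):
    """Step 3, data preparation: convert types and add a 'revenue' feature."""
    prepared = []
    for row in rows:
        units = int(row["units"])
        unit_price = float(row["unit_price"])
        prepared.append({
            "region": row["region"],
            "units": units,
            "unit_price": unit_price,
            "revenue": units * unit_price,  # derived feature
        })
    return prepared


def develop_insights(rows):
    """Step 4, developing insights: average revenue per order by region."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append(row["revenue"])
    return {region: mean(values) for region, values in by_region.items()}


def report(insights):
    """Step 5, reporting: format the result for a human reader."""
    lines = [
        f"{region}: average revenue per order = {value:.2f}"
        for region, value in sorted(insights.items())
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    rows = prepare(acquire(RAW_CSV))
    print(report(develop_insights(rows)))
```

Each function maps to one step, so the output of one stage is the input to the next. Keeping the stages separate like this is only one possible design, but it makes it easy to swap out a single step, for example replacing `acquire` with a function that calls an API, without touching the rest.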
2.2 Resources
- “Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data” by EMC Education Services: https://onlinelibrary.wiley.com/doi/book/10.1002/9781119183686
- “Managing the Analytics Life Cycle for Decisions at Scale” by SAS: https://www.sas.com/content/dam/SAS/en_us/doc/whitepaper1/manage-analytical-life-cycle-continuous-innovation-106179.pdf