15  Summary Statistics

Summary statistics (also known as descriptive statistics) are usually the starting point for developing insights. You will likely hear the phrase “Exploratory Data Analysis” (often abbreviated “EDA”) throughout your career. It refers to the stage where you try to get a high-level understanding of the distributions and relationships within your dataset.

15.1 Quantitative Data

When dealing with continuous data, one of the quickest ways to get a high-level view of your data is the “summary” function. Applied to a numeric vector, it returns the extreme (minimum and maximum) values, the median, the mean, and the 1st and 3rd quartiles (the 25th and 75th percentiles).

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.43   19.20   20.09   22.80   33.90 
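
The call that produced this output is not shown. The values match what “summary” prints for the mpg column of R's built-in mtcars dataset, so a minimal sketch, assuming that is the data being summarized, might look like this:

```r
# Assumption: the output above comes from the mpg column of the
# built-in mtcars dataset (the printed values are consistent with it).
mpg_summary <- summary(mtcars$mpg)

# Prints Min., 1st Qu., Median, Mean, 3rd Qu., and Max.
print(mpg_summary)
```

The same function also accepts a whole data frame (for example `summary(mtcars)`), in which case it reports these statistics for every numeric column.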

Alternatively, you can use the following eight functions to retrieve specific information about your data.

# Returns the average
[1] 20.09062
# Returns the median
[1] 19.2
# Returns the sample standard deviation
[1] 6.026948
# Returns the sample variance
[1] 36.3241
# Returns the minimum value
[1] 10.4
# Returns the maximum value
[1] 33.9
# Returns the minimum and maximum value
[1] 10.4 33.9
# Returns the quantiles (0%, 25%, 50%, 75%, and 100% by default)
    0%    25%    50%    75%   100% 
10.400 15.425 19.200 22.800 33.900 
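
The listing above shows only the results. Assuming the data is again the mpg column of the built-in mtcars dataset (the printed values are consistent with it), the eight base R calls that correspond to those comments can be sketched as follows:

```r
# Assumption: mtcars$mpg is the vector behind the output above.
mpg <- mtcars$mpg

mean(mpg)      # average
median(mpg)    # median
sd(mpg)        # sample standard deviation (n - 1 denominator)
var(mpg)       # sample variance (n - 1 denominator)
min(mpg)       # minimum value
max(mpg)       # maximum value
range(mpg)     # minimum and maximum as a length-2 vector
quantile(mpg)  # 0%, 25%, 50%, 75%, and 100% quantiles by default
```

If your data contains missing values, most of these functions return NA unless you pass `na.rm = TRUE`.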

15.2 Qualitative Data

If you’re working with data that is categorical and encoded as a factor, you can view all categories by using the “levels” function.

[1] "setosa"     "versicolor" "virginica" 
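
The three level names above are those of the Species column in R's built-in iris dataset. Assuming that is the factor in question, a minimal sketch of the call is:

```r
# Assumption: the factor shown above is the Species column of the
# built-in iris dataset.
species <- iris$Species

is.factor(species)  # confirm the column is encoded as a factor
levels(species)     # list every category the factor can take
```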

If you also want to count the number of occurrences of each level, you can use the “table” function.


    setosa versicolor  virginica 
        50         50         50 
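
Counts of 50 per species match the built-in iris dataset, so, assuming the same Species column as before, the counts above can be reproduced with a sketch like this:

```r
# Assumption: the counts above come from iris$Species.
species_counts <- table(iris$Species)

# Prints one count per level
print(species_counts)

# Individual counts can be looked up by level name
species_counts[["setosa"]]
```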

If you need to keep digging for insights, you can group your data by category and summarize each group however you’d like using the “group_by” function covered in the last chapter.
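
As a brief illustration, the sketch below assumes that “group_by” refers to the dplyr function of that name and that the dplyr package is installed. The iris dataset and the choice of Sepal.Length as the column to summarize are assumptions made only for this example.

```r
# Assumption: dplyr is installed; it provides group_by(), summarise(), and n().
library(dplyr)

# Per-species count, mean, and standard deviation of sepal length
species_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    n = n(),
    mean_sepal_length = mean(Sepal.Length),
    sd_sepal_length = sd(Sepal.Length)
  )

print(species_summary)
```

The result has one row per level of the grouping column, so any of the summary functions from the previous section can be applied per category in the same way.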

15.3 Resources