The Great American Coffee Taste Test
Categorical Data
Contingency Tables
Bar Plots
Cross-Tabulation
Pie Charts
Theming
In October 2023, James Hoffmann and Cometeer held the “Great American Coffee Taste Test” on YouTube, asking viewers to taste coffee ordered from Cometeer and fill out a survey.
The data is part of the DSLC Tidy Tuesday program, which provides data sets to help data science learners practice creating graphics.
Information on the data set's variables (columns) can be found here.
Categorical Data
Categorical data are recorded values that represent a category.
Data may be recorded as “character” or “string” data.
Data may also be recorded as a whole number, with an attached code book indicating which category each number represents.
Are you a student?
What city do you live in?
What is your major?
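As a minimal sketch of the two recording styles described above, the example below stores one made-up variable as text and another as coded numbers. The answers, the variable names, and the code book (1 = No, 2 = Yes) are all hypothetical and are not taken from the coffee survey.

```r
# Hypothetical answers to "What is your major?", recorded as text
major <- c("Biology", "Math", "Biology", "Art", "Math", "Biology")
major_f <- factor(major)
levels(major_f)  # the distinct categories, sorted alphabetically

# Hypothetical answers to "Are you a student?", recorded as numbers.
# Assumed code book: 1 = No, 2 = Yes
student_code <- c(1, 2, 2, 1, 2)
student <- factor(student_code, levels = c(1, 2), labels = c("No", "Yes"))
student
```

Converting the coded numbers to a factor keeps R from treating them as quantities that can be averaged.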
Likert scales are the rating scales you may have seen in surveys.
Likert scales may be treated as numerical data if the steps between scale points are equal.
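One way to sketch this in R is to store the ratings as an ordered factor and convert them to scores only when equal spacing is a reasonable assumption. The responses and the five-point scale below are hypothetical.

```r
# Hypothetical five-point Likert scale, listed from lowest to highest
scale_levels <- c("Strongly disagree", "Disagree", "Neutral",
                  "Agree", "Strongly agree")

# Hypothetical responses
responses <- c("Agree", "Neutral", "Strongly agree", "Disagree", "Agree")
likert <- factor(responses, levels = scale_levels, ordered = TRUE)

# Treating the scale as numeric assumes equal steps between scale points
scores <- as.integer(likert)
mean(scores)
```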
Once we have the data, how do we summarize it for other people?
Contingency Tables
Contingency tables display how often a category is seen in the data.
Two types of statistics are reported in a table: the frequency and the proportion.
Frequency represents the count of observing a specific category in your sample.
#> [1] NA "2" "3" "2" "1" "3" "2" "2" "4" "2"
Proportions represent the share of the sample that falls into a category.
Because proportions do not depend on the sample size, they can be compared across samples and used as estimates of the population proportions.
The variable caffeine indicates how much caffeine a participant prefers.
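A minimal sketch of a frequency table and a proportion table for a variable like caffeine is shown below. The values are made up for illustration and are not the survey data; only the base R functions table() and prop.table() are used.

```r
# Hypothetical caffeine preferences (not the real survey responses)
caffeine <- c("Full caffeine", "Decaf", "Full caffeine", "Half caff",
              "Full caffeine", NA, "Half caff", "Full caffeine")

# Frequency: count of each category (missing values are dropped by default)
freq <- table(caffeine)
freq

# Proportion: each count divided by the total number of non-missing answers
prop <- prop.table(freq)
round(prop, 2)
```

Adding useNA = "ifany" to table() would also count the missing answers.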
Bar Plots
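Plotting in R can be done with ggplot2, a powerful package based on the Grammar of Graphics.
The main building blocks are:
ggplot() to create the base plot
+ to add layers that change the look of the base plot
geom_* functions to add geometric objects, such as bars
stat_* functions to add statistical transformations
theme_* functions to add a theme to the plot
Bar plots can be used to display the frequencies or proportions in the data.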
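The sketch below shows one way to draw both kinds of bar plot with ggplot2. It assumes ggplot2 (version 3.3.0 or later, for after_stat()) is installed, and it uses a small made-up data frame named coffee rather than the real survey data.

```r
library(ggplot2)

# Hypothetical data frame standing in for the survey data
coffee <- data.frame(
  caffeine = c("Full caffeine", "Decaf", "Full caffeine", "Half caff",
               "Full caffeine", "Half caff", "Full caffeine")
)

# Bar plot of frequencies: geom_bar() counts the rows in each category
p_count <- ggplot(coffee, aes(x = caffeine)) +
  geom_bar()

# Bar plot of proportions: divide each count by the total count
p_prop <- ggplot(coffee, aes(x = caffeine)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  labs(y = "Proportion")

p_count
p_prop
```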
Cross-Tabulation
The variable taste indicates if the participants like the taste of coffee.
Cross-tabulations, also known as contingency tables, are statistical tools used to analyze the relationship between two or more categorical variables by displaying their frequency distribution in a table format. Each cell in the table represents the count or frequency of observations that fall into a particular combination of categories for the variables.
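As a small illustration, a cross-tabulation of two categorical variables can be built with base R's table(). The data frame below is hypothetical; its column names mirror the caffeine and taste variables, but the values are made up.

```r
# Hypothetical responses for two categorical variables
coffee <- data.frame(
  caffeine = c("Full caffeine", "Full caffeine", "Decaf", "Half caff",
               "Full caffeine", "Decaf", "Half caff", "Full caffeine"),
  taste    = c("Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")
)

# Each cell counts the rows with that combination of categories
tab <- table(caffeine = coffee$caffeine, taste = coffee$taste)
tab
```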
Table proportions in cross-tabulations refer to the relative frequency or percentage of counts within the entire table, calculated by dividing each cell’s count by the total sum of all counts in the table. These proportions allow you to examine the contribution of each cell to the overall data set.
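Table proportions can be sketched with prop.table() called without a margin, which divides every cell by the grand total. The data below are hypothetical.

```r
# Hypothetical responses for two categorical variables
caffeine <- c("Full caffeine", "Full caffeine", "Decaf", "Half caff",
              "Full caffeine", "Decaf", "Half caff", "Full caffeine")
taste <- c("Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")

counts <- table(caffeine, taste)

# No margin argument: each cell is divided by the sum of all cells
table_prop <- prop.table(counts)
round(table_prop, 3)
```

All cells of the resulting table add up to 1.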
Row proportions refer to the relative frequency or percentage of counts within each row of a contingency table, calculated by dividing each cell's count by its row total. They allow you to compare how the distribution of the column variable varies across the categories of the row variable.
Column proportions refer to the relative frequency or percentage of counts within each column of a contingency table, calculated by dividing each cell's count by its column total. They allow you to compare how the distribution of the row variable varies across the categories of the column variable.
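Row and column proportions can both be sketched with the margin argument of prop.table(): margin = 1 divides by row totals and margin = 2 divides by column totals. The data below are hypothetical.

```r
# Hypothetical responses for two categorical variables
caffeine <- c("Full caffeine", "Full caffeine", "Decaf", "Half caff",
              "Full caffeine", "Decaf", "Half caff", "Full caffeine")
taste <- c("Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")

counts <- table(caffeine, taste)

# Row proportions: each row sums to 1
row_prop <- prop.table(counts, margin = 1)
round(row_prop, 3)

# Column proportions: each column sums to 1
col_prop <- prop.table(counts, margin = 2)
round(col_prop, 3)
```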
Pie Charts
A pie chart is a circular statistical graphic divided into slices, where each slice represents a proportion or percentage of the whole. The size of each slice is proportional to the relative frequency or magnitude of the category it represents.
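ggplot2 has no dedicated pie geom, so one common approach, sketched below, is to draw a single stacked bar and wrap it into a circle with coord_polar(). The counts are hypothetical, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Hypothetical category counts
counts <- data.frame(
  caffeine = c("Decaf", "Full caffeine", "Half caff"),
  n        = c(1, 4, 2)
)
counts$prop <- counts$n / sum(counts$n)

# One stacked bar, bent into a circle: slice angle follows the proportion
p_pie <- ggplot(counts, aes(x = "", y = prop, fill = caffeine)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

p_pie
```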
Theming
The R packages ThemePark and ggthemes allow you to change the overall look of a plot.
All you need to do is add the theme to the plot.
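A minimal sketch of adding a theme is shown below. It uses ggthemes::theme_economist() when the ggthemes package is installed and falls back to ggplot2's built-in theme_minimal() otherwise. The data frame is hypothetical, and the fallback is an assumption made only so the example runs without extra packages.

```r
library(ggplot2)

# Hypothetical data frame standing in for the survey data
coffee <- data.frame(
  caffeine = c("Full caffeine", "Decaf", "Full caffeine", "Half caff",
               "Full caffeine", "Half caff", "Full caffeine")
)

p <- ggplot(coffee, aes(x = caffeine)) +
  geom_bar()

# Pick a theme: ggthemes if available, otherwise a built-in ggplot2 theme
chosen_theme <- if (requireNamespace("ggthemes", quietly = TRUE)) {
  ggthemes::theme_economist()
} else {
  theme_minimal()
}

# A theme is added to the plot with +, like any other layer
p_themed <- p + chosen_theme
p_themed
```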
m201.inqs.info/lectures/2