Class Survey
library(tidyverse)
class_survey <- read_csv("Your file path in RStudio here")
You can access the real data from the Stat 20 Class Survey by adding the code above to a code cell at the top of your Quarto document. The function glimpse() gives you a peek at the data: the dimensions of the data frame, the variables, and the first few values of each variable.
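As a rough illustration of what glimpse() reports, here is a small sketch on a made-up data frame. The name toy_survey and its columns are assumptions for illustration only, not the actual survey variables.

```r
library(tidyverse)

# Toy stand-in for the survey data; these column names are hypothetical.
toy_survey <- tibble(
  major = c("Statistics", "stats", "Data Science"),
  coffee_cups = c(0, 2, 1)
)

# Prints the number of rows and columns, then one line per variable
# with its type and the first few values.
glimpse(toy_survey)
```

Running glimpse(class_survey) on the real data works the same way once the file has been read in.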
The Data
We will be analyzing the real data from this semester’s Stat 20 Class Survey!
Now it is time to access the data frame containing your survey responses in RStudio! You may download the class_survey data file from bCourses, upload it into your session using the yellow button in the File Directory in RStudio, and add the code shown at the top of this page to a code cell at the top of your Quarto document.
You can view the survey questions PDF on Ed. Use this PDF to help you answer the first two questions.
Question 1
part a
What appears to be the data type for the major variable according to the Taxonomy of Data? By “data type” we mean numerical discrete, numerical continuous, categorical nominal, categorical ordinal, or none of these.
part b
How did you input your major when filling out the survey? Was it a multiple choice question, a drop-down menu, or something else? Explain how this way of recording the major variable may make analysis of any one specific major difficult. Answer in at least two sentences.
Question 2
For this question, we’ll focus on the survey question:
How much do you agree with this statement: “So far at Berkeley, in terms of my social life, I’m satisfied with how things are going.”
Imagine you were working with the corresponding variable for this question as shown in the glimpse and wanted to make a one-variable bar chart with it.
part a
What is the data type for this variable according to the Taxonomy of Data?
part b
Based on the information on the variable in your environment (click the blue drop down arrow), what is the current class of this variable, according to R?
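If you would rather check a variable's class in code than in the Environment pane, class() reports the same information. The sketch below uses a made-up data frame; toy_survey and both column names are hypothetical.

```r
library(tidyverse)

# Toy data; column names and values are hypothetical.
toy_survey <- tibble(
  satisfaction = c("Agree", "Neutral", "Disagree"),
  hours_sleep = c(7, 6.5, 8)
)

# class() returns how R currently stores each column.
class(toy_survey$satisfaction)
class(toy_survey$hours_sleep)
```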
part c
Identify another potential class for this variable, according to R, that might be more appropriate based on its data type according to the Taxonomy of Data. Explain your reasoning in at least one sentence. You don’t need to write any code here - we are just looking at the data and thinking about it.
Question 3
Do Stat 20 students prefer to spend time at the beach or in the mountains?
part a
Construct a plot that answers this question.
part b
If you wanted to summarize the “typical” student’s beach-vs-mountains preference, which of the following measures would make the most sense: mean, median, or mode?
part c
Then use the plot and this summary to answer the original question in a sentence.
Question 4
Is there an association between Stat 20 students’ favorite season and terrain preference (beach or mountains)?
part a
Construct a plot that answers this question.
part b
Use this plot to answer the question in one sentence.
Question 5
What do Stat 20 students believe the chance is that a new COVID variant disrupts instruction during this semester?
part a
Construct a plot that answers this question.
part b
Based on the plot, calculate or state the value of a typical observation (mean, median, or mode, as appropriate).
part c
Use both the plot and the measure of a typical observation to answer the question in one sentence.
Question 6
In terms of their social lives, how satisfied are Stat 20 students with how things are going?
part a
Construct a bar chart to help answer this question. Because some of the answer options to this variable are lengthy, use the template below to make interpreting the visualization easier.
ggplot(class_survey,
       aes(y = put_the_column_name_here)) +
  geom_bar()
Describe what you see in at least two sentences. Do you notice anything that seems off?
part b
Based on Question 2 and Question 6a, modify the social satisfaction variable! Hint: you’ll have to use mutate to change the column…
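As a general reminder of how mutate can overwrite a column in place, here is a minimal sketch on made-up data. The data frame toy_survey, the column satisfaction, and the response wording are all assumptions; the real column name and answer options will differ.

```r
library(tidyverse)

# Toy data; the column name and response wording are hypothetical.
toy_survey <- tibble(
  satisfaction = c("Agree", "Strongly disagree", "Neutral", "Agree", "Disagree")
)

# The order in which the categories should be treated.
level_order <- c("Strongly disagree", "Disagree", "Neutral",
                 "Agree", "Strongly agree")

# mutate() replaces the existing column because the new name matches the old one.
toy_survey <- toy_survey %>%
  mutate(satisfaction = factor(satisfaction, levels = level_order))

levels(toy_survey$satisfaction)
```

Reusing the same column name on the left of the equals sign is what makes this a change to the existing column rather than a new column.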
part c
Then, recreate the plot of Question 6a. Hint: your code should be the same as 6a, but the plot should look different.
part d
Use your new plot to answer the original question in at least one sentence.
Question 7
Six variables appear in the survey data frame that were derived from the original prof_label question: is_artist, is_humanist, is_nat_sci, is_soc_sci, is_comp_sci and is_entrepreneur. The is_artist variable is TRUE for those students who most identified as an artist and FALSE otherwise. The other five variables are defined similarly.
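To make the definition concrete, a logical variable like is_artist could be derived along the following lines. This is only a sketch: the values stored in prof_label (such as "Artist") are guesses, and the survey data frame already contains these columns, so you do not need to create them yourself.

```r
library(tidyverse)

# Toy data; the prof_label values shown here are hypothetical.
toy_survey <- tibble(
  prof_label = c("Artist", "Natural Scientist", "Artist", "Entrepreneur")
)

# The comparison returns TRUE where the label matches and FALSE otherwise.
toy_survey <- toy_survey %>%
  mutate(is_artist = prof_label == "Artist")

toy_survey
```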
part a
Propose your own question involving one of the is_ variables and a numerical variable of your choice below.
part b
Construct a plot that addresses the question.
part c
Calculate a measure of center for the two groups (is and is not) separately.
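One common way to compute a summary separately for each group is group_by followed by summarize. The sketch below uses made-up data; hours_sleep is a hypothetical numerical variable, and whether the mean or the median is the better choice depends on the shape of your data.

```r
library(tidyverse)

# Toy data; hours_sleep is a hypothetical numerical variable.
toy_survey <- tibble(
  is_artist = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  hours_sleep = c(6, 8, 7, 7.5, 9)
)

# One row of output per group: is_artist FALSE and is_artist TRUE.
centers <- toy_survey %>%
  group_by(is_artist) %>%
  summarize(
    mean_sleep = mean(hours_sleep),
    median_sleep = median(hours_sleep)
  )

centers
```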
part d
Use the results of part b and part c to answer your proposed question in at least two sentences.
Question 8
This last one is a full-on choose-your-own adventure!
part a
Propose your own question involving two or three variables.
part b
Create a visualization using ggplot2 and answer your question with at least one sentence.
Last Question
Will you ensure that your submission to Gradescope…
- is a PDF generated from a qmd file,
- has all of your code visible to readers,
- and assigns each of the questions to all pages that show your work for that question?
(This one is easy! Just answer “yes” or “no”)