Exploratory Data Analysis on Gym Usage Dataset

May 19, 2025, 7:59 PM

Sample data set - Group assignment.sav

1. Present findings on participants’ characteristics

  1. Present summary statistics for each variable in the data set
  2. There should be a presentation of findings on bivariate analysis
  3. There should be findings on regression (linear or binary) analysis

Findings on Participants' Characteristics

Let's start by examining the characteristics of the participants in this dataset. We'll look at the distribution of gender and age, as well as other categorical variables.

Summary Statistics for Each Variable

Next, we'll calculate summary statistics for each variable in the dataset, including mean, median, and standard deviation for numerical variables, and frequency counts for categorical variables.
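
As a rough sketch of this step, the snippet below computes the mean, median, and standard deviation for a numeric column and frequency counts for a categorical one, using only the Python standard library. The records and the field names `age` and `gender` are made-up placeholders for illustration, not values from the gym dataset.

```python
import statistics
from collections import Counter

# Hypothetical records for illustration only; NOT taken from the gym dataset.
records = [
    {"gender": "Male", "age": 25},
    {"gender": "Female", "age": 31},
    {"gender": "Female", "age": 40},
    {"gender": "Male", "age": 36},
]


def summarize_numeric(values):
    """Return mean, median, and sample standard deviation of a numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def summarize_categorical(values):
    """Return a frequency count per category."""
    return dict(Counter(values))


age_summary = summarize_numeric([r["age"] for r in records])
gender_counts = summarize_categorical([r["gender"] for r in records])

print(age_summary)
print(gender_counts)
```

Numeric and categorical columns are handled by separate helpers because a mean is only meaningful for the former; a coded categorical variable is better described by its counts.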

Bivariate Analysis

We'll perform bivariate analysis to explore relationships between pairs of variables. This could include correlations between numerical variables and cross-tabulations for categorical variables.
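
A minimal sketch of both techniques, assuming small made-up lists rather than the actual gym data: `pearson_r` computes the Pearson correlation between two numeric variables, and `crosstab` counts the co-occurrence of two categorical variables. The variable names and values are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def crosstab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))


# Hypothetical values for illustration only; NOT taken from the gym dataset.
age = [22, 28, 35, 41, 50]
visits_per_week = [5, 4, 3, 3, 2]
gender = ["Male", "Female", "Female", "Male", "Female"]
uses_weights = ["Yes", "Yes", "No", "Yes", "No"]

r = pearson_r(age, visits_per_week)
table = crosstab(gender, uses_weights)

print(round(r, 3))
print(table)
```

In these invented numbers, older participants visit less often, so `r` comes out negative; the real dataset may of course behave differently.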

Regression Analysis

Finally, we will conduct a regression analysis to understand the relationships between dependent and independent variables. This could be a linear regression if the dependent variable is continuous or a binary logistic regression if the dependent variable has two categories.
Let's proceed with these analyses.

Summary Statistics for Each Variable

Here are the summary statistics for each variable in the dataset:
  • Gender: The dataset consists of approximately equal numbers of males and females.
  • Age: The average age of participants is approximately 33.6 years, with a standard deviation of 8.9 years.
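  • Reason for Going to Gym: Participants report a range of reasons for going to the gym. The mean of the category codes is 2.83, but because this is a coded categorical variable, the frequency count for each reason is more informative than the mean.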
  • Cardiovascular Equipment Usage: On average, participants use cardiovascular equipment frequently.
  • Weights Machine Usage: Participants also frequently use weights machines.
  • Frequency to Gym: The average frequency of gym visits is around 3 times per week.
  • Minutes Spent at Gym: On average, participants spend about 26.5 minutes at the gym per visit.
These statistics provide a general overview of the dataset. Next, we will perform bivariate analysis to explore relationships between variables.

What are the results of the bivariate analysis?

Bivariate Analysis Results

The heatmap (not reproduced here) shows the correlation matrix for the numerical variables in the dataset. Here are some key observations:
  • Age and Frequency to Gym: There is a slight negative correlation between age and frequency of gym visits, indicating that younger participants tend to visit the gym somewhat more often.
  • Minutes Spent at Gym and Frequency to Gym: There is a positive correlation between the minutes spent at the gym and the frequency of gym visits, suggesting that those who visit more often tend to spend more time per visit.
  • Weights Machine and Cardiovascular Equipment Usage: There is a moderate positive correlation between the usage of weights machines and cardiovascular equipment, indicating that participants who use one type of equipment are likely to use the other as well.
These insights provide a deeper understanding of how different variables relate to each other in the dataset. Next, we can proceed with regression analysis to further explore these relationships.

Can you perform a regression analysis?

Regression Analysis Results

We performed a linear regression analysis to predict the Minutes Spent on Weights Machine using the following features:
  • Age
  • Frequency to Gym
  • Minutes Spent at Gym
Here are the results:
  • Mean Squared Error (MSE): 43.23
  • R-squared (R²): -0.82
The coefficients for the regression model are:
  • Age: -0.25
  • Frequency to Gym: 0.90
  • Minutes Spent at Gym: -0.27
The intercept of the model is 28.85.
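
To make the fitted equation concrete, here is a small sketch that plugs values into the reported coefficients and intercept. The function name is hypothetical, and the example inputs are simply the averages quoted in the summary statistics, assuming they are measured in the same units the model was fitted on.

```python
# Coefficients and intercept as reported in the regression results.
INTERCEPT = 28.85
COEF_AGE = -0.25
COEF_FREQUENCY = 0.90
COEF_MINUTES_AT_GYM = -0.27


def predict_weights_minutes(age, frequency, minutes_at_gym):
    """Predicted minutes on the weights machine from the reported linear model."""
    return (
        INTERCEPT
        + COEF_AGE * age
        + COEF_FREQUENCY * frequency
        + COEF_MINUTES_AT_GYM * minutes_at_gym
    )


# Example inputs: the sample averages quoted earlier (assumed to share the model's units).
example = predict_weights_minutes(age=33.6, frequency=3, minutes_at_gym=26.5)
print(round(example, 1))
```

By hand, this is 28.85 - 0.25 × 33.6 + 0.90 × 3 - 0.27 × 26.5, which comes to roughly 16 minutes.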
Interpretation:
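  • The negative coefficient for Age suggests that, holding the other predictors constant, each additional year of age is associated with about 0.25 fewer minutes on the weights machine.
  • The positive coefficient for Frequency to Gym indicates that, holding the other predictors constant, each additional unit of gym frequency is associated with about 0.90 more minutes on the weights machine.
  • The negative coefficient for Minutes Spent at Gym suggests that, holding the other predictors constant, each additional minute spent at the gym overall is associated with about 0.27 fewer minutes on the weights machine.
The R-squared value is negative, which means the model predicts worse than simply using the mean of the outcome for every participant. For a least-squares model with an intercept, this can happen only when R² is computed on data the model was not fitted to, such as a held-out test set, so the model does not generalize and the coefficients above should be read with caution. Possible causes include a small sample, weak predictors, or a relationship that is not linear. Multicollinearity mainly makes coefficient estimates unstable and would not by itself produce a negative R².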
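
To see how a negative R² can arise, the sketch below computes R² as 1 - SS_res / SS_tot on a handful of made-up values (not taken from the gym dataset). Predicting the mean for every case gives exactly 0, so any set of predictions with larger squared error than that baseline gives a negative value.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical held-out outcomes and predictions, for illustration only.
y_test = [10, 12, 14, 16]
close_predictions = [11, 12, 13, 17]   # small errors
poor_predictions = [18, 9, 20, 8]      # errors larger than the spread of y_test
mean_baseline = [13, 13, 13, 13]       # always predict the mean of y_test

print(r_squared(y_test, close_predictions))
print(r_squared(y_test, mean_baseline))
print(r_squared(y_test, poor_predictions))
```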
Feel free to ask any further questions or request additional analyses!