Graduate students usually contact me because they don’t get full support from their advisor or they don’t have enough knowledge in statistical analysis. Dissertation data analysis forms a crucial part of the entire dissertation. If the data analysis is not correct, then the whole research project will fail.
How do I analyze data for a dissertation? Let me show you on the following illustrative example:
Illustrative example: Predicting car prices
Let us assume that the purpose of the statistical analysis is to verify which factors affect the price of the car, so we can predict it.
In the following table you can see a sample of the data that we will use:
First, we present data graphically. The graph below shows the percentage of each fuel type in the data.
To test if the factors above affect price we use linear regression. It describes the linear relationship between variables X and Y. Mathematically, we can write this relationship as:
where yi is dependent variable, xi are independent variables and u is error term. We focus on the results of regression parameters, which show if there is an effect on the dependent variable or not. If the p-value is higher than 0.05, it means that this factor is not statistically significant and it does not affect the price of the car.
Table below describes the basic regression statistics. R-Square is 0.987198, which means that almost 99 % of variation of a dependent variable (Price) is explained by the independent variables (Year,Mileage,No. of doors)
P-value of Mileage is lower than 0.05. It means that this factor is statistically significant and it does affect the price of the car.
We can see that there is a negative linear relationship between price and mileage. The higher the mileage, the lower the price.
Chi-square test in pedagogy
Chi-square test was used to find the differences between these groups (students in different years of study and between liberal arts and science studies curriculum).
Factors affecting flat prices – Linear regression
Using R, linear regression was performed to identify which of the examined factors affected the outcome.
ANOVA and Post-hoc tests in pedagogy
ANOVA and Post-hoc tests were used to determine if there are differences between examined groups and assess the extent of them. The analysis was conducted in SPSS.
Logistic regression model in the medical field
Logistic regression and odds models were performed to identify factors that may have influence on the outcome variables of interest.
Questionnaires with ordinal variables
Several ordinal logistic regressions (OLRs) and one cumulative OLR were performed. The influence of individual independent variables was examined, and the most significant predictors of the model were selected.
Models with a numeric dependent variable
Linear regression models were chosen as a suitable method for predicting scores in performance tests. During the analysis, both numerical and categorical variables were analyzed.
Data analysis of online survey
Descriptive statistics and data visualization (bar charts, pie charts) in Excel.
Time series analysis of crime data
Step-wise linear regression was applied to find factors (economic, social, and cultural) that best explain the USA’s changing crime rates. The report was delivered in APA format.
And many more…