## RStudio Solution on Resit Task

• 22nd Dec, 2022
• 16:27 PM

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(warn = -1)
```

## 1. Provide appropriate graphs to display frequency distributions for school, age, visperc and flags variables

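Below are the graphs for school, age, visperc and flags. Bar charts are used for the discrete variables (school and ageyr) and histograms for the test scores (visperc and flags).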

```{r, echo=FALSE, eval=TRUE}
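library(ggplot2)
# Assumes the holzinger data frame is already loaded in the workspace
attach(holzinger)

# Bar charts for the discrete variables
ggplot(holzinger, aes(x = school)) + geom_bar()
ggplot(holzinger, aes(x = ageyr)) + geom_bar()
# Histograms for the test scores (30 bins is the ggplot2 default)
ggplot(holzinger, aes(x = visperc)) + geom_histogram(bins = 30)
ggplot(holzinger, aes(x = flags)) + geom_histogram(bins = 30)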

```

## 2. Create a variable from ageyr and agemo of those above the median age and those equal to or below median age.

```{r,echo=FALSE, eval=TRUE}
# Age in months
age_abs <- ageyr * 12 + agemo
med <- median(age_abs)
# 1 = above the median age, 0 = equal to or below the median age
age_med <- as.factor(as.integer(age_abs > med))
holzinger <- cbind(holzinger, age_med)

```

The variable age_med holds the median split on age (1 = above the median age, 0 = equal to or below it). It is compared with the school variable below.
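
The split rule can be checked by hand on a few made-up ages. The sketch below uses hypothetical values (`toy_yr`, `toy_mo` and the label text are illustrative assumptions, not taken from the holzinger data) and shows that a case exactly at the median falls in the lower group.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical ages in years and months (not from holzinger)
toy_yr <- c(12, 13, 13, 14, 12, 15, 13)
toy_mo <- c(3, 0, 7, 1, 11, 2, 0)

# Age in months, computed the same way as age_abs
toy_abs <- toy_yr * 12 + toy_mo
toy_med <- median(toy_abs)

# TRUE means strictly above the median; ties stay in the lower group
toy_split <- factor(toy_abs > toy_med,
                    levels = c(FALSE, TRUE),
                    labels = c("at or below median", "above median"))

# Group sizes of the split
table(toy_split)
```

Labelled factor levels are optional here; they only make the crosstabulation easier to read than the 0/1 coding.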

### (a) Is a median split on age related to school? (i.e., produce a crosstabulation and chi-square test).

```{r, echo=FALSE, eval=TRUE}
table(school, age_med)
chisq.test(table(school, age_med))
```

The p-value of the chi-square test is $0.000067$, which is extremely small. Hence, we conclude that there is an association between the median split on age and school.
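
To illustrate what the test returns, here is a minimal sketch on a made-up 2 x 2 table (the counts and the labels "A" and "B" are hypothetical, not the holzinger results). For 2 x 2 tables `chisq.test()` applies Yates' continuity correction by default; `correct = FALSE` switches it off.

```{r, echo=TRUE, eval=FALSE}
# Hypothetical crosstabulation: rows are schools, columns are the median split
toy_tab <- matrix(c(30, 20,
                    15, 35),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(school = c("A", "B"),
                                  age_med = c("0", "1")))

res <- chisq.test(toy_tab)                       # Yates-corrected (default for 2 x 2)
res_raw <- chisq.test(toy_tab, correct = FALSE)  # uncorrected Pearson test

res$expected     # expected counts under independence
res$p.value      # p-value with continuity correction
res_raw$p.value  # p-value without correction
```

Inspecting `res$expected` is a quick way to check that the expected counts are large enough for the chi-square approximation to be reasonable.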

### (b) Produce a graph comparing educational attainment by household income.
No educational attainment or household income variables were found in the data, so this graph could not be produced.

## 3. Using t-tests, are there gender differences on: (a) visperc (b) wordmean (c) addition.

We shall use 0.05 as our level of significance.

A. Visperc

```{r, echo=FALSE, eval=TRUE}
t.test(visperc~sex)
```

There is no significant gender difference in visperc, as the t-test p-value is 0.1597, which is higher than our level of significance of 0.05.

B. Wordmean

```{r, echo=FALSE, eval=TRUE}
t.test(wordmean~sex)
```
There is no significant gender difference in wordmean, as the t-test p-value is 0.8491, which is higher than our level of significance of 0.05. The test is _not_ statistically significant.

C. Addition

```{r, echo=FALSE, eval=TRUE}
t.test(addition~sex)
```
There is a significant gender difference in addition, as the t-test p-value is 0.04365, which is lower than our level of significance of 0.05. The test is statistically significant.
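
Called with a formula, `t.test()` runs Welch's unequal-variance test by default (`var.equal = FALSE`). The sketch below collects the p-values for several outcomes in one pass. It uses simulated scores: the data frame `toy`, its values and the 1/2 coding of sex are assumptions for illustration only.

```{r, echo=TRUE, eval=FALSE}
set.seed(1)

# Simulated data with the same column names as the outcomes tested above
toy <- data.frame(
  sex      = rep(c(1, 2), each = 40),
  visperc  = rnorm(80, mean = 30, sd = 7),
  wordmean = rnorm(80, mean = 15, sd = 8),
  addition = rnorm(80, mean = 95, sd = 25)
)

outcomes <- c("visperc", "wordmean", "addition")

# One Welch t-test per outcome; keep only the p-value
p_values <- sapply(outcomes, function(v) {
  t.test(reformulate("sex", response = v), data = toy)$p.value
})

round(p_values, 4)
p_values < 0.05   # TRUE where the difference is significant at the 5% level
```

Because three tests are run, a correction such as `p.adjust(p_values, method = "holm")` could be considered, although the task does not ask for one.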

## 4. Provide graphs to show these gender differences in (a) visperc (b) wordmean (c) addition

```{r, echo=FALSE, eval=TRUE}
ggplot(holzinger, aes(x = as.factor(sex), y = visperc)) + geom_boxplot() + xlab("Gender") + ylab("scores on visual perception test, test 1") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = wordmean)) + geom_boxplot() + xlab("Gender") + ylab("scores on word meaning test, test 9") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = addition)) + geom_boxplot() + xlab("Gender") + ylab("scores on add test, test 10") + labs(title = "Box plot")
```
## 5. Run a multiple regression with visperc as the dependent variable and cubes, sencomp, wordmean, and addition as predictors

```{r, echo=FALSE, eval=TRUE}
# Passing data explicitly avoids relying on the attached copy of holzinger
lm1 <- lm(visperc ~ cubes + sencomp + wordmean + addition, data = holzinger)
summary(lm1)
```

The regression model with visperc as the dependent variable and cubes, sencomp, wordmean, and addition as predictors is significant, with an $R^2$ of 0.1865. This means that the predictors explain 18.65% of the variance in visperc. Among the predictors, only cubes and wordmean were statistically significant at the 5% level of significance.
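
The quantities quoted above can also be read from the summary object instead of the printed output. The sketch below does this on simulated data; the data frame `toy` and the coefficients used to generate it are made up and do not reproduce the holzinger results.

```{r, echo=TRUE, eval=FALSE}
set.seed(2)
n <- 100

# Simulated predictors and outcome (hypothetical values)
toy <- data.frame(
  cubes    = rnorm(n, mean = 24, sd = 5),
  sencomp  = rnorm(n, mean = 17, sd = 5),
  wordmean = rnorm(n, mean = 15, sd = 8),
  addition = rnorm(n, mean = 95, sd = 25)
)
toy$visperc <- 10 + 0.5 * toy$cubes + 0.2 * toy$wordmean + rnorm(n, sd = 6)

fit <- lm(visperc ~ cubes + sencomp + wordmean + addition, data = toy)
s <- summary(fit)

s$r.squared                      # proportion of variance in visperc explained

# p-value of the overall F-test (is the model as a whole significant?)
f <- s$fstatistic
f_p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
f_p

# Coefficient table, restricted to terms significant at the 5% level
coefs <- coef(s)
coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
```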

## 6. Produce a scatterplot of visperc on cubes. Put the regression line with standard error on the graph.

```{r, echo=FALSE, eval=TRUE}
ggplot(holzinger, aes(y = visperc, x = cubes)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Scatterplot with regression line")
```
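
With `method = "lm"`, `geom_smooth()` draws a shaded band around the line by default (`se = TRUE`, `level = 0.95`). To my understanding this band is a 95% confidence interval for the fitted mean. The sketch below computes such a band numerically with `predict()` on simulated data; `toy`, `grid` and the generating coefficients are hypothetical.

```{r, echo=TRUE, eval=FALSE}
set.seed(3)

# Simulated scores (hypothetical values, not from holzinger)
toy <- data.frame(cubes = rnorm(60, mean = 24, sd = 5))
toy$visperc <- 15 + 0.6 * toy$cubes + rnorm(60, sd = 6)

fit <- lm(visperc ~ cubes, data = toy)

# Five evenly spaced values of cubes across the observed range
grid <- data.frame(cubes = seq(min(toy$cubes), max(toy$cubes), length.out = 5))

# Fitted line with lower and upper 95% confidence limits
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
cbind(grid, band)
```

The band is narrowest near the mean of cubes and widens towards the ends of the range, which is the shape seen in the plot.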

## Appendix
```{r, echo=TRUE, eval=FALSE}
library(ggplot2)
attach(holzinger)

ggplot(holzinger, aes(x = school)) + geom_bar()
ggplot(holzinger, aes(x = ageyr)) + geom_bar()
ggplot(holzinger, aes(x = visperc)) + geom_histogram(bins = 30)
ggplot(holzinger, aes(x = flags)) + geom_histogram(bins = 30)

age_abs <- ageyr * 12 + agemo
med <- median(age_abs)
age_med <- as.factor(as.integer(age_abs > med))
holzinger <- cbind(holzinger, age_med)

table(school, age_med)
chisq.test(table(school, age_med))

t.test(visperc~sex)

t.test(wordmean~sex)

t.test(addition~sex)

ggplot(holzinger, aes(x = as.factor(sex), y = visperc)) + geom_boxplot() + xlab("Gender") + ylab("scores on visual perception test, test 1") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = wordmean)) + geom_boxplot() + xlab("Gender") + ylab("scores on word meaning test, test 9") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = addition)) + geom_boxplot() + xlab("Gender") + ylab("scores on add test, test 10") + labs(title = "Box plot")

lm1 <- lm(visperc ~ cubes + sencomp + wordmean + addition, data = holzinger)
summary(lm1)

ggplot(holzinger, aes(y = visperc, x = cubes)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Scatterplot with regression line")
```