## R Programming Homework Solution on Simulation in R

- 20th Sep, 2022
- 15:24

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)

options(warn = -1)

```

> This part of the homework will involve a simulation in R similar to what has been done in the notes. The goal is to investigate normality under the CLT and the delta method. You can turn in R code with written answers as text, or write this up in a report that includes the R code and output.

# 1. Use R to generate N=10,000 samples of size n=5 from a normal distribution with mean 15 and variance 7. For each sample of size 5, you should find the mean of the sample (store this in a vector or matrix).

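```{r}

set.seed(100)

repl <- 10000      # number of simulated samples (N)
n <- 5             # sample size

mu_5 <- numeric(repl)  # preallocate storage for the sample means

for(i in 1:repl){

  # rnorm() takes the standard deviation, so variance 7 means sd = sqrt(7)
  x <- rnorm(n, mean = 15, sd = sqrt(7))

  mu_5[i] <- mean(x)

}

```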
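A minimal vectorized sketch of step 1, assuming the stated variance of 7 (so `sd = sqrt(7)` in `rnorm()`): drawing all `repl * n` values at once into a matrix and taking `rowMeans()` produces the same kind of sample means without a loop. Because the matrix is filled column by column, the individual means will not match the loop's output draw for draw, even with the same seed. The last line compares the simulated mean and variance of $\bar{Y}$ with the theoretical values 15 and $7/5 = 1.4$.

```r
# Vectorized sketch: draw all 10,000 samples of size 5 at once.
# Assumes variance 7, i.e. sd = sqrt(7) in rnorm().
set.seed(100)
repl <- 10000
n <- 5

# Each row of the matrix is one sample of size n
samples <- matrix(rnorm(repl * n, mean = 15, sd = sqrt(7)), nrow = repl, ncol = n)
mu_5_vec <- rowMeans(samples)

# Theory: E[Ybar] = 15 and Var(Ybar) = 7 / n = 1.4
c(mean = mean(mu_5_vec), var = var(mu_5_vec), theory_var = 7 / n)
```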

# 2. Repeat the above steps for sample sizes n=20 and n=100.

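```{r}

## For n = 20

repl <- 10000
n <- 20

mu_20 <- numeric(repl)  # preallocate storage for the sample means

for(i in 1:repl){

  x <- rnorm(n, mean = 15, sd = sqrt(7))  # sd = sqrt(variance) = sqrt(7)

  mu_20[i] <- mean(x)

}

## For n = 100

n <- 100

mu_100 <- numeric(repl)

for(i in 1:repl){

  x <- rnorm(n, mean = 15, sd = sqrt(7))

  mu_100[i] <- mean(x)

}

```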
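The three loops differ only in `n`, so they could be folded into one function. The sketch below uses a hypothetical helper `sim_means()` (not part of the original solution) and again assumes `sd = sqrt(7)`; it also checks that the empirical variance of $\bar{Y}$ shrinks like $7/n$.

```r
# Hypothetical helper: returns repl sample means of size-n samples from N(mu, sigma2)
sim_means <- function(n, repl = 10000, mu = 15, sigma2 = 7) {
  # repl x n matrix: each row is one sample; rowMeans() gives the repl sample means
  x <- matrix(rnorm(repl * n, mean = mu, sd = sqrt(sigma2)), nrow = repl)
  rowMeans(x)
}

set.seed(100)
sizes <- c(5, 20, 100)
means <- lapply(sizes, sim_means)
names(means) <- paste0("n_", sizes)

# Compare the empirical Var(Ybar) with the theoretical 7 / n
cbind(n = sizes,
      empirical_var = vapply(means, var, numeric(1)),
      theory_var = 7 / sizes)
```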

# 3. Create a 1x3 plotting window so we can have 3 plots displayed at once (`par(mfrow = c(1, 3))`). All plots should be given titles and have their axes labeled appropriately.

# 4. Create histograms of the standardized means of size 5, 20, and 100. Overlay the standard normal distribution on each plot (use `freq = FALSE` on the histogram calls).

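```{r}

# Standardize: Z = (Ybar - 15) / (sigma / sqrt(n)), with sigma = sqrt(7)
z_5 <- (mu_5 - 15)*sqrt(5)/sqrt(7)

z_20 <- (mu_20 - 15)*sqrt(20)/sqrt(7)

z_100 <- (mu_100 - 15)*sqrt(100)/sqrt(7)

par(mfrow = c(1,3))  # 1 x 3 plotting window

hist(z_5, main = "Std Mean for n = 5", xlab = "Standardized mean", freq = FALSE)
curve(dnorm(x), add = TRUE, col = "red")  # overlay the N(0, 1) density

hist(z_20, main = "Std Mean for n = 20", xlab = "Standardized mean", freq = FALSE)
curve(dnorm(x), add = TRUE, col = "red")

hist(z_100, main = "Std Mean for n = 100", xlab = "Standardized mean", freq = FALSE)
curve(dnorm(x), add = TRUE, col = "red")

```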

# 5. Below the code to do the above, you should have a section that answers the following questions:

## (a) Explain what the above process and plots are attempting to investigate about the sample mean.

Ans: The process and plots check whether the distribution of the standardized sample mean is close to the standard normal distribution, as the CLT predicts for large $n$. The histograms do look like samples from a normal distribution. Note, however, that here the underlying distribution is itself normal with known variance, so $\bar{Y} \sim N(15, 7/n)$ exactly and the _standardized_ sample mean has an exact standard normal distribution for every $n$; the CLT is not needed.

## (b) In your opinion, which histograms are well approximated by the standard normal distribution? Give a theoretical argument relating to what you see.

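Ans: All three histograms are very well approximated by the standard normal distribution. Since the data are drawn from a normal distribution, the standardized mean is exactly $N(0, 1)$ for $n = 5$, $20$ and $100$ alike, so any differences between the panels are only Monte Carlo noise from the 10,000 replications.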
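A quick numerical check of the exactness argument, as a sketch under the same assumptions (variance 7, so `sd = sqrt(7)`, and 10,000 replications with fresh draws): if the standardized mean is exactly $N(0,1)$, then about 5% of the values should satisfy $|Z| > 1.96$ for every sample size, not only for large $n$.

```r
# Sketch: if Ybar ~ N(15, 7/n) exactly, Z = (Ybar - 15) * sqrt(n) / sqrt(7) is N(0, 1)
# for every n, so the two-sided tail proportion beyond 1.96 should be close to 0.05.
set.seed(100)
repl <- 10000
tail_prob <- sapply(c(5, 20, 100), function(n) {
  ybar <- rowMeans(matrix(rnorm(repl * n, 15, sqrt(7)), nrow = repl))
  z <- (ybar - 15) * sqrt(n) / sqrt(7)
  mean(abs(z) > 1.96)   # proportion of standardized means in the 5% tails
})
names(tail_prob) <- paste0("n_", c(5, 20, 100))
tail_prob
```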

# 6. Now suppose we want to investigate $W = e^{\bar{Y}}$. Use your vectors of $\bar{Y}$ values to create 10,000 W values (you shouldn't need to generate any new random values).

```{r}

w_5 <- exp(mu_5)

w_20 <- exp(mu_20)

w_100 <- exp(mu_100)

```
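The first-order delta method approximates $\mathrm{E}[W]$ by $e^{15}$. Because $\bar{Y} \sim N(15, 7/n)$ exactly here, $W$ is lognormal with $\mathrm{E}[W] = e^{15 + 7/(2n)}$, so the first-order approximation understates the mean, most noticeably for small $n$. The sketch below (assuming `sd = sqrt(7)`, with fresh draws rather than the vectors above) compares the simulated $\mathrm{E}[W]/e^{15}$ with this exact ratio.

```r
# Sketch: simulated E[W] / e^15 vs. the exact lognormal ratio exp(7 / (2n))
set.seed(100)
repl <- 10000
sizes <- c(5, 20, 100)
ratio <- t(sapply(sizes, function(n) {
  ybar <- rowMeans(matrix(rnorm(repl * n, 15, sqrt(7)), nrow = repl))
  w <- exp(ybar)
  c(n = n,
    simulated = mean(w) / exp(15),
    exact = exp(7 / (2 * n)))   # lognormal mean exp(15 + 7/(2n)) divided by exp(15)
}))
ratio
```

The exact ratios are about 2.01, 1.19 and 1.04 for $n = 5$, $20$ and $100$.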

# 7. Use the first order delta method to standardize all of these values (by subtracting off the approximate mean and dividing by the approximate standard error).

The transformation we use is $g(x) = e^x$, so $g'(x) = e^x$.

To first order, the mean of the transformed random variable is approximately:

$$ \mu \rightarrow g(\mu) $$

and the variance of $\bar{Y}_n$, which is $\sigma^2/n$, becomes approximately:

$$ \frac{\sigma^2}{n} \rightarrow \frac{\sigma^2}{n} \left(g^{\prime} (\mu)\right)^2 $$

Equivalently, the delta method states:

$$ \sqrt{n}\,[g(\bar{Y}_n) - g(\mu)] \xrightarrow{D} \mathcal{N}\left(0, \sigma^2 [g'(\mu)]^2\right) $$
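As a worked instance of these formulas (plugging in this homework's values, not an additional result): with $g(x) = e^x$ we have $g'(\mu) = e^{\mu}$, and with $\mu = 15$ and $\sigma^2 = 7$ the delta-method standardization is

$$ \mathrm{E}[W] \approx e^{15}, \qquad \mathrm{SE}(W) \approx e^{15}\sqrt{\frac{7}{n}}, \qquad Z_W = \frac{W - e^{15}}{e^{15}\sqrt{7/n}} = \frac{\sqrt{n}\,(W - e^{15})}{\sqrt{7}\,e^{15}} $$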

```{r}

# Z_W = (W - e^15) / (e^15 * sigma / sqrt(n)), with sigma = sqrt(7)
gW_5 <- (w_5 - exp(15))*sqrt(5)/(sqrt(7)*exp(15))

gW_20 <- (w_20 - exp(15))*sqrt(20)/(sqrt(7)*exp(15))

gW_100 <- (w_100 - exp(15))*sqrt(100)/(sqrt(7)*exp(15))

```
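The standardization in the chunk above is one instance of a general recipe. As a sketch, a hypothetical helper `delta_standardize()` (not from the original solution) takes the transformation `g` and its derivative `g_prime` and divides by the first-order standard error $|g'(\mu)|\,\sigma/\sqrt{n}$; the usage lines apply it with `g = exp`, $\mu = 15$ and $\sigma = \sqrt{7}$, on fresh draws.

```r
# Hypothetical helper: first-order delta-method standardization of g(Ybar)
delta_standardize <- function(gy, n, mu, sigma, g, g_prime) {
  approx_mean <- g(mu)                               # first-order mean: g(mu)
  approx_se   <- abs(g_prime(mu)) * sigma / sqrt(n)  # first-order SE: |g'(mu)| * sigma / sqrt(n)
  (gy - approx_mean) / approx_se
}

# Usage with g = exp (so g' = exp), mu = 15, sigma = sqrt(7)
set.seed(100)
n <- 20
ybar <- rowMeans(matrix(rnorm(10000 * n, 15, sqrt(7)), nrow = 10000))
z_w <- delta_standardize(exp(ybar), n, mu = 15, sigma = sqrt(7), g = exp, g_prime = exp)
summary(z_w)
```

Passing `g` and `g_prime` as arguments keeps the helper reusable for other smooth transformations of $\bar{Y}$.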

# 8. Create histograms of the standardized W values for each sample size (again a 1x3 plotting window) with standard normal distributions overlaid.

```{r}

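par(mfrow = c(1,3))  # 1 x 3 plotting window

hist(gW_5, main = "Std W (Delta Method), n = 5", xlab = "Standardized W", freq = FALSE)
curve(dnorm(x), add = TRUE, col = "red")  # overlay the N(0, 1) density

hist(gW_20, main = "Std W (Delta Method), n = 20", xlab = "Standardized W", freq = FALSE)
curve(dnorm(x), add = TRUE, col = "red")

hist(gW_100, main = "Std W (Delta Method), n = 100", xlab = "Standardized W", freq = FALSE)
curve(dnorm(x), add = TRUE, col = "red")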

```

# 9. Below the code to do the above, you should have a commented section that answers the following questions:

## (a) Explain what the above process is attempting to investigate about W.

Ans: Here we investigate how well the delta method works, that is, how large the sample size must be before its normal approximation is good. W is obtained by applying the exponential transformation to the sample mean. We check whether W, standardized with the delta-method mean and standard error, is approximately standard normal.

## (b) In your opinion, which histograms are well approximated by the standard normal distribution? Give a theoretical argument relating to what you see.

Ans: The histogram for $n=100$ is the only one that resembles the normal distribution reasonably closely; the histograms for $n=5$ and $n=20$ are far from bell-shaped. Even for $n=100$, the histogram is visibly right-skewed and does not match the standard normal very well. The reason is that the delta method gives only an asymptotic result: the standardized transformed mean converges in distribution to $N(0,1)$ as $n \rightarrow \infty$, but it is not exactly normal for finite $n$. Here $\bar{Y} \sim N(15, 7/n)$ exactly, so $W = e^{\bar{Y}}$ is lognormal, which is right-skewed for every $n$ because $e^x$ is convex; the first-order Taylor approximation ignores this curvature. As $n$ grows, $\bar{Y}$ concentrates near 15, where $e^x$ is nearly linear over the small range involved, so the skewness shrinks and the normal approximation should keep improving for larger $n$.
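The skewness argument can be checked numerically. Standardizing is an affine map, so $Z_W$ has the same skewness as $W$, and since $W$ is lognormal with $\mathrm{Var}(\log W) = 7/n$, its exact skewness is $(e^{7/n} + 2)\sqrt{e^{7/n} - 1}$. The sketch below (assuming `sd = sqrt(7)`, with fresh draws; `skew()` is a simple hand-written estimator, not a base R function) compares this with the simulated skewness for each $n$. For $n = 5$ the sample estimate from such a heavy-tailed distribution is noisy.

```r
# Simple sample-skewness estimator (hand-written; base R has no skewness())
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

set.seed(100)
repl <- 10000
sizes <- c(5, 20, 100)
res <- t(sapply(sizes, function(n) {
  ybar <- rowMeans(matrix(rnorm(repl * n, 15, sqrt(7)), nrow = repl))
  z_w <- (exp(ybar) - exp(15)) * sqrt(n) / (sqrt(7) * exp(15))  # delta-method standardized W
  s2 <- 7 / n                                                  # Var(log W) = Var(Ybar)
  c(n = n,
    simulated = skew(z_w),
    lognormal_exact = (exp(s2) + 2) * sqrt(exp(s2) - 1))
}))
res
```

The exact skewness is roughly 10.6, 2.2 and 0.83 for $n = 5$, $20$ and $100$: it falls steadily but is still clearly nonzero at $n = 100$, which matches the right skew seen in the last histogram.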