### DATA ANALYTICS ASSIGNMENTS

**Q1. The IQ Test is designed to have a mean score of 100 with a standard deviation of 15 points. A score above 140 is considered to be a genius level. What is the calculated z-score for an IQ of 140?**

__To calculate the z-score, we use the formula:__

z = (X - μ) / σ

where X is the IQ score, μ is the mean, and σ is the standard deviation.

__Plugging in the values:__

z = (140 - 100) / 15

z = 40 / 15 ≈ 2.67

So an IQ of 140 has a z-score of about 2.67; that is, the genius threshold lies roughly 2.67 standard deviations above the mean.
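The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the tail-probability step assumes IQ scores are normally distributed, which the question implies but does not state.

```python
from statistics import NormalDist

mean_iq = 100   # mean given in the question
sd_iq = 15      # standard deviation given in the question
score = 140     # genius threshold

# z-score: how many standard deviations the score lies above the mean
z = (score - mean_iq) / sd_iq
print(f"z-score: {z:.2f}")

# Assumption: scores follow a normal distribution.
# Share of the population expected to score above 140.
tail = 1 - NormalDist(mean_iq, sd_iq).cdf(score)
print(f"Proportion above {score}: {tail:.4f}")
```

Under that normality assumption, well under 1% of test takers would be expected to score above 140.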

**Q2. Assume that bacteria of species X are randomly distributed in a certain river Y. Given that the concentration of bacteria is Poisson distributed, with an average concentration of 16 per 40 ml of water. If we draw 10 ml of water from the river using a test tube, what is the approximate probability that the number of bacteria X in the sample is exactly 4? [Answer the Question by writing a Python Programme]**

Here's a Python program that calculates the approximate probability of exactly 4 bacteria X in a 10 ml sample of water from the river:

```
import math

# Rescale the rate to the sample volume: 16 per 40 ml -> 4 per 10 ml
mean = 16 * (10 / 40)
k = 4  # number of bacteria we want the probability for

# Poisson PMF: P(X = k) = e^(-mean) * mean^k / k!
prob = math.exp(-mean) * mean ** k / math.factorial(k)
print("The approximate probability of exactly 4 bacteria X in the sample is:", round(prob, 4))
```

The output will be:

`The approximate probability of exactly 4 bacteria X in the sample is: 0.1954`

This means that the approximate probability of finding exactly 4 bacteria X in a 10 ml water sample from river Y is about 0.195 or 19.5%.
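To make the rate-scaling step explicit, the calculation can be wrapped in a small helper. This is a sketch only: `poisson_pmf` and `rate_for_volume` are hypothetical names introduced here, not part of the assignment.

```python
import math

def rate_for_volume(count, per_volume_ml, sample_ml):
    """Scale a Poisson rate (count per per_volume_ml) to a sample of sample_ml."""
    return count * sample_ml / per_volume_ml

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = rate_for_volume(16, 40, 10)   # 16 per 40 ml -> 4 per 10 ml
prob = poisson_pmf(4, lam)
print(f"lambda = {lam}, P(X = 4) = {prob:.4f}")

# Sanity check: the probabilities over all counts should sum to (almost) 1
total = sum(poisson_pmf(k, lam) for k in range(60))
print(f"Sum of PMF over k = 0..59: {total:.6f}")
```

Because the Poisson mean scales linearly with volume, the same helper works for any sample size, for example `rate_for_volume(16, 40, 25)` for a 25 ml sample.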

**Q3. We can use the IQR to identify outliers in a dataset. An outlier in a dataset has a value that is significantly larger or smaller than most of the rest of the set. For example, in the set {1, 45, 50, 52, 57, 61}, the value 1 is an outlier because it is significantly lower than the other numbers. We can use the IQR to define an outlier as any data point that is smaller than (First Quartile) - (1.5 × IQR) or larger than (Third Quartile) + (1.5 × IQR). The data set is shown below. Which of the data points are outliers using the method above? [Answer the Question by writing a Python Programme]**

**5, 6, 10, 11, 15, 17, 20, 24, 46, 47**

Here's a Python program to identify outliers in the given dataset using the IQR method:

```
import numpy as np

data = [5, 6, 10, 11, 15, 17, 20, 24, 46, 47]

# First and third quartiles (NumPy's default linear interpolation)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles count as outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_bound or x > upper_bound]
print("Outliers:", outliers)
```

The output will be:

`Outliers: [46, 47]`

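With NumPy's default linear interpolation, Q1 = 10.25 and Q3 = 23, so IQR = 12.75 and the outlier bounds are 10.25 - 19.125 = -8.875 and 23 + 19.125 = 42.125. Only 46 and 47 fall outside these bounds, so they are the outliers in the dataset.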
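The quartiles, and therefore the outlier bounds, depend on which quartile convention is used. The sketch below uses the standard library's `statistics.quantiles` to compare two conventions; `iqr_outliers` is a hypothetical helper name. The `"inclusive"` method matches NumPy's default interpolation for this data, while `"exclusive"` is a different, equally common convention.

```python
import statistics

data = [5, 6, 10, 11, 15, 17, 20, 24, 46, 47]

def iqr_outliers(values, method):
    """Return ((lower, upper) bounds, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (lower, upper), [x for x in values if x < lower or x > upper]

for method in ("inclusive", "exclusive"):
    bounds, outliers = iqr_outliers(data, method)
    print(f"{method}: bounds={bounds}, outliers={outliers}")
```

With the inclusive method the bounds are (-8.875, 42.125) and the outliers are 46 and 47. With the exclusive method the quartiles become 9 and 29.5, the upper bound rises to 60.25, and no point is flagged, so it is worth stating which convention an answer relies on.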

**Q4. Sammy Airline operates daily flights to several Indian cities. One problem this airline faces is the food preference of its passengers. Captain Cook, the operations manager of Sammy Airline, believes that 35% of their passengers prefer vegetarian food, 40% prefer non-vegetarian food, 20% prefer low-calorie food, and 5% request diabetic food.**

**We have a sample of 600 passengers which was chosen to analyze the food preference and the observed frequencies are as follows:**

- Vegetarian: 250

- Non-vegetarian: 185

- Low-calorie: 90

- Diabetic: 75

**Perform a chi-square test to check whether Captain Cook's belief is true or not. [Answer the Question by writing a Python Programme]**

Here's a Python program to perform a chi-square test to check whether Captain Cook's belief about the food preferences of passengers is true:

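```
import numpy as np
import scipy.stats as stats

# Observed counts: vegetarian, non-vegetarian, low-calorie, diabetic
observed = np.array([250, 185, 90, 75])
# Expected counts under Captain Cook's belief (35%, 40%, 20%, 5% of 600)
expected = 600 * np.array([0.35, 0.40, 0.20, 0.05])

# Goodness-of-fit test with 4 - 1 = 3 degrees of freedom
chisq, p = stats.chisquare(observed, expected)
print(f"Chi-square value: {chisq:.4f}")
print(f"p-value: {p:.2e}")
```

The output will be:

```
Chi-square value: 95.2232
p-value: 1.65e-20
```

The p-value is far below the usual 0.05 significance level, so we reject the null hypothesis: the observed food preferences do not match the proportions Captain Cook believes in.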
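To show where the test statistic comes from, the sketch below computes it by hand as the sum of (observed - expected)² / expected, using only the standard library. The 5% critical value for 3 degrees of freedom (7.815) is the standard table value; the p-value line uses a closed-form expression that holds only for 3 degrees of freedom and is included as a cross-check, not as a general method.

```python
import math

observed = [250, 185, 90, 75]
proportions = [0.35, 0.40, 0.20, 0.05]   # Captain Cook's belief
total = sum(observed)                     # 600 passengers

# Expected counts if the belief were exactly right: 210, 240, 120, 30
expected = [total * p for p in proportions]

# Chi-square statistic: sum of squared deviations scaled by expected counts
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Closed-form upper-tail probability, valid for 3 degrees of freedom only
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

CRITICAL_5PCT_DF3 = 7.815  # table value for alpha = 0.05, df = 3
print(f"Chi-square statistic: {chi_sq:.4f}")
print(f"Exceeds critical value {CRITICAL_5PCT_DF3}: {chi_sq > CRITICAL_5PCT_DF3}")
print(f"p-value: {p_value:.2e}")
```

Most of the statistic comes from the diabetic category, where 75 requests were observed against 30 expected, contributing 45² / 30 = 67.5 on its own.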