Free Solved [December 2023] BCS40 - Statistical Techniques Question Paper


Question 1. (a) An electric bulb manufacturing company chooses a random sample of 10 bulbs received from one of its suppliers and determines the life of each bulb. The results (in thousands of hours) are as follows: 3, 4.5, 5.0, 4.2, 4.8, 4.2, 5.1, 4.0, 4.2, 4.2, 4.5. Compute and analyse a point estimate of the mean length of the life of the bulbs received from the supplier.
(b) Compare parametric and non-parametric tests. (2 Marks)

Solution:

(a) To compute the point estimate of the mean length of the life of the bulbs received from the supplier, we first calculate the sample mean.

Given data: 3, 4.5, 5.0, 4.2, 4.8, 4.2, 5.1, 4.0, 4.2, 4.2, 4.5

To find the sample mean, add the observations and divide by the number of observations. Note that the question mentions a sample of 10 bulbs but lists 11 values; the calculation below uses all 11 values as listed.

$$\text{Sample Mean} = \frac{3 + 4.5 + 5.0 + 4.2 + 4.8 + 4.2 + 5.1 + 4.0 + 4.2 + 4.2 + 4.5}{11}$$

$$\text{Sample Mean} = \frac{47.7}{11} \approx 4.34 \text{ thousand hours}$$

Therefore, the point estimate of the mean length of the life of the bulbs received from the supplier is approximately $4.34$ thousand hours.

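As an analysis of this estimate: the sample mean is an unbiased and consistent estimator of the population mean, so it is a reasonable single-value estimate of the mean life. Its reliability is limited by the small sample size and by the spread of the data (the values range from 3 to 5.1 thousand hours, with 3 well below the rest), so an interval estimate would convey the uncertainty better than the point estimate alone.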
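The point estimate can be checked with a few lines of Python. This is a minimal sketch that assumes the 11 values exactly as listed in the question and uses only the standard `statistics` module; the variable names are illustrative and not part of the original solution.

```python
from math import sqrt
from statistics import mean, stdev

# Bulb lifetimes in thousands of hours, exactly as listed in the question
# (11 values, although the question text mentions a sample of 10).
lifetimes = [3, 4.5, 5.0, 4.2, 4.8, 4.2, 5.1, 4.0, 4.2, 4.2, 4.5]

n = len(lifetimes)
point_estimate = mean(lifetimes)       # sample mean = point estimate of the population mean
spread = stdev(lifetimes)              # sample standard deviation (divides by n - 1)
standard_error = spread / sqrt(n)      # rough measure of how precise the estimate is

print(f"n = {n}")
print(f"point estimate of mean life = {point_estimate:.3f} thousand hours")
print(f"sample standard deviation   = {spread:.3f} thousand hours")
print(f"standard error of the mean  = {standard_error:.3f} thousand hours")
```

The standard deviation and standard error are included only to support the "analyse" part of the question: a smaller standard error means a more precise point estimate.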

(b) Parametric tests and non-parametric tests are two broad categories of statistical tests used for hypothesis testing.

Parametric tests assume that the data being analyzed follow a specific probability distribution, often the normal distribution. Examples of parametric tests include t-tests, ANOVA, and linear regression. These tests typically require certain assumptions about the data distribution and variance.

Non-parametric tests, on the other hand, make fewer assumptions about the data distribution. They are used when the data do not meet the assumptions of parametric tests or when the data are ordinal or categorical. Examples of non-parametric tests include the Wilcoxon signed-rank test, Mann-Whitney U test, and Kruskal-Wallis test.

In summary, parametric tests are more powerful when their assumptions are met, but non-parametric tests are more robust and can be used in a wider range of situations where parametric assumptions are violated or when dealing with non-normally distributed data.

Question 2. An insurance company has insured 1000 truck drivers, 3000 car drivers and 6000 scooter drivers. The probabilities that the truck, car and scooter drivers meet with an accident are $0.2$, $0.04$ and $0.25$, respectively. One of the insured persons meets with an accident. What is the probability that the person is a car driver? (5 Marks)

Solution:

To find the probability that the person involved in the accident is a car driver, we can use Bayes' theorem.
Let $A$ represent the event that the insured person is a car driver.
Let $B$ represent the event that the insured person meets with an accident.
We want to find $P(A | B)$, the probability that the person is a car driver given that they have met with an accident.
According to Bayes' theorem:
$$P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)}$$
Given:
- $P(A)$ = Probability that an insured person is a car driver = $\frac{3000}{1000 + 3000 + 6000} = \frac{3000}{10000}$
- $P(B | A)$ = Probability that the person meets with an accident given that they are a car driver = $0.04$ (as provided in the question)
- $P(B)$ = Probability that an insured person (truck, car or scooter driver) meets with an accident, obtained from the law of total probability

Now, let's calculate $P(B)$:
$$P(B) = \frac{1000 \times 0.2 + 3000 \times 0.04 + 6000 \times 0.25}{10000}$$
$$P(B) = \frac{200 + 120 + 1500}{10000}$$
$$P(B) = \frac{1820}{10000}$$
$$P(B) = 0.182$$
Now, substitute the values into Bayes' theorem:
$$P(A | B) = \frac{0.04 \times \frac{3000}{10000}}{0.182}$$
$$P(A | B) = \frac{0.04 \times 0.3}{0.182}$$
$$P(A | B) = \frac{0.012}{0.182} \approx 0.0659$$
Therefore, the probability that the person involved in the accident is a car driver is approximately $0.0659$ or $6.59\%$.
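The same Bayes' theorem calculation can be sketched in Python. The dictionary names below are illustrative assumptions; the counts and accident probabilities are the ones given in the question.

```python
# Number of insured drivers and accident probability per vehicle type,
# as given in the question.
insured = {"truck": 1000, "car": 3000, "scooter": 6000}
p_accident_given_type = {"truck": 0.2, "car": 0.04, "scooter": 0.25}

total_insured = sum(insured.values())

# Prior probability that a randomly chosen insured person is of each type.
prior = {kind: count / total_insured for kind, count in insured.items()}

# Law of total probability: overall chance that an insured person has an accident.
p_accident = sum(prior[kind] * p_accident_given_type[kind] for kind in insured)

# Bayes' theorem: probability of each driver type given that an accident occurred.
posterior = {
    kind: prior[kind] * p_accident_given_type[kind] / p_accident
    for kind in insured
}

print(f"P(accident) = {p_accident:.3f}")
print(f"P(car driver | accident) = {posterior['car']:.4f}")
```

Computing the posterior for all three driver types at once makes it easy to confirm that the three posterior probabilities add up to 1.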

Question 3. A football manufacturing company wants to check the variation in the weight of balls. For this, 25 samples (each of size 4) are selected. The weight of each ball is measured (in grams), the sum of sample averages and sum of sample ranges were found to be $\sum_{i=1}^{25} \bar{x}_i = 4010$ grams and $\sum_{i=1}^{25} R_i = 72$ grams, respectively. Compute the control limits for the $\bar{X}$ and R-chart. It is given that $A2 = 0.729$, $D3 = 0$ and $D4 = 2.282$. (5 Marks)

Solution:

Given:

Number of samples ($k$) = 25
Sample size ($n$) = 4
Sum of sample averages ($\sum_{i=1}^{25} \bar{x}_i$) = 4010 grams
Sum of sample ranges ($\sum_{i=1}^{25} R_i$) = 72 grams
Constants: $A2 = 0.729$, $D3 = 0$, $D4 = 2.282$

Step-by-Step Solution:

1. Calculate the average of sample means ($\bar{\bar{x}}$):
$$\bar{\bar{x}} = \frac{\sum_{i=1}^{25} \bar{x}_i}{25} = \frac{4010}{25} = 160.4 \text{ grams}$$

2. Calculate the average range ($\bar{R}$):
$$\bar{R} = \frac{\sum_{i=1}^{25} R_i}{25} = \frac{72}{25} = 2.88 \text{ grams}$$

3. Control Limits for $\bar{X}$-Chart:
The control limits for the $\bar{X}$-chart are calculated as follows:

Upper Control Limit (UCL):
$$\text{UCL}_{\bar{X}} = \bar{\bar{x}} + A2 \cdot \bar{R}$$
$$\text{UCL}_{\bar{X}} = 160.4 + 0.729 \cdot 2.88 = 160.4 + 2.09952 = 162.49952 \text{ grams}$$

Center Line (CL):
$$\text{CL}_{\bar{X}} = \bar{\bar{x}} = 160.4 \text{ grams}$$

Lower Control Limit (LCL):
$$\text{LCL}_{\bar{X}} = \bar{\bar{x}} - A2 \cdot \bar{R}$$
$$\text{LCL}_{\bar{X}} = 160.4 - 0.729 \cdot 2.88 = 160.4 - 2.09952 = 158.30048 \text{ grams}$$

4. Control Limits for R-Chart:
The control limits for the R-chart are calculated as follows:

Upper Control Limit (UCL):
$$\text{UCL}_R = D4 \cdot \bar{R}$$
$$\text{UCL}_R = 2.282 \cdot 2.88 = 6.57216 \text{ grams}$$

Center Line (CL):
$$\text{CL}_R = \bar{R} = 2.88 \text{ grams}$$

Lower Control Limit (LCL):
$$\text{LCL}_R = D3 \cdot \bar{R}$$
$$\text{LCL}_R = 0 \cdot 2.88 = 0 \text{ grams}$$
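The control-limit formulas above can be checked with a short Python sketch. It assumes only the summary figures and chart constants given in the question; the variable names are illustrative.

```python
# Summary data and chart constants from the question (sample size n = 4).
num_samples = 25
sum_of_means = 4010.0    # sum of the 25 sample averages, in grams
sum_of_ranges = 72.0     # sum of the 25 sample ranges, in grams
A2, D3, D4 = 0.729, 0.0, 2.282

grand_mean = sum_of_means / num_samples      # x-double-bar
mean_range = sum_of_ranges / num_samples     # R-bar

# X-bar chart: centre line +/- A2 * R-bar, stored as (LCL, CL, UCL).
xbar_limits = (
    grand_mean - A2 * mean_range,
    grand_mean,
    grand_mean + A2 * mean_range,
)

# R chart: D3 * R-bar and D4 * R-bar around R-bar, stored as (LCL, CL, UCL).
r_limits = (D3 * mean_range, mean_range, D4 * mean_range)

print("X-bar chart (LCL, CL, UCL):", [round(v, 5) for v in xbar_limits])
print("R chart     (LCL, CL, UCL):", [round(v, 5) for v in r_limits])
```

A sample average outside the first triple, or a sample range outside the second, would signal that the process may be out of control.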

Question 4. The frequency distribution of the accidental data of the factory for the last 50 weeks is shown below:

| No. of Accidents | No. of Weeks |
|---|---|
| 0 - 5 | 8 |
| 5 - 10 | 22 |
| 10 - 15 | 10 |
| 15 - 20 | 8 |
| 20 - 25 | 2 |

Draw the histogram and calculate the average number of accidents per week. (5 Marks)

Solution:

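The frequency distribution of the accidental data of the factory for the last 50 weeks, extended with the class midpoints $m$ and the products $m \times f$, is as follows:

| No. of Accidents | Midpoint (m) | No. of Weeks (f) | m * f |
|---|---|---|---|
| 0 - 5 | 2.5 | 8 | 20 |
| 5 - 10 | 7.5 | 22 | 165 |
| 10 - 15 | 12.5 | 10 | 125 |
| 15 - 20 | 17.5 | 8 | 140 |
| 20 - 25 | 22.5 | 2 | 45 |

For the histogram, mark the class intervals on the horizontal axis and the number of weeks on the vertical axis, and draw adjacent bars of heights 8, 22, 10, 8 and 2 over the five classes.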

Total of m * f = 20 + 165 + 125 + 140 + 45 = 495

Average number of accidents per week:

$$\text{Average} = \frac{\text{Total of } m \times f}{\text{Total number of weeks}} = \frac{495}{50} = 9.9$$

So, the average number of accidents per week is 9.9.
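The grouped-data mean and a rough text version of the histogram can be produced with a short Python sketch. The tuple layout and variable names are illustrative assumptions; the class intervals and frequencies come from the question.

```python
# Each tuple is (lower class limit, upper class limit, number of weeks).
classes = [(0, 5, 8), (5, 10, 22), (10, 15, 10), (15, 20, 8), (20, 25, 2)]

total_weeks = sum(freq for _, _, freq in classes)

# Grouped mean: weight each class midpoint by its frequency.
total_mf = sum((low + high) / 2 * freq for low, high, freq in classes)
grouped_mean = total_mf / total_weeks

# Text histogram: one '#' per week, so bar lengths mirror the bar heights.
for low, high, freq in classes:
    print(f"{low:>2} - {high:<2} | {'#' * freq} ({freq})")

print(f"Average accidents per week = {grouped_mean}")
```

The text bars are only a stand-in for a drawn histogram; on paper the bars would be vertical and adjacent, with equal widths because all classes have width 5.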

Question 5. In order to test whether there is any significant difference between the proportion of safety consciousness of men and women, while driving a car, a study was conducted. The study includes a sample of 300 men and 300 women. Out of 300 men, 130 said that they used seat belts, and out of 300 women, 90 said that they used seat belts. Based on the given data, test the claim that there is no significant difference between the proportion of safety consciousness of men and women, while driving a car at 5% level of significance. (Given that $Z_{0.025} = 1.96$).

Solution:

Step 1: Formulate the hypotheses.
Null hypothesis: $H_0: p_1 = p_2$ (There is no significant difference between the proportions)
Alternative hypothesis: $H_a: p_1 \neq p_2$ (There is a significant difference between the proportions)

Step 2: Calculate the sample proportions.
$$p_1 = \frac{130}{300} = 0.4333$$
$$p_2 = \frac{90}{300} = 0.3$$

Step 3: Calculate the pooled proportion.
$$p = \frac{130 + 90}{300 + 300} = \frac{220}{600} = 0.3667$$

Step 4: Calculate the standard error.
$$SE = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$SE = \sqrt{0.3667 \times 0.6333 \left(\frac{1}{300} + \frac{1}{300}\right)}$$
$$SE = \sqrt{0.3667 \times 0.6333 \times 0.006667}$$
$$SE = \sqrt{0.001548}$$
$$SE \approx 0.0393$$

Step 5: Calculate the z-statistic.
$$z = \frac{p_1 - p_2}{SE}$$
$$z = \frac{0.4333 - 0.3}{0.0393}$$
$$z = \frac{0.1333}{0.0393}$$
$$z \approx 3.39$$

Step 6: Compare the z-statistic to the critical value.
The critical value at 5% level of significance for a two-tailed test is $Z_{0.025} = 1.96$.
Since $|3.39| > 1.96$, we reject the null hypothesis.

Conclusion:
There is a significant difference between the proportion of safety consciousness of men and women while driving a car at the 5% level of significance.
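The two-proportion z-test above can be reproduced with a minimal Python sketch. It carries full precision through the calculation rather than rounding at each step, so the printed statistic may differ slightly in the last decimal from a hand calculation; the variable names are illustrative.

```python
from math import sqrt

# Seat-belt users and sample sizes for men (1) and women (2), from the question.
x1, n1 = 130, 300
x2, n2 = 90, 300

p1 = x1 / n1
p2 = x2 / n2

# Pooled proportion under H0: p1 = p2.
pooled = (x1 + x2) / (n1 + n2)

# Standard error of the difference in proportions under H0.
standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

z_statistic = (p1 - p2) / standard_error
z_critical = 1.96  # two-tailed critical value at the 5% level, given in the question

reject_null = abs(z_statistic) > z_critical

print(f"z = {z_statistic:.2f}, reject H0: {reject_null}")
```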

Question 6. A company manufactures two types of machines (A and B). The manager of the company tests a random sample of 50 machines of Type A and 60 machines of Type B and found the following information:

| | Mean Life (in hours) | Standard Deviation (in hours) |
|---|---|---|
| Type A | 1300 | 50 |
| Type B | 1200 | 60 |

Obtain 99% confidence interval for the difference of the average life of the two types of machines. (Given that $Z_{0.005} = 2.58$).

Solution:

Step 1: Identify the given data.
Sample size for Type A, $n_1 = 50$
Sample size for Type B, $n_2 = 60$
Mean life for Type A, $\overline{X}_1 = 1300$
Mean life for Type B, $\overline{X}_2 = 1200$
Standard deviation for Type A, $S_1 = 50$
Standard deviation for Type B, $S_2 = 60$

Step 2: Calculate the standard error of the difference between means.
$$SE = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
$$SE = \sqrt{\frac{50^2}{50} + \frac{60^2}{60}}$$
$$SE = \sqrt{\frac{2500}{50} + \frac{3600}{60}}$$
$$SE = \sqrt{50 + 60}$$
$$SE = \sqrt{110}$$
$$SE = 10.488$$

Step 3: Calculate the difference between the sample means.
$$(\overline{X}_1 - \overline{X}_2) = 1300 - 1200 = 100$$

Step 4: Determine the margin of error using the z-value for 99% confidence interval.
$$ME = Z_{0.005} \times SE$$
$$ME = 2.58 \times 10.488$$
$$ME = 27.059$$

Step 5: Calculate the confidence interval.
$$(\overline{X}_1 - \overline{X}_2) \pm ME$$
$$100 \pm 27.059$$
$$72.941 \text{ to } 127.059$$

Conclusion:
The 99% confidence interval for the difference in the average life of the two types of machines is from 72.941 hours to 127.059 hours.
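The interval can be verified with a short Python sketch. It assumes the large-sample z-interval used in the solution and the critical value given in the question; the variable names are illustrative.

```python
from math import sqrt

# Sample sizes, means and standard deviations from the question.
n1, mean1, sd1 = 50, 1300.0, 50.0   # Type A
n2, mean2, sd2 = 60, 1200.0, 60.0   # Type B
z_critical = 2.58                   # Z_{0.005}, given in the question

# Standard error of the difference between two independent sample means.
standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

difference = mean1 - mean2
margin_of_error = z_critical * standard_error

lower = difference - margin_of_error
upper = difference + margin_of_error

print(f"SE = {standard_error:.3f}, ME = {margin_of_error:.3f}")
print(f"99% CI for the difference: ({lower:.3f}, {upper:.3f}) hours")
```

Because the whole interval lies above zero, the data suggest that Type A machines last longer on average than Type B machines.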

Question 7. To enforce the speed limit at four different locations in the city, the Police plans to install radar traps at each of the locations L1, L2, L3 and L4. The radar traps at each of the locations L1, L2, L3 and L4 are operated 40%, 30%, 20% and 30% of the time. If a person who is speeding on his way to work has probabilities of 0.2, 0.1, 0.5 and 0.2 respectively, of passing through these locations, what is the probability that he will receive a speeding ticket? Find also the probability that he will receive a speeding ticket at locations L1, L2, L3 and L4.

Solution:

Step 1: Identify the given data.
Probability of radar trap being operated at L1, $P(T_1) = 0.4$
Probability of radar trap being operated at L2, $P(T_2) = 0.3$
Probability of radar trap being operated at L3, $P(T_3) = 0.2$
Probability of radar trap being operated at L4, $P(T_4) = 0.3$
Probability of passing through L1, $P(L_1) = 0.2$
Probability of passing through L2, $P(L_2) = 0.1$
Probability of passing through L3, $P(L_3) = 0.5$
Probability of passing through L4, $P(L_4) = 0.2$

Step 2: Calculate the probability of receiving a speeding ticket at each location.
Let $S$ denote the event of receiving a speeding ticket. The person is ticketed at a location only if he passes through it and the radar trap there is operating, so the required value for each location is the joint probability $P(S \cap L_i) = P(T_i) \times P(L_i)$, where $P(T_i)$ plays the role of the conditional probability $P(S | L_i)$.
Probability of receiving a speeding ticket at L1, $P(S \cap L_1) = P(T_1) \times P(L_1) = 0.4 \times 0.2 = 0.08$
Probability of receiving a speeding ticket at L2, $P(S \cap L_2) = P(T_2) \times P(L_2) = 0.3 \times 0.1 = 0.03$
Probability of receiving a speeding ticket at L3, $P(S \cap L_3) = P(T_3) \times P(L_3) = 0.2 \times 0.5 = 0.10$
Probability of receiving a speeding ticket at L4, $P(S \cap L_4) = P(T_4) \times P(L_4) = 0.3 \times 0.2 = 0.06$

Step 3: Calculate the total probability of receiving a speeding ticket.
The four passing probabilities add up to 1, so the locations are mutually exclusive and exhaustive routes, and the law of total probability applies:
$$P(S) = P(S \cap L_1) + P(S \cap L_2) + P(S \cap L_3) + P(S \cap L_4)$$
$$P(S) = 0.08 + 0.03 + 0.10 + 0.06$$
$$P(S) = 0.27$$

Conclusion:
The probability that the person will receive a speeding ticket is 0.27.
The probabilities of receiving a speeding ticket at locations L1, L2, L3, and L4 are 0.08, 0.03, 0.10, and 0.06 respectively.
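The total-probability calculation can be sketched in Python as follows. The dictionary names are illustrative assumptions; the probabilities are those given in the question.

```python
# Probability that the radar trap at each location is operating, and
# probability that the speeding driver passes through each location.
p_trap_operating = {"L1": 0.4, "L2": 0.3, "L3": 0.2, "L4": 0.3}
p_passing = {"L1": 0.2, "L2": 0.1, "L3": 0.5, "L4": 0.2}

# Joint probability of passing through a location AND being ticketed there.
p_ticket_at = {
    loc: p_trap_operating[loc] * p_passing[loc] for loc in p_passing
}

# Law of total probability: the locations are mutually exclusive and exhaustive.
p_ticket = sum(p_ticket_at.values())

for loc, prob in p_ticket_at.items():
    print(f"P(ticket at {loc}) = {prob:.2f}")
print(f"P(ticket) = {p_ticket:.2f}")
```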

Question 8. Find and plot the regression line of y on x, for the data given below:

| Speed (Km/hr) | 30 | 40 | 50 | 60 |
|---|---|---|---|---|
| Stopping Distance (in feet) | 160 | 240 | 330 | 435 |

Solution:

Speed (X): 30, 40, 50, 60
Stopping Distance (Y): 160, 240, 330, 435

To find the regression line $Y = a + bX$, we need to calculate the slope $b$ and the intercept $a$.

Step 1: Calculate the means of X and Y.
$$\overline{X} = \frac{\sum X}{n} = \frac{30 + 40 + 50 + 60}{4} = 45$$
$$\overline{Y} = \frac{\sum Y}{n} = \frac{160 + 240 + 330 + 435}{4} = 291.25$$

Step 2: Calculate the slope $b$.
$$b = \frac{\sum (X - \overline{X})(Y - \overline{Y})}{\sum (X - \overline{X})^2}$$

Calculate $\sum (X - \overline{X})(Y - \overline{Y})$:
$$\sum (X - \overline{X})(Y - \overline{Y}) = (30 - 45)(160 - 291.25) + (40 - 45)(240 - 291.25) + (50 - 45)(330 - 291.25) + (60 - 45)(435 - 291.25)$$
$$= (-15)(-131.25) + (-5)(-51.25) + (5)(38.75) + (15)(143.75)$$
$$= 1968.75 + 256.25 + 193.75 + 2156.25$$
$$= 4575$$

Calculate $\sum (X - \overline{X})^2$:
$$\sum (X - \overline{X})^2 = (30 - 45)^2 + (40 - 45)^2 + (50 - 45)^2 + (60 - 45)^2$$
$$= (-15)^2 + (-5)^2 + (5)^2 + (15)^2$$
$$= 225 + 25 + 25 + 225$$
$$= 500$$

Calculate the slope $b$:
$$b = \frac{4575}{500} = 9.15$$

Step 3: Calculate the intercept $a$.
$$a = \overline{Y} - b\overline{X}$$
$$a = 291.25 - 9.15 \times 45$$
$$a = 291.25 - 411.75$$
$$a = -120.5$$

Step 4: Form the regression equation.
$$Y = a + bX$$
$$Y = -120.5 + 9.15X$$

The regression line of y on x is $Y = -120.5 + 9.15X$.

Step 5: Plot the regression line.

To plot the regression line, use the equation $Y = -120.5 + 9.15X$ to calculate Y for various values of X. Then, plot the points and draw the line through them.

Using the equation for a few points:
For $X = 30$, $Y = -120.5 + 9.15 \times 30 = 154$
For $X = 40$, $Y = -120.5 + 9.15 \times 40 = 245.5$
For $X = 50$, $Y = -120.5 + 9.15 \times 50 = 337$
For $X = 60$, $Y = -120.5 + 9.15 \times 60 = 428.5$

These points can be used to plot the regression line on a graph.
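The least-squares slope, intercept and fitted points can be computed with a plain Python sketch. It follows the deviation-from-mean formulas used in the solution; the variable names are illustrative, and no plotting library is assumed (the fitted values could be passed to any plotting tool to draw the line).

```python
# Data from the question.
speeds = [30, 40, 50, 60]            # X, in km/hr
distances = [160, 240, 330, 435]     # Y, stopping distance in feet

n = len(speeds)
x_bar = sum(speeds) / n
y_bar = sum(distances) / n

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speeds, distances))
s_xx = sum((x - x_bar) ** 2 for x in speeds)

slope = s_xy / s_xx                  # b
intercept = y_bar - slope * x_bar    # a

# Points on the fitted line at the observed speeds, for plotting.
fitted = [intercept + slope * x for x in speeds]

print(f"Y = {intercept:.2f} + {slope:.2f} X")
for x, y_hat in zip(speeds, fitted):
    print(f"X = {x}: fitted Y = {y_hat:.1f}")
```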

Question 9. A chemical firm wants to determine how four catalysts differ in yield. The firm runs the experiment in three of its plants, namely A, B and C. In each plant, the yield is measured with each catalyst. The yields (in quintals) are as follows:

| Plant | Catalyst 1 | Catalyst 2 | Catalyst 3 | Catalyst 4 |
|---|---|---|---|---|
| A | 2 | 1 | 2 | 4 |
| B | 3 | 2 | 1 | 3 |
| C | 1 | 3 | 3 | 1 |

Perform an ANOVA and comment whether the yield due to a particular catalyst is significant or not at 5% level of significance (Given $F_{3,6} = 4.76$).

Solution:

Step 1: Calculate the means

Total mean:
$$\overline{Y} = \frac{2 + 1 + 2 + 4 + 3 + 2 + 1 + 3 + 1 + 3 + 3 + 1}{12} = \frac{26}{12} = 2.17$$

Means for each catalyst:

$$\overline{Y_1} = \frac{2 + 3 + 1}{3} = 2$$
$$\overline{Y_2} = \frac{1 + 2 + 3}{3} = 2$$
$$\overline{Y_3} = \frac{2 + 1 + 3}{3} = 2$$
$$\overline{Y_4} = \frac{4 + 3 + 1}{3} = 2.67$$

Step 2: Calculate the sum of squares between catalysts (SSB)

With $k = 4$ catalysts, each observed $n_i = 3$ times (once per plant):

$$SSB = \sum_{i=1}^{k} n_i (\overline{Y_i} - \overline{Y})^2$$
$$= 3(2 - 2.1667)^2 + 3(2 - 2.1667)^2 + 3(2 - 2.1667)^2 + 3(2.6667 - 2.1667)^2$$
$$= 3(0.0278) + 3(0.0278) + 3(0.0278) + 3(0.25)$$
$$= 0.0833 + 0.0833 + 0.0833 + 0.75$$
$$= 1.00$$

Step 3: Calculate the sum of squares for plants (SSP) and for error (SSE)

Every catalyst is tried in every plant, so the $b = 3$ plants act as blocks and their variation is removed from the error term. This leaves $(k-1)(b-1) = 3 \times 2 = 6$ error degrees of freedom, which matches the given critical value $F_{3,6}$.

Plant means: $\overline{Y_A} = \frac{9}{4} = 2.25$, $\overline{Y_B} = \frac{9}{4} = 2.25$, $\overline{Y_C} = \frac{8}{4} = 2$

$$SSP = 4(2.25 - 2.1667)^2 + 4(2.25 - 2.1667)^2 + 4(2 - 2.1667)^2$$
$$= 0.0278 + 0.0278 + 0.1111 = 0.1667$$

Total sum of squares, with $N = 12$ observations:
$$SST = \sum Y_{ij}^2 - \frac{\left(\sum Y_{ij}\right)^2}{N} = 68 - \frac{26^2}{12} = 68 - 56.3333 = 11.6667$$

$$SSE = SST - SSB - SSP = 11.6667 - 1.00 - 0.1667 = 10.50$$

Step 4: Calculate the mean squares

$$MSB = \frac{SSB}{k-1} = \frac{1.00}{3} = 0.3333$$
$$MSE = \frac{SSE}{(k-1)(b-1)} = \frac{10.50}{6} = 1.75$$

Step 5: Calculate the F-statistic

$$F = \frac{MSB}{MSE} = \frac{0.3333}{1.75} = 0.19$$

Step 6: Compare the calculated F-statistic with the critical value

Since the calculated F-statistic $(0.19)$ is less than the critical value $F_{3,6} = 4.76$, we fail to reject the null hypothesis.

Conclusion: There is no significant difference in the yield due to the different catalysts at the 5% level of significance.
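As a cross-check, the sums of squares can be computed in Python. This sketch is an assumption-laden illustration rather than the only valid analysis: it treats the plants as blocks (a two-way layout without replication), because that is the design whose error degrees of freedom, $(4-1)(3-1) = 6$, match the critical value $F_{3,6}$ given in the question. All variable names are illustrative.

```python
# Yields in quintals: one row per plant (block), one column per catalyst.
yields = {
    "A": [2, 1, 2, 4],
    "B": [3, 2, 1, 3],
    "C": [1, 3, 3, 1],
}
rows = list(yields.values())
num_plants = len(rows)          # b = 3 blocks
num_catalysts = len(rows[0])    # k = 4 treatments
n_total = num_plants * num_catalysts

grand_total = sum(sum(row) for row in rows)
correction = grand_total ** 2 / n_total   # correction factor (sum Y)^2 / N

ss_total = sum(y * y for row in rows for y in row) - correction

catalyst_totals = [sum(row[j] for row in rows) for j in range(num_catalysts)]
ss_catalyst = sum(t * t for t in catalyst_totals) / num_plants - correction

plant_totals = [sum(row) for row in rows]
ss_plant = sum(t * t for t in plant_totals) / num_catalysts - correction

ss_error = ss_total - ss_catalyst - ss_plant

df_catalyst = num_catalysts - 1
df_error = (num_catalysts - 1) * (num_plants - 1)

f_catalyst = (ss_catalyst / df_catalyst) / (ss_error / df_error)
f_critical = 4.76  # F_{3,6} at the 5% level, given in the question

print(f"SS catalysts = {ss_catalyst:.4f}, SS plants = {ss_plant:.4f}, SS error = {ss_error:.4f}")
print(f"F = {f_catalyst:.4f}, significant: {f_catalyst > f_critical}")
```

Under a one-way layout that ignores plants, the error degrees of freedom would be 8 instead of 6, and the tabulated value supplied in the question would not apply directly.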

Question 10. In order to study the impact of air pollution on households, a random sample of 200 households was selected from each of the two communities. The respondent in each house was asked whether or not any one in the house was bothered by air pollution. The responses are tabulated below (Given $\chi^2_{0.05,1} = 3.841$, $\alpha = 0.05$):

| Community | Yes | No | Total |
|---|---|---|---|
| I | 43 | 157 | 200 |
| II | 81 | 119 | 200 |
| Total | 124 | 276 | 400 |

Can the researcher conclude that the 2 communities are bothered differently by air pollution?

Solution:

Step 1: State the Hypotheses

Null Hypothesis ($H_0$): There is no significant difference between the two communities in terms of being bothered by air pollution.
Alternative Hypothesis ($H_a$): There is a significant difference between the two communities in terms of being bothered by air pollution.


Step 2: Calculate the Expected Frequencies

The expected frequency for each cell in the table is calculated using the formula:
$$E_{ij} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}$$

For Community I (Yes):
$$E_{11} = \frac{(\text{Total for Community I}) \times (\text{Total Yes})}{\text{Grand Total}} = \frac{200 \times 124}{400} = 62$$

For Community I (No):
$$E_{12} = \frac{(\text{Total for Community I}) \times (\text{Total No})}{\text{Grand Total}} = \frac{200 \times 276}{400} = 138$$

For Community II (Yes):
$$E_{21} = \frac{(\text{Total for Community II}) \times (\text{Total Yes})}{\text{Grand Total}} = \frac{200 \times 124}{400} = 62$$

For Community II (No):
$$E_{22} = \frac{(\text{Total for Community II}) \times (\text{Total No})}{\text{Grand Total}} = \frac{200 \times 276}{400} = 138$$


Step 3: Calculate the Chi-Square Statistic

The chi-square statistic ($\chi^2$) is calculated using the formula:
$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ are the observed frequencies and $E_{ij}$ are the expected frequencies.

Let's compute it:

For Community I (Yes):
$$\chi^2_{11} = \frac{(43 - 62)^2}{62} = \frac{(-19)^2}{62} = \frac{361}{62} \approx 5.82$$

For Community I (No):
$$\chi^2_{12} = \frac{(157 - 138)^2}{138} = \frac{19^2}{138} = \frac{361}{138} \approx 2.62$$

For Community II (Yes):
$$\chi^2_{21} = \frac{(81 - 62)^2}{62} = \frac{19^2}{62} = \frac{361}{62} \approx 5.82$$

For Community II (No):
$$\chi^2_{22} = \frac{(119 - 138)^2}{138} = \frac{(-19)^2}{138} = \frac{361}{138} \approx 2.62$$

Summing these, we get:
$$\chi^2 = 5.82 + 2.62 + 5.82 + 2.62 = 16.88$$


Step 4: Compare the Chi-Square Statistic to the Critical Value

Given $\chi^2_{0.05,1} = 3.841$, since $16.88 > 3.841$, we reject the null hypothesis.


Conclusion

The researcher can conclude that the two communities are bothered differently by air pollution at the 0.05 significance level.
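The expected frequencies and the chi-square statistic can be reproduced with a short Python sketch. It uses plain lists rather than a statistics library and applies no continuity correction, matching the hand calculation; the variable names are illustrative.

```python
# Observed 2x2 table: rows are communities I and II, columns are Yes and No.
observed = [
    [43, 157],
    [81, 119],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell under independence.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Pearson chi-square statistic summed over all four cells.
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

chi_critical = 3.841  # chi-square critical value for 1 df at alpha = 0.05, from the question
reject_null = chi_square > chi_critical

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.2f}, reject H0: {reject_null}")
```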