Free Solved [June 2023] BCS40 - Statistical Techniques Question Paper

1. Write the median formula, and use it to complete the frequency distribution given below: (5 marks)

| Class Interval (C.I.) | Frequency |
|---|---|
| 10-20 | 12 |
| 20-30 | 30 |
| 30-40 | ? |
| 40-50 | 65 |
| 50-60 | ? |
| 60-70 | 25 |
| 70-80 | 18 |

Given: the median value of 200 observations is 46.

The median formula for a frequency distribution is:
\[\text{Median} = L + \left( \frac{\frac{N}{2} - CF}{f} \right) \times h\]

Where:
L = Lower boundary of the median class
N = Total frequency
CF = Cumulative frequency before the median class
f = Frequency of the median class
h = Class width

Given \(N = 200\), the median class is the class in which the cumulative frequency first reaches \(\frac{N}{2} = 100\). Let \(f_3\) be the frequency of class 30-40 and \(f_5\) the frequency of class 50-60. The cumulative frequencies are then:

10-20: 12
20-30: 42 (12 + 30)
30-40: \(42 + f_3\)
40-50: \(107 + f_3\)
50-60: \(107 + f_3 + f_5\)
60-70: \(132 + f_3 + f_5\)
70-80: \(150 + f_3 + f_5 = 200\)

Since the median is 46, it falls in the class interval 40-50. Hence, this is our median class.

For class 40-50:
L = 40
CF = \(42 + f_3\)
f = 65
h = 10

From the median formula:
\[46 = 40 + \left( \frac{100 - CF}{65} \right) \times 10\]
Rearranging and solving for CF:
\[46 - 40 = \left( \frac{100 - CF}{65} \right) \times 10\]
\[6 = \left( \frac{100 - CF}{65} \right) \times 10\]
\[\frac{6 \times 65}{10} = 100 - CF\]
\[39 = 100 - CF\]
\[CF = 100 - 39\]
\[CF = 61\]

The cumulative frequency before the median class is \(42 + f_3\), so:
\[42 + f_3 = 61\]
\[f_3 = 61 - 42\]
\[f_3 = 19\]

The frequencies must add up to 200, which gives the frequency of class 50-60:
\[12 + 30 + 19 + 65 + f_5 + 25 + 18 = 200\]
\[f_5 = 200 - 169 = 31\]

So the missing frequencies are 19 for class 30-40 and 31 for class 50-60.
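
A minimal Python sketch that solves the median formula for the two missing frequencies directly from the question's data. The variable names are our own, and the median class index is set by hand on the assumption that the median 46 lies in class 40-50.

```python
# Class lower bounds and frequencies; None marks the two unknown entries.
lower_bounds = [10, 20, 30, 40, 50, 60, 70]
freqs = [12, 30, None, 65, None, 25, 18]
total, median, width = 200, 46, 10

m = 3  # index of the median class 40-50 (assumed, since 40 <= 46 < 50)

# Median = L + ((N/2 - CF) / f) * h  =>  CF = N/2 - (Median - L) * f / h
cf_before = total / 2 - (median - lower_bounds[m]) * freqs[m] / width

# CF before the median class is 12 + 30 + f3, so f3 follows directly.
f3 = cf_before - (freqs[0] + freqs[1])

# The second unknown makes all frequencies add up to the total.
f5 = total - f3 - sum(f for f in freqs if f is not None)

print(f"CF = {cf_before:g}, f3 = {f3:g}, f5 = {f5:g}")  # CF = 61, f3 = 19, f5 = 31
```

A negative or non-integer result here would signal an arithmetic slip, since a frequency must be a non-negative count.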

2. Fit a linear trend \(y = a + bx\) to the data collected from a unit that manufactures shoes: (5 marks)

| Month (x) | Demand (y) |
|---|---|
| 1 | 46 |
| 2 | 56 |
| 3 | 54 |
| 4 | 43 |
| 5 | 57 |
| 6 | 56 |

The linear trend equation is \(y = a + bx\).

To find \(a\) and \(b\):
\[b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}\]
\[a = \frac{\sum y - b \sum x}{n}\]

Let's calculate the necessary sums:
\[\sum x = 1 + 2 + 3 + 4 + 5 + 6 = 21\]
\[\sum y = 46 + 56 + 54 + 43 + 57 + 56 = 312\]
\[\sum xy = (1 \times 46) + (2 \times 56) + (3 \times 54) + (4 \times 43) + (5 \times 57) + (6 \times 56) = 46 + 112 + 162 + 172 + 285 + 336 = 1113\]
\[\sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 1 + 4 + 9 + 16 + 25 + 36 = 91\]

Now, substitute these values into the equations for \(a\) and \(b\):

For \(b\):
\[b = \frac{6 \times 1113 - 21 \times 312}{6 \times 91 - 21^2}\]
\[b = \frac{6678 - 6552}{546 - 441}\]
\[b = \frac{126}{105}\]
\[b = 1.2\]

For \(a\):
\[a = \frac{312 - 1.2 \times 21}{6}\]
\[a = \frac{312 - 25.2}{6}\]
\[a = \frac{286.8}{6}\]
\[a = 47.8\]

Thus, the linear trend equation is:
\[y = 47.8 + 1.2x\]
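
The same least-squares formulas can be checked with a few lines of Python. This is only a sketch using the question's six data points; the variable names are our own.

```python
# Month numbers and demand values from the question.
x = [1, 2, 3, 4, 5, 6]
y = [46, 56, 54, 43, 57, 56]
n = len(x)

# The four sums needed by the least-squares formulas.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope and intercept of the trend line y = a + b*x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(f"y = {a:.1f} + {b:.1f}x")  # y = 47.8 + 1.2x
```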

3. If 2% of the items manufactured in a factory are defective, then find the probability that: (5 marks)

(i) There are 3 defectives in a sample of 100

(ii) There is no defective in a sample of 50.

The probability of finding \(k\) defectives in a sample of size \(n\) when the defect rate is \(p\) is given by the binomial distribution formula:
\[P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\]

Given: \(p = 0.02\), \(n = 100\) for part (i), and \(n = 50\) for part (ii).

For part (i), \(k = 3\):
\[P(X = 3) = \binom{100}{3} (0.02)^3 (0.98)^{97} = 161700 \times 0.000008 \times (0.98)^{97} \approx 0.182\]

For part (ii), \(k = 0\):
\[P(X = 0) = \binom{50}{0} (0.02)^0 (0.98)^{50}\]
\[P(X = 0) = (0.98)^{50} \approx 0.364\]
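
To turn the two expressions into numbers, here is a short sketch using Python's standard `math.comb`. The helper name `binomial_pmf` is our own, not something from the question paper.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Part (i): exactly 3 defectives in a sample of 100.
p_three_in_100 = binomial_pmf(3, 100, 0.02)

# Part (ii): no defective in a sample of 50.
p_none_in_50 = binomial_pmf(0, 50, 0.02)

print(f"(i) {p_three_in_100:.4f}  (ii) {p_none_in_50:.4f}")
```

Both values come out at roughly 0.18 and 0.36 respectively.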

4. Calculate the correlation coefficient from the following data (determined from the 20 pairs of observations of two variables X and Y): (5 marks)

\[\Sigma x = 15, \Sigma y = -6, \Sigma xy = 50, \Sigma x^2 = 61, \Sigma y^2 = 90\]

The formula for the correlation coefficient \(r\) is:
\[r = \frac{n \sum xy - \sum x \sum y}{\sqrt{(n \sum x^2 - (\sum x)^2)(n \sum y^2 - (\sum y)^2)}}\]

Given \(n = 20\):
\[r = \frac{20 \times 50 - 15 \times (-6)}{\sqrt{(20 \times 61 - 15^2)(20 \times 90 - (-6)^2)}}\]
\[r = \frac{1000 + 90}{\sqrt{(1220 - 225)(1800 - 36)}}\]
\[r = \frac{1090}{\sqrt{995 \times 1764}}\]
\[r = \frac{1090}{\sqrt{1755180}}\]
\[r = \frac{1090}{1324.83}\]
\[r \approx 0.82\]
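
The intermediate products are easy to get wrong by hand, so a small Python sketch of the same formula may help as a check. It uses only the summary sums given in the question; the variable names are our own.

```python
from math import sqrt

# Summary statistics given in the question (20 pairs of observations).
n = 20
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 15, -6, 50, 61, 90

numerator = n * sum_xy - sum_x * sum_y   # 1000 + 90
x_term = n * sum_x2 - sum_x ** 2         # 1220 - 225
y_term = n * sum_y2 - sum_y ** 2         # 1800 - 36

r = numerator / sqrt(x_term * y_term)
print(f"r = {r:.2f}")
```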

5. The mean weekly sales of mobile phones in different mobile stores was 146.3 mobile phones per store. After an advertisement campaign, the mean weekly sales of 22 stores increased to 153.7 (for a typical week), with a standard deviation of 17.2. Verify the success of the advertisement campaign at the 5% level of significance. (Given: \(t_{0.05,21} = 2.08\)). (5 marks)

We use the t-test to verify the success of the advertisement campaign. The null hypothesis is that the campaign did not change the mean weekly sales, \(H_0: \mu = 146.3\).
The t-statistic is given by:
\[t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\]

Where:
\(\bar{x}\) = Sample mean = 153.7
\(\mu\) = Population mean = 146.3
s = Standard deviation = 17.2
n = Sample size = 22

Substitute the values:
\[t = \frac{153.7 - 146.3}{17.2/\sqrt{22}}\]
\[t = \frac{7.4}{17.2/4.69}\]
\[t = \frac{7.4}{3.67}\]
\[t = 2.02\]

Since \(t_{\text{calculated}} = 2.02\) is less than \(t_{\text{critical}} = 2.08\), we fail to reject the null hypothesis.
Hence, there is not enough evidence to verify the success of the advertisement campaign at 5% level of significance.
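
A minimal sketch of the same one-sample t-test in Python, using only the figures given in the question. The critical value 2.08 is taken from the question rather than computed, and the variable names are our own.

```python
from math import sqrt

sample_mean = 153.7   # mean weekly sales after the campaign
mu0 = 146.3           # mean weekly sales before the campaign
s = 17.2              # sample standard deviation
n = 22                # number of stores sampled
t_critical = 2.08     # given in the question for 21 degrees of freedom

standard_error = s / sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

# Reject H0 only if the statistic exceeds the given critical value.
reject_null = t_stat > t_critical
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```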

6. The mobile phone numbers are combinations of ten digits, 0 to 9. An observer analysed the mobile numbers in the contact list of his/her mobile phone and prepared a frequency distribution table of the digits (0 to 9), as given below:

| Digit | Frequency |
|---|---|
| 0 | 99 |
| 1 | 100 |
| 2 | 82 |
| 3 | 65 |
| 4 | 50 |
| 5 | 77 |
| 6 | 88 |
| 7 | 57 |
| 8 | 82 |
| 9 | 80 |

From the observed data (given above), she/he wants to test whether the digits occur with the same frequency or not. (Given: \(\alpha = 0.05\), \(\chi^2_9(0.05) = 16.918\)). (5 marks)

The null hypothesis is that the digits occur with the same frequency.

We use the chi-square test for goodness of fit:
\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

Where:
\(O_i\) = Observed frequency
\(E_i\) = Expected frequency

The observed frequencies add up to \(99 + 100 + 82 + 65 + 50 + 77 + 88 + 57 + 82 + 80 = 780\). If the digits occur with the same frequency, the expected frequency for each digit is \(E_i = \frac{780}{10} = 78\).

Calculating the chi-square value:
\[\chi^2 = \frac{(99 - 78)^2}{78} + \frac{(100 - 78)^2}{78} + \frac{(82 - 78)^2}{78} + \frac{(65 - 78)^2}{78} + \frac{(50 - 78)^2}{78} + \frac{(77 - 78)^2}{78} + \frac{(88 - 78)^2}{78} + \frac{(57 - 78)^2}{78} + \frac{(82 - 78)^2}{78} + \frac{(80 - 78)^2}{78}\]

Calculating each term:
\[\chi^2 = \frac{(21)^2}{78} + \frac{(22)^2}{78} + \frac{(4)^2}{78} + \frac{(-13)^2}{78} + \frac{(-28)^2}{78} + \frac{(-1)^2}{78} + \frac{(10)^2}{78} + \frac{(-21)^2}{78} + \frac{(4)^2}{78} + \frac{(2)^2}{78}\]
\[\chi^2 = \frac{441 + 484 + 16 + 169 + 784 + 1 + 100 + 441 + 16 + 4}{78}\]
\[\chi^2 = \frac{2456}{78} \approx 31.49\]

Since \(\chi^2 = 31.49\) is greater than \(\chi^2_9(0.05) = 16.918\), we reject the null hypothesis.
Hence, the digits do not occur with the same frequency.
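
As a cross-check, the goodness-of-fit statistic can be computed in a few lines of Python. This sketch derives the expected frequency from the observed total rather than assuming it, and takes the critical value from the question; the variable names are our own.

```python
# Observed frequencies of the digits 0 to 9, from the question's table.
observed = [99, 100, 82, 65, 50, 77, 88, 57, 82, 80]

total = sum(observed)
expected = total / len(observed)   # equal frequency under the null hypothesis

chi_sq = sum((o - expected) ** 2 / expected for o in observed)

critical = 16.918                  # chi-square critical value given for 9 d.f.
reject_null = chi_sq > critical
print(f"total = {total}, E = {expected:g}, chi-square = {chi_sq:.2f}, reject H0: {reject_null}")
```

Deriving `expected` from `total` guarantees that the deviations `o - expected` sum to zero, which is a quick sanity check for hand calculations as well.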

7. A survey study includes three villages \(V_1\), \(V_2\) and \(V_3\) having 50000, 30000 and 40000 as respective populations. A stratified random sample is to be taken with a total sample size of \(n = 500\). Determine the sample size to be selected from each village individually using the method of
(i) proportional allocation and
(ii) optimal allocation.
From the previous survey, it is known that the standard deviations are \(S_1 = 30\), \(S_2 = 15\) and \(S_3 = 20\). (10 marks)

(i) Proportional allocation:

For proportional allocation, the sample size for each village is proportional to its population.
\[n_i = \frac{N_i}{N} \times n\]

Where:
\(N_i\) = Population of village
\(N\) = Total population = 50000 + 30000 + 40000 = 120000
\(n\) = Total sample size = 500

For \(V_1\):
\[n_1 = \frac{50000}{120000} \times 500 = \frac{5}{12} \times 500 = 208.33 \approx 208\]

For \(V_2\):
\[n_2 = \frac{30000}{120000} \times 500 = \frac{1}{4} \times 500 = 125\]

For \(V_3\):
\[n_3 = \frac{40000}{120000} \times 500 = \frac{1}{3} \times 500 = 166.67 \approx 167\]

So, the sample sizes for proportional allocation are:
208 for \(V_1\), 125 for \(V_2\), and 167 for \(V_3\)

(ii) Optimal allocation:

For optimal allocation, the sample size for each village considers both the population size and the standard deviation.
\[n_i = \frac{N_i S_i}{\sum (N_i S_i)} \times n\]

Where:
\(S_i\) = Standard deviation of village

Calculating the necessary sums:
\[N_1 S_1 = 50000 \times 30 = 1500000\]
\[N_2 S_2 = 30000 \times 15 = 450000\]
\[N_3 S_3 = 40000 \times 20 = 800000\]
\[\sum (N_i S_i) = 1500000 + 450000 + 800000 = 2750000\]

For \(V_1\):
\[n_1 = \frac{1500000}{2750000} \times 500 = \frac{1500}{2750} \times 500 = 272.73 \approx 273\]

For \(V_2\):
\[n_2 = \frac{450000}{2750000} \times 500 = \frac{450}{2750} \times 500 = 81.82 \approx 82\]

For \(V_3\):
\[n_3 = \frac{800000}{2750000} \times 500 = \frac{800}{2750} \times 500 = 145.45 \approx 145\]

So, the sample sizes for optimal allocation are:
273 for \(V_1\), 82 for \(V_2\), and 145 for \(V_3\)
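
Both allocation rules share the same shape: split \(n\) in proportion to a weight per village. The Python sketch below makes that explicit. The helper `allocate` and the dictionary names are our own, and the simple rounding used here is an assumption that happens to add up to 500 for this data but is not guaranteed to do so in general.

```python
populations = {"V1": 50000, "V2": 30000, "V3": 40000}
std_devs = {"V1": 30, "V2": 15, "V3": 20}
n = 500

def allocate(weights, sample_size):
    """Split sample_size across strata in proportion to the given weights."""
    total_weight = sum(weights.values())
    return {name: round(sample_size * w / total_weight) for name, w in weights.items()}

# Proportional allocation: weight = N_i
proportional = allocate(populations, n)

# Optimal allocation: weight = N_i * S_i
optimal = allocate({v: populations[v] * std_devs[v] for v in populations}, n)

print(proportional)  # {'V1': 208, 'V2': 125, 'V3': 167}
print(optimal)       # {'V1': 273, 'V2': 82, 'V3': 145}
```

Compared with proportional allocation, the optimal rule shifts sample units towards \(V_1\), the village with the largest standard deviation.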

8. The data of 300 persons, according to hair colour and eye colour is shown below: (10 marks)

| Hair Colour / Eye Colour | Blue | Grey | Brown |
|---|---|---|---|
| White | 30 | 10 | 40 |
| Brown | 40 | 20 | 40 |
| Black | 50 | 30 | 40 |

Test the hypothesis that there is association between hair colour and eye colour at 5% level of significance. (Given that: \(\chi^2_{0.05, 4} = 9.49\))

The null hypothesis is that there is no association between hair colour and eye colour.

We use the chi-square test for independence:
\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

Where:
\(O_i\) = Observed frequency
\(E_i\) = Expected frequency

Calculating the expected frequencies requires the column totals (eye colour), the row totals (hair colour) and the grand total:
Total for Blue eyes: 30 + 40 + 50 = 120
Total for Grey eyes: 10 + 20 + 30 = 60
Total for Brown eyes: 40 + 40 + 40 = 120
Total for White hair: 30 + 10 + 40 = 80
Total for Brown hair: 40 + 20 + 40 = 100
Total for Black hair: 50 + 30 + 40 = 120
Total: 300

For example, \(E_{\text{White, Blue}} = \frac{(\text{Total for White}) \times (\text{Total for Blue})}{\text{Total}} = \frac{80 \times 120}{300} = 32\)
Similarly, calculate all expected frequencies.

Expected frequencies:
White, Blue: 32
White, Grey: 16
White, Brown: 32
Brown, Blue: 40
Brown, Grey: 20
Brown, Brown: 40
Black, Blue: 48
Black, Grey: 24
Black, Brown: 48

Calculating the chi-square value:
\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]
\[\chi^2 = \frac{(30 - 32)^2}{32} + \frac{(10 - 16)^2}{16} + \frac{(40 - 32)^2}{32} + \frac{(40 - 40)^2}{40} + \frac{(20 - 20)^2}{20} + \frac{(40 - 40)^2}{40} + \frac{(50 - 48)^2}{48} + \frac{(30 - 24)^2}{24} + \frac{(40 - 48)^2}{48}\]
\[\chi^2 = \frac{4}{32} + \frac{36}{16} + \frac{64}{32} + 0 + 0 + 0 + \frac{4}{48} + \frac{36}{24} + \frac{64}{48}\]
\[\chi^2 = 0.125 + 2.25 + 2 + 0 + 0 + 0 + 0.083 + 1.5 + 1.333\]
\[\chi^2 \approx 7.29\]

With 3 rows and 3 columns, the degrees of freedom are \((3 - 1)(3 - 1) = 4\), which matches the given critical value.

Since \(\chi^2 = 7.29\) is less than \(\chi^2_{0.05, 4} = 9.49\), we fail to reject the null hypothesis.
Hence, there is no significant association between hair colour and eye colour at 5% level of significance.
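
The expected frequencies and the test statistic can be reproduced with plain Python. This is a sketch under the question's data only; the variable names are our own and the critical value 9.49 is the one given in the question.

```python
# Rows: hair colour (White, Brown, Black); columns: eye colour (Blue, Grey, Brown).
observed = [
    [30, 10, 40],
    [40, 20, 40],
    [50, 30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

critical = 9.49
print(f"chi-square = {chi_sq:.2f}, d.f. = {dof}, reject H0: {chi_sq > critical}")
```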

9. An engineer identifies four ways that a job can be done. To determine how long it takes for an operator to do a job, when each of these ways (or methods) are used, the engineer asks 4 operators to do a job using method A, another 4 operators to do the same job using method B and so on. Data related to the performance of each operator (in seconds) is shown below: (10 marks)

| A | B | C | D |
|---|---|---|---|
| 19 | 18 | 21 | 22 |
| 17 | 16 | 20 | 23 |
| 22 | 15 | 19 | 21 |
| 20 | 14 | 19 | 20 |

Construct the relevant analysis of variance table and test the hypothesis that the average time of all operators are equal at 1% level of significance. (Given that: \(F_{0.01} (3, 12) = 5.95\)).

Answer:

To test the hypothesis that the average times of all operators are equal, we perform an ANOVA (Analysis of Variance) test.

Step 1: Calculate the means for each group.

For A:
\[\bar{X}_A = \frac{19 + 17 + 22 + 20}{4} = \frac{78}{4} = 19.5\]

For B:
\[\bar{X}_B = \frac{18 + 16 + 15 + 14}{4} = \frac{63}{4} = 15.75\]

For C:
\[\bar{X}_C = \frac{21 + 20 + 19 + 19}{4} = \frac{79}{4} = 19.75\]

For D:
\[\bar{X}_D = \frac{22 + 23 + 21 + 20}{4} = \frac{86}{4} = 21.5\]

Step 2: Calculate the overall mean.

Overall mean:
\[\bar{X} = \frac{78 + 63 + 79 + 86}{16} = \frac{306}{16} = 19.125\]

Step 3: Calculate the Sum of Squares Between Groups (SSB).

SSB:
\[SSB = 4 \left[(19.5 - 19.125)^2 + (15.75 - 19.125)^2 + (19.75 - 19.125)^2 + (21.5 - 19.125)^2\right]\]
\[SSB = 4 \left[(0.375)^2 + (-3.375)^2 + (0.625)^2 + (2.375)^2\right]\]
\[SSB = 4 \left[0.140625 + 11.390625 + 0.390625 + 5.640625\right]\]
\[SSB = 4 \times 17.5625 = 70.25\]

Step 4: Calculate the Sum of Squares Within Groups (SSW).

SSW:
\[SSW = \sum (X_{ij} - \bar{X}_i)^2\]
For A: \((19 - 19.5)^2 + (17 - 19.5)^2 + (22 - 19.5)^2 + (20 - 19.5)^2 = 0.25 + 6.25 + 6.25 + 0.25 = 13\)
For B: \((18 - 15.75)^2 + (16 - 15.75)^2 + (15 - 15.75)^2 + (14 - 15.75)^2 = 5.0625 + 0.0625 + 0.5625 + 3.0625 = 8.75\)
For C: \((21 - 19.75)^2 + (20 - 19.75)^2 + (19 - 19.75)^2 + (19 - 19.75)^2 = 1.5625 + 0.0625 + 0.5625 + 0.5625 = 2.75\)
For D: \((22 - 21.5)^2 + (23 - 21.5)^2 + (21 - 21.5)^2 + (20 - 21.5)^2 = 0.25 + 2.25 + 0.25 + 2.25 = 5\)
\[SSW = 13 + 8.75 + 2.75 + 5 = 29.5\]

Step 5: Calculate the degrees of freedom.

Degrees of freedom between groups (dfB):
\[dfB = k - 1 = 4 - 1 = 3\]

Degrees of freedom within groups (dfW):
\[dfW = N - k = 16 - 4 = 12\]

Step 6: Calculate the Mean Squares.

Mean Square Between Groups (MSB):
\[MSB = \frac{SSB}{dfB} = \frac{70.25}{3} = 23.4167\]

Mean Square Within Groups (MSW):
\[MSW = \frac{SSW}{dfW} = \frac{29.5}{12} = 2.4583\]

Step 7: Calculate the F-statistic.

\[F = \frac{MSB}{MSW} = \frac{23.4167}{2.4583} = 9.53\]

The analysis of variance table is:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Between methods | 70.25 | 3 | 23.4167 | 9.53 |
| Within methods | 29.5 | 12 | 2.4583 | |
| Total | 99.75 | 15 | | |

Since \(F = 9.53\) is greater than \(F_{0.01} (3, 12) = 5.95\), we reject the null hypothesis.
Hence, there is a significant difference in the average times taken by the operators using different methods at the 1% level of significance.
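
The one-way ANOVA steps can be sketched in plain Python as follows. The dictionary layout and variable names are our own; the critical value 5.95 is the one given in the question.

```python
# Times in seconds for the four methods, from the question's table.
groups = {
    "A": [19, 17, 22, 20],
    "B": [18, 16, 15, 14],
    "C": [21, 20, 19, 19],
    "D": [22, 23, 21, 20],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)
means = {name: sum(values) / len(values) for name, values in groups.items()}

# Between-groups and within-groups sums of squares.
ssb = sum(len(values) * (means[name] - grand_mean) ** 2 for name, values in groups.items())
ssw = sum((x - means[name]) ** 2 for name, values in groups.items() for x in values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

msb = ssb / df_between
msw = ssw / df_within
f_stat = msb / msw

critical = 5.95
print(f"SSB = {ssb}, SSW = {ssw}, F = {f_stat:.2f}, reject H0: {f_stat > critical}")
```

The two sums of squares add up to the total sum of squares about the grand mean, which is a useful check on hand calculations.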

10. Write short notes on the following: 5×2=10 marks
  1. Control charts
  2. Stratified sampling
  3. Forecasting models

Answer:

a) Control charts:
Control charts are graphical tools used in quality control to monitor processes over time. They display process variation and help identify whether a process is stable or exhibits special cause variation. Common types include X-bar charts for monitoring the central tendency of a process and R charts for monitoring process dispersion. Control limits are set based on process variability to distinguish between common cause variation (within limits) and special cause variation (outside limits). Control charts aid in maintaining process stability and identifying opportunities for improvement.

b) Stratified sampling:
Stratified sampling involves dividing the population into distinct, non-overlapping subgroups or strata based on certain characteristics. Samples are then randomly selected from each stratum, either in proportion to the stratum's share of the population (proportional allocation) or with the stratum's variability also taken into account (optimal allocation). This method ensures that each subgroup is adequately represented in the sample, leading to more accurate estimates for the entire population. It is particularly useful when the population exhibits heterogeneity, allowing for better precision and efficiency in estimation compared to simple random sampling.

c) Forecasting models:
Forecasting models are mathematical techniques used to predict future values based on past and present data. They are employed in various fields such as economics, finance, weather forecasting, and sales projections. Common types of forecasting models include time series models (e.g., ARIMA, exponential smoothing), causal models (e.g., regression analysis), and machine learning algorithms (e.g., neural networks). The choice of model depends on the nature of the data, the forecasting horizon, and the underlying assumptions about the relationships between variables. Forecasting models help businesses and organizations make informed decisions by providing insights into future trends and patterns.