Free Solved [June 2023] BCS40 - Statistical Techniques Question Paper
1. Write the median formula, and use it to complete the frequency distribution given below: (5 marks)
| Class Interval (C.I.) | Frequency |
|---|---|
| 10-20 | 12 |
| 20-30 | 30 |
| 30-40 | ? |
| 40-50 | 65 |
| 50-60 | ? |
| 60-70 | 25 |
| 70-80 | 18 |
Given, the median value of 200 observations is 46.
The median formula for a frequency distribution is:
\[\text{Median} = L + \left(\frac{\frac{N}{2} - CF}{f}\right) \times h\]
Where:
\(L\) = Lower boundary of the median class
\(N\) = Total frequency
\(CF\) = Cumulative frequency before the median class
\(f\) = Frequency of the median class
\(h\) = Class width
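Given \(N = 200\), the median class is the class in which the cumulative frequency first reaches \(N/2 = 100\). Let the frequency of class 30-40 be \(f_3\) and the frequency of class 50-60 be \(f_5\). The cumulative frequencies are then 12, 42, \(42 + f_3\), \(107 + f_3\), \(107 + f_3 + f_5\), \(132 + f_3 + f_5\) and \(150 + f_3 + f_5 = 200\).
Since the median is 46, it falls in the class interval 40-50. Hence, this is our median class.
For class 40-50: \(L = 40\), \(CF = 42 + f_3\), \(f = 65\), \(h = 10\).
From the median formula:
\[46 = 40 + \left(\frac{100 - CF}{65}\right) \times 10\]
Rearranging and solving for \(CF\):
\[46 - 40 = \frac{(100 - CF) \times 10}{65}\]
\[100 - CF = \frac{6 \times 65}{10} = 39\]
\[CF = 100 - 39 = 61\]
The cumulative frequency before the median class is \(42 + f_3\), so \(42 + f_3 = 61\) and \(f_3 = 19\).
The total frequency is 200, so \(150 + f_3 + f_5 = 200\), which gives \(f_5 = 200 - 150 - 19 = 31\).
Hence, the missing frequencies are 19 for class 30-40 and 31 for class 50-60.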
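The median relationship can also be checked numerically. The following is a minimal Python sketch, assuming the median class is 40-50 as argued above; the variable names are illustrative and not part of the original solution.

```python
# Solve for the two missing frequencies from the median formula.
# The frequencies of classes 30-40 (f3) and 50-60 (f5) are the unknowns.
known = {"10-20": 12, "20-30": 30, "40-50": 65, "60-70": 25, "70-80": 18}
N, median = 200, 46
L, h, f = 40, 10, 65  # lower boundary, width and frequency of the median class 40-50

# Median = L + ((N/2 - CF) / f) * h   =>   CF = N/2 - (median - L) * f / h
CF = N / 2 - (median - L) * f / h

# CF is the cumulative frequency before 40-50, i.e. 12 + 30 + f3.
f3 = CF - (known["10-20"] + known["20-30"])

# All seven frequencies must add up to N.
f5 = N - sum(known.values()) - f3

print(CF, f3, f5)  # 61.0 19.0 31.0
```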
2. Fit a linear trend y=a+bx, to the data collected from a unit that manufactures shoes: (5 marks)
| Month (x) | Demand (y) |
|---|---|
| 1 | 46 |
| 2 | 56 |
| 3 | 54 |
| 4 | 43 |
| 5 | 57 |
| 6 | 56 |
The linear trend equation is y=a+bx.
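To find a and b:
\[b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad a = \frac{\sum y - b\sum x}{n}\]
Let's calculate the necessary sums:
\(\sum x = 1 + 2 + 3 + 4 + 5 + 6 = 21\)
\(\sum y = 46 + 56 + 54 + 43 + 57 + 56 = 312\)
\(\sum xy = (1 \times 46) + (2 \times 56) + (3 \times 54) + (4 \times 43) + (5 \times 57) + (6 \times 56) = 46 + 112 + 162 + 172 + 285 + 336 = 1113\)
\(\sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 1 + 4 + 9 + 16 + 25 + 36 = 91\)
Now, substitute these values into the equations for a and b:
For b:
\[b = \frac{6 \times 1113 - 21 \times 312}{6 \times 91 - 21^2} = \frac{6678 - 6552}{546 - 441} = \frac{126}{105} = 1.2\]
For a:
\[a = \frac{312 - 1.2 \times 21}{6} = \frac{312 - 25.2}{6} = \frac{286.8}{6} = 47.8\]
Thus, the linear trend equation is: \(y = 47.8 + 1.2x\)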
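The same least-squares arithmetic can be reproduced in a few lines. This is a small sketch using only the data from the question; the variable names (`sx`, `sxy` and so on) are illustrative.

```python
# Least-squares fit of y = a + b*x using the normal-equation formulas.
x = [1, 2, 3, 4, 5, 6]
y = [46, 56, 54, 43, 57, 56]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
a = (sy - b * sx) / n                          # intercept

print(sx, sy, sxy, sxx)          # 21 312 1113 91
print(round(a, 2), round(b, 2))  # 47.8 1.2
```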
3. If 2% of the items manufactured in a factory are defective, then find the probability that: (5 marks)
(i) There are 3 defectives in a sample of 100
(ii) There is no defective in a sample of 50.
The probability of finding k defectives in a sample of size n when the defect rate is p is given by the binomial distribution formula:
\[P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\]
Given: \(p = 0.02\), with \(n = 100\) for part (i) and \(n = 50\) for part (ii).
For part (i), \(k = 3\):
\[P(X = 3) = \binom{100}{3} (0.02)^3 (0.98)^{97} \approx 0.1823\]
For part (ii), \(k = 0\):
\[P(X = 0) = \binom{50}{0} (0.02)^0 (0.98)^{50} = (0.98)^{50} \approx 0.3642\]
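The two binomial probabilities can be evaluated numerically. The sketch below uses the standard-library `math.comb`; the helper name `binom_pmf` is an assumption made for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_three_in_100 = binom_pmf(3, 100, 0.02)  # part (i)
p_none_in_50 = binom_pmf(0, 50, 0.02)     # part (ii)

print(round(p_three_in_100, 4), round(p_none_in_50, 4))  # 0.1823 0.3642
```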
4. Calculate the correlation coefficient from the following data (determined from the 20 pairs of observations of two variables X and Y): (5 marks)
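\(\sum x = 15\), \(\sum y = -6\), \(\sum xy = 50\), \(\sum x^2 = 61\), \(\sum y^2 = 90\)
The formula for the correlation coefficient r is:
\[r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}\]
Given \(n = 20\):
\[r = \frac{20 \times 50 - 15 \times (-6)}{\sqrt{(20 \times 61 - 15^2)(20 \times 90 - (-6)^2)}} = \frac{1000 + 90}{\sqrt{(1220 - 225)(1800 - 36)}}\]
\[r = \frac{1090}{\sqrt{995 \times 1764}} = \frac{1090}{\sqrt{1755180}} = \frac{1090}{1324.83} \approx 0.82\]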
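As a check on the arithmetic, the coefficient can be computed directly from the given sums. This is a minimal sketch; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics given in the question (20 pairs of observations).
n = 20
sx, sy, sxy, sxx, syy = 15, -6, 50, 61, 90

numerator = n * sxy - sx * sy
denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
r = numerator / denominator

print(numerator, round(denominator, 2), round(r, 2))  # 1090 1324.83 0.82
```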
5. The mean weekly sales of mobile phones in different mobile stores was 146.3 mobile phones per store. After an advertisement campaign the mean weekly sales of 22 stores, increased to 153.7 (for a typical week), and showed a standard deviation of 17.2. Verify the success of advertisement campaign, at 5% level of significance. (Given: \(t_{0.05, 21} = 2.08\)). (5 marks)
We use the t-test to verify the success of the advertisement campaign. The t-statistic is given by:
\[t = \frac{\bar{x} - \mu}{s / \sqrt{n}}\]
Where:
\(\bar{x}\) = Sample mean = 153.7
\(\mu\) = Population mean = 146.3
\(s\) = Standard deviation = 17.2
\(n\) = Sample size = 22
Substitute the values:
\[t = \frac{153.7 - 146.3}{17.2 / \sqrt{22}} = \frac{7.4}{17.2 / 4.69} = \frac{7.4}{3.67} \approx 2.02\]
Since \(t_{\text{calculated}} = 2.02\) is less than \(t_{\text{critical}} = 2.08\), we fail to reject the null hypothesis. Hence, there is not enough evidence to verify the success of the advertisement campaign at 5% level of significance.
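The test statistic is easy to verify in code. A short sketch follows, taking the critical value 2.08 from the question as given rather than computing it; the variable names are illustrative.

```python
from math import sqrt

x_bar, mu = 153.7, 146.3  # sample mean after the campaign, earlier mean
s, n = 17.2, 22           # sample standard deviation and sample size
t_critical = 2.08         # value supplied in the question

standard_error = s / sqrt(n)
t = (x_bar - mu) / standard_error

print(round(standard_error, 2), round(t, 2))  # 3.67 2.02
print(t > t_critical)                         # False, so the null hypothesis is not rejected
```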
6. The mobile phone numbers are combinations of ten digits, 0 to 9. An observer analysed the mobile numbers in the contact list of his/her mobile phone and prepared a frequency distribution table of the digits (0 to 9), as given ahead:
| Digit | Frequency |
|---|---|
| 0 | 99 |
| 1 | 100 |
| 2 | 82 |
| 3 | 65 |
| 4 | 50 |
| 5 | 77 |
| 6 | 88 |
| 7 | 57 |
| 8 | 82 |
| 9 | 80 |
From the observed data (given above), she/he wants to test whether the digits occur with same frequency or not. (Given: \(\alpha = 0.05\), \(\chi^2_{9}(0.05) = 16.918\)). (5 marks)
The null hypothesis is that the digits occur with the same frequency.
We use the chi-square test for goodness of fit:
\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]
Where: \(O_i\) = Observed frequency, \(E_i\) = Expected frequency
The observed frequencies add up to \(99 + 100 + 82 + 65 + 50 + 77 + 88 + 57 + 82 + 80 = 780\). If the digits occur with the same frequency, the expected frequency for each digit is \(E_i = \frac{780}{10} = 78\).
Calculating the chi-square value:
\[\chi^2 = \frac{(99 - 78)^2 + (100 - 78)^2 + (82 - 78)^2 + (65 - 78)^2 + (50 - 78)^2 + (77 - 78)^2 + (88 - 78)^2 + (57 - 78)^2 + (82 - 78)^2 + (80 - 78)^2}{78}\]
\[\chi^2 = \frac{441 + 484 + 16 + 169 + 784 + 1 + 100 + 441 + 16 + 4}{78} = \frac{2456}{78} \approx 31.49\]
Since \(\chi^2 = 31.49\) is greater than \(\chi^2_{9}(0.05) = 16.918\), we reject the null hypothesis. Hence, the digits do not occur with the same frequency.
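The goodness-of-fit statistic can be recomputed from the table. In this sketch the expected count is derived from the observed total rather than assumed; the listed frequencies sum to 780, so the expected count per digit comes out to 78. The variable names are illustrative.

```python
# Chi-square goodness-of-fit test against a uniform distribution of digits.
observed = [99, 100, 82, 65, 50, 77, 88, 57, 82, 80]  # digits 0 to 9

total = sum(observed)
expected = total / len(observed)  # same expected count for every digit
chi2 = sum((o - expected) ** 2 / expected for o in observed)

critical = 16.918  # chi-square critical value (9 d.f., 5%) given in the question

print(total, expected, round(chi2, 2))  # 780 78.0 31.49
print(chi2 > critical)                  # True, so the null hypothesis is rejected
```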
7. A survey study includes three villages V1, V2 and V3 having 50000, 30000 and 40000 as respective populations. A stratified random sample is to be taken with a total sample size of n=500. Determine the sample size to be selected from each village individually using the method of (i) proportional allocation and (ii) optimal allocation. From the previous survey, it is known that the standard deviations are S1=30, S2=15 and S3=20. (10 marks)
(i) Proportional allocation:
For proportional allocation, the sample size for each village is proportional to its population.
\[n_i = \frac{N_i}{N} \times n\]
Where:
\(N_i\) = Population of village \(i\)
\(N\) = Total population = 50000 + 30000 + 40000 = 120000
\(n\) = Total sample size = 500
For V1: \(n_1 = \frac{50000}{120000} \times 500 = \frac{5}{12} \times 500 = 208.33 \approx 208\)
For V2: \(n_2 = \frac{30000}{120000} \times 500 = \frac{1}{4} \times 500 = 125\)
For V3: \(n_3 = \frac{40000}{120000} \times 500 = \frac{1}{3} \times 500 = 166.67 \approx 167\)
So, the sample sizes for proportional allocation are: 208 for V1, 125 for V2, and 167 for V3
(ii) Optimal allocation:
For optimal allocation, the sample size for each village considers both the population size and the standard deviation.
\[n_i = \frac{N_i S_i}{\sum (N_i S_i)} \times n\]
Where: \(S_i\) = Standard deviation of village \(i\)
Calculating the necessary sums:
\(N_1 S_1 = 50000 \times 30 = 1500000\)
\(N_2 S_2 = 30000 \times 15 = 450000\)
\(N_3 S_3 = 40000 \times 20 = 800000\)
\(\sum (N_i S_i) = 1500000 + 450000 + 800000 = 2750000\)
For V1: \(n_1 = \frac{1500000}{2750000} \times 500 = \frac{1500}{2750} \times 500 = 272.73 \approx 273\)
For V2: \(n_2 = \frac{450000}{2750000} \times 500 = \frac{450}{2750} \times 500 = 81.82 \approx 82\)
For V3: \(n_3 = \frac{800000}{2750000} \times 500 = \frac{800}{2750} \times 500 = 145.45 \approx 145\)
So, the sample sizes for optimal allocation are: 273 for V1, 82 for V2, and 145 for V3
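Both allocation rules can be sketched in a few lines. The dictionary layout and variable names below are illustrative. Rounding each stratum independently happens to sum to 500 for this data, but that is not guaranteed in general.

```python
# Proportional and optimal (Neyman) allocation for a stratified sample.
N = {"V1": 50000, "V2": 30000, "V3": 40000}  # stratum populations
S = {"V1": 30, "V2": 15, "V3": 20}           # stratum standard deviations
n = 500                                      # total sample size

total_N = sum(N.values())
proportional = {v: round(n * N[v] / total_N) for v in N}

total_NS = sum(N[v] * S[v] for v in N)
optimal = {v: round(n * N[v] * S[v] / total_NS) for v in N}

print(proportional)  # {'V1': 208, 'V2': 125, 'V3': 167}
print(optimal)       # {'V1': 273, 'V2': 82, 'V3': 145}
```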
8. The data of 300 persons, according to hair colour and eye colour is shown below: (10 marks)
| Hair Colour \ Eye Colour | Blue | Grey | Brown |
|---|---|---|---|
| White | 30 | 10 | 40 |
| Brown | 40 | 20 | 40 |
| Black | 50 | 30 | 40 |
Test the hypothesis that there is association between hair colour and eye colour at 5% level of significance. (Given that: \(\chi^2_{0.05, 4} = 9.49\))
The null hypothesis is that there is no association between hair colour and eye colour.
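We use the chi-square test for independence:
\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]
Where: \(O_i\) = Observed frequency, \(E_i\) = Expected frequency
Calculating the expected frequencies requires the marginal totals.
Column totals (eye colour): Blue = 120, Grey = 60, Brown = 120
Row totals (hair colour): White = 80, Brown = 100, Black = 120
Grand total: 300
Each expected frequency is (row total × column total) / grand total. For example:
\[E_{\text{White, Blue}} = \frac{80 \times 120}{300} = 32\]
Similarly, the full set of expected frequencies is:
| Hair Colour \ Eye Colour | Blue | Grey | Brown |
|---|---|---|---|
| White | 32 | 16 | 32 |
| Brown | 40 | 20 | 40 |
| Black | 48 | 24 | 48 |
Calculating the chi-square value:
\[\chi^2 = \frac{(30 - 32)^2}{32} + \frac{(10 - 16)^2}{16} + \frac{(40 - 32)^2}{32} + \frac{(40 - 40)^2}{40} + \frac{(20 - 20)^2}{20} + \frac{(40 - 40)^2}{40} + \frac{(50 - 48)^2}{48} + \frac{(30 - 24)^2}{24} + \frac{(40 - 48)^2}{48}\]
\[\chi^2 = \frac{4}{32} + \frac{36}{16} + \frac{64}{32} + 0 + 0 + 0 + \frac{4}{48} + \frac{36}{24} + \frac{64}{48}\]
\[\chi^2 = 0.125 + 2.25 + 2 + 0 + 0 + 0 + 0.0833 + 1.5 + 1.3333 \approx 7.292\]
The degrees of freedom are \((3 - 1)(3 - 1) = 4\). Since \(\chi^2 = 7.292\) is less than \(\chi^2_{0.05, 4} = 9.49\), we fail to reject the null hypothesis. Hence, there is no significant association between hair colour and eye colour at 5% level of significance.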
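The expected counts and the test statistic can be reproduced from the contingency table. The sketch below assumes rows are hair colours (White, Brown, Black) and columns are eye colours (Blue, Grey, Brown); the variable names are illustrative.

```python
# Chi-square test of independence on a 3x3 contingency table.
observed = [
    [30, 10, 40],  # White hair: Blue, Grey, Brown eyes
    [40, 20, 40],  # Brown hair
    [50, 30, 40],  # Black hair
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi2 += (o - e) ** 2 / e

dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(row_totals, col_totals, grand_total)  # [80, 100, 120] [120, 60, 120] 300
print(round(chi2, 3), dof)                  # 7.292 4
```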
9. An engineer identifies four ways that a job can be done. To determine how long it takes for an operator to do a job, when each of these ways (or methods) are used, the engineer asks 4 operators to do a job using method A, another 4 operators to do the same job using method B and so on. Data related to the performance of each operator (in seconds) is shown below: (10 marks)
| A | B | C | D |
|---|---|---|---|
| 19 | 18 | 21 | 22 |
| 17 | 16 | 20 | 23 |
| 22 | 15 | 19 | 21 |
| 20 | 14 | 19 | 20 |
Construct the relevant analysis of variance table and test the hypothesis that the average time of all operators are equal at 1% level of significance. (Given that: \(F_{0.01}(3, 12) = 5.95\)).
Answer:
To test the hypothesis that the average times of all operators are equal, we perform an ANOVA (Analysis of Variance) test.
Step 1: Calculate the means for each group.
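For A: \(\bar{X}_A = \frac{19 + 17 + 22 + 20}{4} = \frac{78}{4} = 19.5\)
For B: \(\bar{X}_B = \frac{18 + 16 + 15 + 14}{4} = \frac{63}{4} = 15.75\)
For C: \(\bar{X}_C = \frac{21 + 20 + 19 + 19}{4} = \frac{79}{4} = 19.75\)
For D: \(\bar{X}_D = \frac{22 + 23 + 21 + 20}{4} = \frac{86}{4} = 21.5\)
Step 2: Calculate the overall mean.
Overall mean: \(\bar{X} = \frac{78 + 63 + 79 + 86}{16} = \frac{306}{16} = 19.125\)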
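Step 3: Calculate the Sum of Squares Between Groups (SSB).
SSB: \(SSB = \sum n_i (\bar{X}_i - \bar{X})^2\), with \(n_i = 4\) observations per method.
\(SSB = 4\left[(19.5 - 19.125)^2 + (15.75 - 19.125)^2 + (19.75 - 19.125)^2 + (21.5 - 19.125)^2\right]\)
\(SSB = 4\left[0.140625 + 11.390625 + 0.390625 + 5.640625\right] = 4 \times 17.5625 = 70.25\)
Step 4: Calculate the Sum of Squares Within Groups (SSW).
SSW: \(SSW = \sum (X_{ij} - \bar{X}_i)^2\)
For A: \((19 - 19.5)^2 + (17 - 19.5)^2 + (22 - 19.5)^2 + (20 - 19.5)^2 = 0.25 + 6.25 + 6.25 + 0.25 = 13\)
For B: \((18 - 15.75)^2 + (16 - 15.75)^2 + (15 - 15.75)^2 + (14 - 15.75)^2 = 5.0625 + 0.0625 + 0.5625 + 3.0625 = 8.75\)
For C: \((21 - 19.75)^2 + (20 - 19.75)^2 + (19 - 19.75)^2 + (19 - 19.75)^2 = 1.5625 + 0.0625 + 0.5625 + 0.5625 = 2.75\)
For D: \((22 - 21.5)^2 + (23 - 21.5)^2 + (21 - 21.5)^2 + (20 - 21.5)^2 = 0.25 + 2.25 + 0.25 + 2.25 = 5\)
\(SSW = 13 + 8.75 + 2.75 + 5 = 29.5\)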
Step 5: Calculate the degrees of freedom.
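Degrees of freedom between groups (\(df_B\)): \(df_B = k - 1 = 4 - 1 = 3\)
Degrees of freedom within groups (\(df_W\)): \(df_W = N - k = 16 - 4 = 12\)
Step 6: Calculate the Mean Squares.
Mean Square Between Groups (MSB): \(MSB = \frac{SSB}{df_B} = \frac{70.25}{3} = 23.4167\)
Mean Square Within Groups (MSW): \(MSW = \frac{SSW}{df_W} = \frac{29.5}{12} = 2.4583\)
Step 7: Calculate the F-statistic.
\[F = \frac{MSB}{MSW} = \frac{23.4167}{2.4583} = 9.53\]
The analysis of variance table is:
| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Between methods | 70.25 | 3 | 23.4167 | 9.53 |
| Within methods (error) | 29.5 | 12 | 2.4583 | |
| Total | 99.75 | 15 | | |
Since \(F = 9.53\) is greater than \(F_{0.01}(3, 12) = 5.95\), we reject the null hypothesis. Hence, there is a significant difference in the average times taken by the operators using different methods at the 1% level of significance.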
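The whole one-way ANOVA computation can be sketched in plain Python. This is a minimal illustration using the data from the question; the dictionary layout and helper name `mean` are assumptions made for readability.

```python
# One-way ANOVA for four methods with four operators each.
groups = {
    "A": [19, 17, 22, 20],
    "B": [18, 16, 15, 14],
    "C": [21, 20, 19, 19],
    "D": [22, 23, 21, 20],
}

def mean(values):
    return sum(values) / len(values)

all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)

# Between-group and within-group sums of squares.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ssw = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

msb = ssb / df_between
msw = ssw / df_within
F = msb / msw

print(ssb, ssw, df_between, df_within)  # 70.25 29.5 3 12
print(round(F, 2))                      # 9.53
```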
10. Write short notes on the following: 5×2=10 marks
a) Control charts
b) Stratified sampling
c) Forecasting models
Answer:
a) Control charts: Control charts are graphical tools used in quality control to monitor processes over time. They display process variation and help identify whether a process is stable or exhibits special cause variation. Common types include X-bar charts for monitoring the central tendency of a process and R charts for monitoring process dispersion. Control limits are set based on process variability to distinguish between common cause variation (within limits) and special cause variation (outside limits). Control charts aid in maintaining process stability and identifying opportunities for improvement.
b) Stratified sampling: Stratified sampling involves dividing the population into distinct, non-overlapping subgroups or strata based on certain characteristics. A random sample is then selected from each stratum, with stratum sample sizes set either in proportion to each stratum's share of the population (proportional allocation) or with the stratum variability also taken into account (optimal allocation). This method ensures that each subgroup is adequately represented in the sample, leading to more accurate estimates for the entire population. It is particularly useful when the population exhibits heterogeneity, allowing for better precision and efficiency in estimation compared to simple random sampling.
c) Forecasting models: Forecasting models are mathematical techniques used to predict future values based on past and present data. They are employed in various fields such as economics, finance, weather forecasting, and sales projections. Common types of forecasting models include time series models (e.g., ARIMA, exponential smoothing), causal models (e.g., regression analysis), and machine learning algorithms (e.g., neural networks). The choice of model depends on the nature of the data, the forecasting horizon, and the underlying assumptions about the relationships between variables. Forecasting models help businesses and organizations make informed decisions by providing insights into future trends and patterns.
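To illustrate one of the time series models named above, here is a minimal sketch of simple exponential smoothing applied to the shoe demand data from question 2. The smoothing constant `alpha = 0.3` and the choice to initialise with the first observation are assumptions made only for illustration.

```python
# Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_{t-1}
demand = [46, 56, 54, 43, 57, 56]  # monthly demand from question 2
alpha = 0.3                        # assumed smoothing constant

smoothed = demand[0]  # initialise with the first observation (one common convention)
for y in demand[1:]:
    smoothed = alpha * y + (1 - alpha) * smoothed

# The last smoothed value serves as the one-step-ahead forecast for month 7.
forecast_month_7 = smoothed
print(round(forecast_month_7, 2))  # 52.41
```

A larger `alpha` makes the forecast react faster to recent months, while a smaller one smooths out more of the month-to-month variation.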