Free Solved [June 2023] BCS40 - Statistical Techniques Question Paper

1. Write the median formula, and use it to complete the frequency distribution given below: (5 marks)
Class Interval (C.I.) Frequency
10-20 12
20-30 30
30-40 ?
40-50 65
50-60 ?
60-70 25
70-80 18
Given, the median value of 200 observations is 46.

The median formula for a frequency distribution is:
$\text{Median} = L + \left( \frac{\frac{N}{2} - CF}{f} \right) \times h$

Where:
L = Lower boundary of the median class
N = Total frequency
CF = Cumulative frequency before the median class
f = Frequency of the median class
h = Class width

Given $N = 200$ , the median class is the class in which the cumulative frequency first reaches $\frac{N}{2} = 100$ . Let the frequency of class 30-40 be $f_3$ and the frequency of class 50-60 be $f_5$ . The cumulative frequencies are then:

10-20: 12
20-30: 42 (12 + 30)
30-40: 42 + $f_3$
40-50: 107 + $f_3$
50-60: 107 + $f_3$ + $f_5$
60-70: 132 + $f_3$ + $f_5$
70-80: 150 + $f_3$ + $f_5$ = 200

Since the median is 46, it falls in the class interval 40-50. Hence, this is our median class.

For class 40-50:
L = 40
CF = 42 + $f_3$
f = 65
h = 10

From the median formula:
$46 = 40 + \left( \frac{100 - CF}{65} \right) \times 10$
Rearranging and solving for CF:
$46 - 40 = \left( \frac{100 - CF}{65} \right) \times 10$
$6 = \left( \frac{100 - CF}{65} \right) \times 10$
$\frac{6 \times 65}{10} = 100 - CF$
$39 = 100 - CF$
$CF = 100 - 39$
$CF = 61$

The cumulative frequency before the median class is 42 + $f_3$ , so:
$42 + f_3 = 61$
$f_3 = 61 - 42$
$f_3 = 19$

The total frequency is 200, and the known frequencies sum to $12 + 30 + 65 + 25 + 18 = 150$ , so:
$f_3 + f_5 = 200 - 150 = 50$
$f_5 = 50 - 19$
$f_5 = 31$

Thus, the missing frequencies are 19 for class 30-40 and 31 for class 50-60. As a check, the cumulative frequency up to class 40-50 is $61 + 65 = 126 \geq 100$ , which confirms that 40-50 is the median class.
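The median formula can also be solved numerically for the two missing frequencies. The following is a minimal sketch; the function name `missing_frequencies` and its parameter names are illustrative assumptions, not part of the question paper.

```python
def missing_frequencies(median, lower, width, f_median, cf_known, total, known_sum):
    """Solve the grouped-data median formula for two unknown class frequencies.

    cf_known  : known part of the cumulative frequency before the median class
    known_sum : sum of all class frequencies that are given in the table
    """
    half = total / 2
    # median = lower + (half - cf) / f_median * width, rearranged for cf
    cf = half - (median - lower) * f_median / width
    f_below = cf - cf_known               # unknown frequency below the median class
    f_above = total - known_sum - f_below  # remaining unknown frequency
    return cf, f_below, f_above


# Values from the question: median 46, median class 40-50, N = 200
cf, f3, f5 = missing_frequencies(46, 40, 10, 65, 42, 200, 12 + 30 + 65 + 25 + 18)
print(cf, f3, f5)
```

With the values from the question this gives a cumulative frequency of 61 before the median class, 19 for class 30-40, and 31 for class 50-60.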

2. Fit a linear trend $y = a + bx$ , to the data collected from a unit that manufactures shoes: (5 marks)
Month (x) Demand (y)
1 46
2 56
3 54
4 43
5 57
6 56

The linear trend equation is $y = a + bx$ .

To find $a$ and $b$ :
$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
$a = \frac{\sum y - b \sum x}{n}$

Let's calculate the necessary sums:
$\sum x = 1 + 2 + 3 + 4 + 5 + 6 = 21$
$\sum y = 46 + 56 + 54 + 43 + 57 + 56 = 312$
$\sum xy = (1 \times 46) + (2 \times 56) + (3 \times 54) + (4 \times 43) + (5 \times 57) + (6 \times 56) = 46 + 112 + 162 + 172 + 285 + 336 = 1113$
$\sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 1 + 4 + 9 + 16 + 25 + 36 = 91$

Now, substitute these values into the equations for $a$ and $b$ :

For $b$ :
$b = \frac{6 \times 1113 - 21 \times 312}{6 \times 91 - 21^2}$
$b = \frac{6678 - 6552}{546 - 441}$
$b = \frac{126}{105}$
$b = 1.2$

For $a$ :
$a = \frac{312 - 1.2 \times 21}{6}$
$a = \frac{312 - 25.2}{6}$
$a = \frac{286.8}{6}$
$a = 47.8$

Thus, the linear trend equation is:
$y = 47.8 + 1.2x$
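The least-squares arithmetic above can be reproduced in a few lines. This is a sketch using only the standard library; the helper name `fit_linear_trend` is an assumption made for illustration.

```python
def fit_linear_trend(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Same normal-equation formulas as used in the worked solution
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


months = [1, 2, 3, 4, 5, 6]
demand = [46, 56, 54, 43, 57, 56]
a, b = fit_linear_trend(months, demand)
print(round(a, 2), round(b, 2))
```

The rounded coefficients agree with the hand calculation, $a = 47.8$ and $b = 1.2$ .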

3. If 2% of the items manufactured in a factory are defective, then find the probability that: (5 marks)

(i) There are 3 defectives in a sample of 100

(ii) There is no defective in a sample of 50.

The probability of finding $k$ defectives in a sample of size $n$ when the defect rate is $p$ is given by the binomial distribution formula:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$

Given: $p = 0.02$ , $n = 100$ for part (i), and $n = 50$ for part (ii).

For part (i), $k = 3$ :
$P(X = 3) = \binom{100}{3} (0.02)^3 (0.98)^{97}$
$P(X = 3) = 161700 \times 0.000008 \times 0.1409 \approx 0.182$

For part (ii), $k = 0$ :
$P(X = 0) = \binom{50}{0} (0.02)^0 (0.98)^{50}$
$P(X = 0) = (0.98)^{50} \approx 0.364$
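To evaluate the two binomial probabilities numerically, a short sketch with Python's standard `math.comb` is enough. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_three_in_100 = binomial_pmf(3, 100, 0.02)  # part (i)
p_none_in_50 = binomial_pmf(0, 50, 0.02)     # part (ii)
print(round(p_three_in_100, 4), round(p_none_in_50, 4))
```

The results are roughly 0.182 for part (i) and 0.364 for part (ii).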

4. Calculate the correlation coefficient from the following data (determined from the 20 pairs of observations of two variables X and Y): (5 marks)

$\Sigma x = 15, \Sigma y = -6, \Sigma xy = 50, \Sigma x^2 = 61, \Sigma y^2 = 90$

The formula for the correlation coefficient $r$ is:
$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{(n \sum x^2 - (\sum x)^2)(n \sum y^2 - (\sum y)^2)}}$

Given $n = 20$ :
$r = \frac{20 \times 50 - 15 \times (-6)}{\sqrt{(20 \times 61 - 15^2)(20 \times 90 - (-6)^2)}}$
$r = \frac{1000 + 90}{\sqrt{(1220 - 225)(1800 - 36)}}$
$r = \frac{1090}{\sqrt{995 \times 1764}}$
$r = \frac{1090}{\sqrt{1755180}}$
$r = \frac{1090}{1324.83}$
$r \approx 0.82$
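Because only the summary sums are given, the coefficient can be checked directly from them. The sketch below assumes a hypothetical helper `pearson_r_from_sums` that takes the same quantities as the formula above.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient computed from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r_from_sums(n=20, sum_x=15, sum_y=-6, sum_xy=50, sum_x2=61, sum_y2=90)
print(round(r, 2))
```

This reproduces $r \approx 0.82$ , indicating a strong positive linear relationship.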

5. The mean weekly sales of mobile phones in different mobile stores was 146.3 mobile phones per store. After an advertisement campaign the mean weekly sales of 22 stores, increased to 153.7 (for a typical week), and showed a standard deviation of 17.2. Verify the success of advertisement campaign, at 5% level of significance. (Given: $t_{0.05,21} = 2.08$ ). (5 marks)

We use the t-test to verify the success of the advertisement campaign.
The t-statistic is given by:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

Where:
$\bar{x}$ = Sample mean = 153.7
$\mu$ = Population mean = 146.3
s = Standard deviation = 17.2
n = Sample size = 22

Substitute the values:
$t = \frac{153.7 - 146.3}{17.2/\sqrt{22}}$
$t = \frac{7.4}{17.2/4.69}$
$t = \frac{7.4}{3.67}$
$t = 2.02$

Here the null hypothesis is $H_0: \mu = 146.3$ (the campaign did not change mean weekly sales). Since $t_{calculated} = 2.02$ is less than $t_{critical} = 2.08$ , we fail to reject the null hypothesis.
Hence, at the 5% level of significance there is not enough evidence to conclude that the advertisement campaign was successful.
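The test statistic and the comparison with the critical value given in the question can be sketched as follows. The function name `one_sample_t` is an assumption, and the critical value 2.08 is taken from the question rather than computed.

```python
from math import sqrt


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """t statistic for a one-sample test of the mean."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


t_stat = one_sample_t(153.7, 146.3, 17.2, 22)
t_critical = 2.08  # supplied in the question for 21 degrees of freedom
reject_null = t_stat > t_critical
print(round(t_stat, 2), reject_null)
```

The statistic comes out at about 2.02, just below the supplied critical value, so the null hypothesis is not rejected.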

6. The mobile phone numbers are combinations of ten digits, 0 to 9. An observer analysed the mobile numbers in the contact list of his/her mobile phone and prepared a frequency distribution table of the digits (0 to 9), as given ahead:
Digit Frequency
0 99
1 100
2 82
3 65
4 50
5 77
6 88
7 57
8 82
9 80
From the observed data (given above), she/he wants to test whether the digits occur with same frequency or not. (Given: $\alpha = 0.05$ , $\chi^2_9(0.05) = 16.918$ ). (5 marks)

The null hypothesis is that the digits occur with the same frequency.

We use the chi-square test for goodness of fit:
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

Where:
$O_i$ = Observed frequency
$E_i$ = Expected frequency

If the digits occur with the same frequency, the expected frequency of each digit is the total number of observed digits divided by 10. The observed frequencies sum to $99 + 100 + 82 + 65 + 50 + 77 + 88 + 57 + 82 + 80 = 780$ , so $E_i = \frac{780}{10} = 78$

Calculating the chi-square value:
$\chi^2 = \frac{(99 - 78)^2}{78} + \frac{(100 - 78)^2}{78} + \frac{(82 - 78)^2}{78} + \frac{(65 - 78)^2}{78} + \frac{(50 - 78)^2}{78} + \frac{(77 - 78)^2}{78} + \frac{(88 - 78)^2}{78} + \frac{(57 - 78)^2}{78} + \frac{(82 - 78)^2}{78} + \frac{(80 - 78)^2}{78}$

Calculating each term:
$\chi^2 = \frac{(21)^2}{78} + \frac{(22)^2}{78} + \frac{(4)^2}{78} + \frac{(-13)^2}{78} + \frac{(-28)^2}{78} + \frac{(-1)^2}{78} + \frac{(10)^2}{78} + \frac{(-21)^2}{78} + \frac{(4)^2}{78} + \frac{(2)^2}{78}$
$\chi^2 = \frac{441 + 484 + 16 + 169 + 784 + 1 + 100 + 441 + 16 + 4}{78}$
$\chi^2 = \frac{2456}{78}$
$\chi^2 \approx 31.49$

Since $\chi^2 \approx 31.49$ is greater than $\chi^2_9(0.05) = 16.918$ , we reject the null hypothesis.
Hence, the digits do not occur with the same frequency.
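A goodness-of-fit statistic against a uniform distribution can be computed directly from the observed table. In this sketch the expected frequency is derived from the actual total of the observed counts (780 for this table, giving 78 per digit); the function name `chi_square_uniform` is an illustrative assumption.

```python
def chi_square_uniform(observed):
    """Chi-square goodness-of-fit statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return expected, statistic


digit_counts = [99, 100, 82, 65, 50, 77, 88, 57, 82, 80]  # digits 0 to 9
expected, chi2 = chi_square_uniform(digit_counts)
critical = 16.918  # chi-square critical value given in the question (9 d.f.)
print(expected, round(chi2, 2), chi2 > critical)
```

The statistic is well above the critical value, so the hypothesis of equal digit frequencies is rejected.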

7. A survey study includes three villages $V_1$ , $V_2$ and $V_3$ having 50000, 30000 and 40000 as respective populations. A stratified random sample is to be taken with a total sample size of $n = 500$ . Determine the sample size to be selected from each village individually using the method of
(i) proportional allocation and
(ii) optimal allocation.
From the previous survey, it is known that the standard deviations are $S_1 = 30$ , $S_2 = 15$ and $S_3 = 20$ . (10 marks)

(i) Proportional allocation:

For proportional allocation, the sample size for each village is proportional to its population.
$n_i = \frac{N_i}{N} \times n$

Where:
$N_i$ = Population of village $i$
$N$ = Total population = 50000 + 30000 + 40000 = 120000
$n$ = Total sample size = 500

For $V_1$ :
$n_1 = \frac{50000}{120000} \times 500 = \frac{5}{12} \times 500 = 208.33 \approx 208$

For $V_2$ :
$n_2 = \frac{30000}{120000} \times 500 = \frac{1}{4} \times 500 = 125$

For $V_3$ :
$n_3 = \frac{40000}{120000} \times 500 = \frac{1}{3} \times 500 = 166.67 \approx 167$

So, the sample sizes for proportional allocation are:
208 for $V_1$ , 125 for $V_2$ , and 167 for $V_3$

(ii) Optimal allocation:

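For optimal allocation, the sample size for each village considers both the population size and the standard deviation, so larger and more variable villages receive larger samples. Assuming the cost of sampling a unit is the same in every village (Neyman allocation):
$n_i = \frac{N_i S_i}{\sum (N_i S_i)} \times n$

Where:
$S_i$ = Standard deviation of village $i$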

Calculating the necessary sums:
$N_1 S_1 = 50000 \times 30 = 1500000$
$N_2 S_2 = 30000 \times 15 = 450000$
$N_3 S_3 = 40000 \times 20 = 800000$
$\sum (N_i S_i) = 1500000 + 450000 + 800000 = 2750000$

For $V_1$ :
$n_1 = \frac{1500000}{2750000} \times 500 = \frac{1500}{2750} \times 500 = 272.73 \approx 273$

For $V_2$ :
$n_2 = \frac{450000}{2750000} \times 500 = \frac{450}{2750} \times 500 = 81.82 \approx 82$

For $V_3$ :
$n_3 = \frac{800000}{2750000} \times 500 = \frac{800}{2750} \times 500 = 145.45 \approx 145$

So, the sample sizes for optimal allocation are:
273 for $V_1$ , 82 for $V_2$ , and 145 for $V_3$
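Both allocation rules share the same structure: each stratum gets a share of $n$ proportional to a weight, $N_i$ for proportional allocation and $N_i S_i$ for optimal allocation. The sketch below uses a hypothetical helper `allocate` to show this.

```python
def allocate(populations, std_devs, n, optimal=False):
    """Split a total sample size n across strata.

    Proportional allocation weights each stratum by N_i;
    optimal (Neyman) allocation weights it by N_i * S_i.
    """
    if optimal:
        weights = [N * S for N, S in zip(populations, std_devs)]
    else:
        weights = list(populations)
    total_weight = sum(weights)
    return [n * w / total_weight for w in weights]


populations = [50000, 30000, 40000]
std_devs = [30, 15, 20]
proportional = [round(x) for x in allocate(populations, std_devs, 500)]
optimal = [round(x) for x in allocate(populations, std_devs, 500, optimal=True)]
print(proportional, optimal)
```

Simple rounding happens to preserve the total of 500 here; in general the rounded sizes may need a small adjustment so that they sum to $n$ .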

8. The data of 300 persons, according to hair colour and eye colour is shown below: (10 marks)
Hair Colour (rows) by Eye Colour (columns)
Hair Colour Blue Grey Brown
White 30 10 40
Brown 40 20 40
Black 50 30 40
Test the hypothesis that there is association between hair colour and eye colour at 5% level of significance. (Given that: $\chi^2_{0.05, 4} = 9.49$ )

The null hypothesis is that there is no association between hair colour and eye colour.

We use the chi-square test for independence:
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

Where:
$O_i$ = Observed frequency
$E_i$ = Expected frequency

Calculating the expected frequencies requires the row and column totals:
Eye colour (column) totals: Blue = 120, Grey = 60, Brown = 120
Hair colour (row) totals: White = 80, Brown = 100, Black = 120
Grand total: 300

For example, $E_{White, Blue} = \frac{(Total\ for\ White)\times(Total\ for\ Blue)}{Total} = \frac{80 \times 120}{300} = 32$
Similarly, calculate all expected frequencies.

Expected frequencies:
White, Blue: 32
White, Grey: 16
White, Brown: 32
Brown, Blue: 40
Brown, Grey: 20
Brown, Brown: 40
Black, Blue: 48
Black, Grey: 24
Black, Brown: 48

Calculating the chi-square value:
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
$\chi^2 = \frac{(30 - 32)^2}{32} + \frac{(10 - 16)^2}{16} + \frac{(40 - 32)^2}{32} + \frac{(40 - 40)^2}{40} + \frac{(20 - 20)^2}{20} + \frac{(40 - 40)^2}{40} + \frac{(50 - 48)^2}{48} + \frac{(30 - 24)^2}{24} + \frac{(40 - 48)^2}{48}$
$\chi^2 = \frac{4}{32} + \frac{36}{16} + \frac{64}{32} + 0 + 0 + 0 + \frac{4}{48} + \frac{36}{24} + \frac{64}{48}$
$\chi^2 = 0.125 + 2.25 + 2 + 0 + 0 + 0 + 0.0833 + 1.5 + 1.3333$
$\chi^2 \approx 7.292$

The degrees of freedom are $(3 - 1)(3 - 1) = 4$ .

Since $\chi^2 \approx 7.292$ is less than $\chi^2_{0.05, 4} = 9.49$ , we fail to reject the null hypothesis.
Hence, there is no significant association between hair colour and eye colour at 5% level of significance.
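The expected frequencies and the test statistic for a contingency table can be computed together. This is a minimal standard-library sketch; the function name `chi_square_independence` is an assumption for illustration.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: White, Brown, Black hair; columns: Blue, Grey, Brown eyes
hair_eye = [[30, 10, 40], [40, 20, 40], [50, 30, 40]]
chi2, dof = chi_square_independence(hair_eye)
print(round(chi2, 3), dof, chi2 > 9.49)
```

The statistic is about 7.29 on 4 degrees of freedom, below the critical value of 9.49.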

9. An engineer identifies four ways that a job can be done. To determine how long it takes for an operator to do a job, when each of these ways (or methods) are used, the engineer asks 4 operators to do a job using method A, another 4 operators to do the same job using method B and so on. Data related to the performance of each operator (in seconds) is shown below: (10 marks)
A B C D
19 18 21 22
17 16 20 23
22 15 19 21
20 14 19 20
Construct the relevant analysis of variance table and test the hypothesis that the average time of all operators are equal at 1% level of significance. (Given that: $F_{0.01} (3, 12) = 5.95$ ). (10 marks)

Answer:

To test the hypothesis that the average time taken is the same for all four methods, we perform a one-way ANOVA (Analysis of Variance) test.

Step 1: Calculate the means for each group.

For A:
$\bar{X}_A = \frac{19 + 17 + 22 + 20}{4} = \frac{78}{4} = 19.5$

For B:
$\bar{X}_B = \frac{18 + 16 + 15 + 14}{4} = \frac{63}{4} = 15.75$

For C:
$\bar{X}_C = \frac{21 + 20 + 19 + 19}{4} = \frac{79}{4} = 19.75$

For D:
$\bar{X}_D = \frac{22 + 23 + 21 + 20}{4} = \frac{86}{4} = 21.5$

Step 2: Calculate the overall mean.

Overall mean:
$\bar{X} = \frac{78 + 63 + 79 + 86}{16} = \frac{306}{16} = 19.125$

Step 3: Calculate the Sum of Squares Between Groups (SSB).

SSB:
$SSB = 4 \left[(19.5 - 19.125)^2 + (15.75 - 19.125)^2 + (19.75 - 19.125)^2 + (21.5 - 19.125)^2\right]$
$SSB = 4 \left[(0.375)^2 + (-3.375)^2 + (0.625)^2 + (2.375)^2\right]$
$SSB = 4 \left[0.140625 + 11.390625 + 0.390625 + 5.640625\right]$
$SSB = 4 \times 17.5625 = 70.25$

Step 4: Calculate the Sum of Squares Within Groups (SSW).

SSW:
$SSW = \sum (X_{ij} - \bar{X}_i)^2$
For A: $(19 - 19.5)^2 + (17 - 19.5)^2 + (22 - 19.5)^2 + (20 - 19.5)^2 = 0.25 + 6.25 + 6.25 + 0.25 = 13$
For B: $(18 - 15.75)^2 + (16 - 15.75)^2 + (15 - 15.75)^2 + (14 - 15.75)^2 = 5.0625 + 0.0625 + 0.5625 + 3.0625 = 8.75$
For C: $(21 - 19.75)^2 + (20 - 19.75)^2 + (19 - 19.75)^2 + (19 - 19.75)^2 = 1.5625 + 0.0625 + 0.5625 + 0.5625 = 2.75$
For D: $(22 - 21.5)^2 + (23 - 21.5)^2 + (21 - 21.5)^2 + (20 - 21.5)^2 = 0.25 + 2.25 + 0.25 + 2.25 = 5$
$SSW = 13 + 8.75 + 2.75 + 5 = 29.5$

Step 5: Calculate the degrees of freedom.

Degrees of freedom between groups (dfB):
$dfB = k - 1 = 4 - 1 = 3$

Degrees of freedom within groups (dfW):
$dfW = N - k = 16 - 4 = 12$

Step 6: Calculate the Mean Squares.

Mean Square Between Groups (MSB):
$MSB = \frac{SSB}{dfB} = \frac{70.25}{3} = 23.4167$

Mean Square Within Groups (MSW):
$MSW = \frac{SSW}{dfW} = \frac{29.5}{12} = 2.4583$

Step 7: Calculate the F-statistic.

$F = \frac{MSB}{MSW} = \frac{23.4167}{2.4583} = 9.53$

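The analysis of variance table is:

Source of variation	Sum of squares	Degrees of freedom	Mean square	F
Between methods	70.25	3	23.4167	9.53
Within methods (error)	29.5	12	2.4583
Total	99.75	15

Since $F = 9.53$ is greater than $F_{0.01} (3, 12) = 5.95$ , we reject the null hypothesis.
Hence, there is a significant difference in the average times taken by the operators using different methods at the 1% level of significance.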
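Steps 1 to 7 can be condensed into one short function. The following sketch uses only the standard library; the name `one_way_anova` and the returned tuple layout are illustrative assumptions.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return ssb, ssw, msb / msw


methods = [
    [19, 17, 22, 20],  # method A
    [18, 16, 15, 14],  # method B
    [21, 20, 19, 19],  # method C
    [22, 23, 21, 20],  # method D
]
ssb, ssw, f_stat = one_way_anova(methods)
print(ssb, ssw, round(f_stat, 2))
```

The sums of squares match the hand calculation (70.25 and 29.5), and the F ratio of about 9.53 exceeds the critical value 5.95 given in the question.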

10. Write short notes on the following: 5×2=10 marks
Control charts
Stratified sampling
Forecasting models

Answer:

a) Control charts:
Control charts are graphical tools used in quality control to monitor processes over time. They display process variation and help identify whether a process is stable or exhibits special cause variation. Common types include X-bar charts for monitoring the central tendency of a process and R charts for monitoring process dispersion. Control limits are set based on process variability to distinguish between common cause variation (within limits) and special cause variation (outside limits). Control charts aid in maintaining process stability and identifying opportunities for improvement.

b) Stratified sampling:
Stratified sampling involves dividing the population into distinct, non-overlapping subgroups (strata) based on certain characteristics. Samples are then randomly selected from each stratum, either in proportion to the stratum's share of the population (proportional allocation) or with larger samples from more variable strata (optimal allocation). This method ensures that each subgroup is represented in the sample, leading to more precise estimates for the entire population. It is particularly useful when the population is heterogeneous overall but relatively homogeneous within each stratum, allowing better precision and efficiency than simple random sampling.

c) Forecasting models:
Forecasting models are mathematical techniques used to predict future values based on past and present data. They are employed in various fields such as economics, finance, weather forecasting, and sales projections. Common types of forecasting models include time series models (e.g., ARIMA, exponential smoothing), causal models (e.g., regression analysis), and machine learning algorithms (e.g., neural networks). The choice of model depends on the nature of the data, the forecasting horizon, and the underlying assumptions about the relationships between variables. Forecasting models help businesses and organizations make informed decisions by providing insights into future trends and patterns.
