Formation of frequency distribution
Definition :
The formation of a frequency distribution involves the organization of raw data into classes or categories and then determining the frequency or number of observations that fall into each class. The purpose of a frequency distribution is to provide a summary of the data in a concise and meaningful way, allowing for easy interpretation and analysis.
Objectives :
- To facilitate the analysis of data
- To estimate frequencies of the unknown population distribution from the distribution of sample data
- To facilitate the computation of various statistical measures.
Types :
- Univariate Frequency Distribution.
- Bivariate Frequency Distribution.
1. Univariate Frequency Distribution: A univariate frequency distribution displays the number of occurrences of the values of a single variable in a dataset. It summarizes the data by listing each value, or each interval of values, together with its frequency. Univariate frequency distributions are further classified into three categories:
- Series of individual observations
- Discrete frequency distribution
- Continuous frequency distribution
(i) Series of individual observations: A series of individual observations refers to a collection of separate data points, each representing a unique instance or event.
(ii) Discrete frequency distribution: In a discrete frequency distribution, we count how many times each value of a variable occurs in a set of data. This is done by using tally bars, which make it easy to keep track of the frequency of each value.
(iii) Continuous frequency distribution: When the identity of individual units is not important and the order of observations does not matter, the data are divided into groups or classes, and the number of observations falling in each class is recorded. This grouping is the first step in condensing the data.
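The difference between the discrete and continuous cases can be illustrated with a minimal Python sketch. The sample values, the helper name grouped_frequencies, and the choice of classes (start 145, width 10) are assumptions made only for illustration. Classes here are of the exclusive type, so an observation equal to an upper limit is counted in the next class.

```python
from collections import Counter

# Hypothetical sample: number of children per family (a discrete variable).
children = [2, 0, 1, 2, 3, 1, 2, 0, 1, 2]

# Discrete frequency distribution: count how often each distinct value occurs.
discrete_freq = dict(sorted(Counter(children).items()))
for value, count in discrete_freq.items():
    # Print a simple tally next to each value.
    print(f"{value}: {'|' * count} ({count})")

# Hypothetical sample: heights in cm (a continuous variable).
heights = [151.2, 158.7, 162.4, 149.9, 171.0, 165.3, 159.1, 168.8, 155.5, 160.0]


def grouped_frequencies(data, start, width, n_classes):
    """Count observations in equal-width classes [lower, upper)."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Continuous frequency distribution: group the values into classes.
continuous_freq = grouped_frequencies(heights, start=145, width=10, n_classes=3)
for lower, upper, count in continuous_freq:
    print(f"{lower}-{upper}: {count}")
```

In both cases the frequencies add up to the number of observations, which is a quick check that no value has been missed or counted twice.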
2. Bivariate Frequency Distribution: A bivariate frequency distribution shows how two variables are distributed jointly. Specifically, it displays the frequency, or number of occurrences, of each possible combination of values of the two variables being studied, usually in the form of a two-way table.
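As a rough illustration, a bivariate frequency distribution can be built by counting pairs of values. The sketch below uses made-up data (hours studied and grade for eight students), and all variable names are hypothetical.

```python
from collections import Counter

# Hypothetical paired observations: (hours studied, grade) for eight students.
pairs = [(1, "C"), (2, "B"), (2, "B"), (3, "A"),
         (1, "C"), (3, "A"), (2, "C"), (3, "B")]

# Each cell of the two-way table is the frequency of one (x, y) combination.
joint = Counter(pairs)

x_values = sorted({x for x, _ in pairs})
y_values = sorted({y for _, y in pairs})

# Print the table with one row per x value and one column per y value.
print("hours " + " ".join(f"{y:>3}" for y in y_values))
for x in x_values:
    print(f"{x:>5} " + " ".join(f"{joint[(x, y)]:>3}" for y in y_values))

# Row and column totals give the univariate distribution of each variable.
row_totals = {x: sum(joint[(x, y)] for y in y_values) for x in x_values}
col_totals = {y: sum(joint[(x, y)] for x in x_values) for y in y_values}
```

Combinations that never occur simply get a frequency of zero, since a Counter returns 0 for keys it has not seen.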
Principles for Constructing Frequency Distributions:
While there are no strict rules for constructing frequency distributions, statisticians should use their experience and judgment to create an appropriate classification of the data. Some guidelines to consider include identifying an appropriate number of classes, ensuring that each class is mutually exclusive and exhaustive, and using intervals that are of equal width. Ultimately, the goal is to create a frequency distribution that accurately represents the data while being easy to interpret and analyze.
Guidelines to Construct a Frequency Distribution:
- Type of classes: The classes should be clearly defined, mutually exclusive, and exhaustive, so that every value of the variable falls into exactly one class.
- Optimal Number of Classes: The choice of the number of classes should be based on the total frequency, the nature of the data, the desired accuracy, and the computational convenience. The number of classes should not be too small or too large, to avoid obscuring important features of the data or rendering the distribution unwieldy.
- Size of Class Intervals: The size of the class interval in a grouped frequency distribution is a matter of the statistician's judgment. For a given range of data it is inversely related to the number of classes: the width is approximately the range divided by the number of classes. A smaller interval size will result in more classes and greater detail, while a larger interval size will result in fewer classes and less detail.
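- Class Boundaries: In a grouped frequency distribution with gaps between the upper limit of one class and the lower limit of the next, a correction factor for continuity must be applied to convert the data into a continuous distribution. The correction factor is half the gap between adjacent classes; it is subtracted from each lower limit and added to each upper limit. The resulting classes are of the exclusive type, and their new lower and upper limits are known as class boundaries.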
- Mid-value or Class Mark: The mid-value or class mark is the value of the variable exactly at the middle of the class, obtained by dividing the sum of the upper and lower class limits by 2. Because the class mark is taken to represent every observation in its class, the class limits should be chosen so that observations are spread evenly throughout the class interval.
- Open-End Classes: Open-end classes are those in which the lower limit of the first class, the upper limit of the last class, or both are not specified. They should be avoided when possible.
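The guidelines on class boundaries and class marks can be sketched in Python as follows. The function names and the example classes (marks out of 50 in inclusive classes 10-19, 20-29, and so on) are hypothetical, and the sketch assumes that the gap between adjacent classes is the same throughout.

```python
def class_boundaries(limits):
    """Convert inclusive class limits with gaps into class boundaries.

    The correction is half the gap between one class's upper limit and the
    next class's lower limit. It is subtracted from every lower limit and
    added to every upper limit. Assumes all gaps are the same size.
    """
    gap = limits[1][0] - limits[0][1]
    correction = gap / 2
    return [(lower - correction, upper + correction) for lower, upper in limits]


def class_marks(classes):
    """Mid-value of each class: (lower limit + upper limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in classes]


# Hypothetical inclusive classes for marks out of 50.
limits = [(10, 19), (20, 29), (30, 39), (40, 49)]

boundaries = class_boundaries(limits)
marks = class_marks(limits)
widths = [upper - lower for lower, upper in boundaries]

for (lower, upper), mark in zip(boundaries, marks):
    print(f"{lower}-{upper}: class mark {mark}")
```

With these classes the gap is 1, so the correction is 0.5 and the class 10-19 becomes 9.5-19.5. The class marks are the same whether they are computed from the class limits or from the class boundaries, and every class has a width of 10 once boundaries are used.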