Welcome to Tutorteddy.com. This is a real site intended to help students in statistics courses. We function as an online statistics tutor, in much the same way as a statistics class. We provide online statistics help to students, researchers and others. Our experts help you learn statistics and give guidance on your homework and assignments, so that you can learn the subject more precisely.
Graphing Distributions: Box Plots
The Objectives:
- What is meant by hinges, step, H-spread, far out value, outside value, and adjacent value?
- How is a box plot created?
- How are parallel box plots created?
- When is a box plot appropriate for a particular dataset?
Earlier we discussed some important graphical displays, such as the histogram and the frequency polygon. Now we are going to learn about another crucial graphical representation, known as the box plot. Box plots play a very important role in comparing distributions and identifying outliers. Let us consider an example.
Example: The following table contains the number of errors in different processes per run.
Table 1: The number of errors in three different processes per run.
Process 1 | 11 | 16 | 18 | 24 | 29 | 36 | 41 | 55 | 62 | 72
Process 2 | 7 | 14 | 23 | 27 | 39 | 43 | 54 | 56 | 62 | 64
Process 3 | 5 | 9 | 16 | 26 | 28 | 36 | 48 | 51 | 56 | 60
First we will learn how to create a box plot and get to know the basic terms. Then we will compare the number of errors in the three processes. This requires a separate box plot for each process; such a representation is known as parallel box plots.
Construction of Box Plot:
There are a number of steps involved in the creation of box plots.
- Firstly, we need the 25th, 50th and 75th percentiles to draw a box plot. A box is drawn for each process so that it extends from the 25th to the 75th percentile. The 50th percentile therefore lies inside the box, though not necessarily at its exact middle. The top of the box represents the 75th percentile, the line inside the box denotes the 50th percentile and the bottom of the box shows the 25th percentile.
So, our first task is to calculate the percentiles for the three processes. For process 1, process 2 and process 3, the percentiles are:
Percentile | Process 1 | Process 2 | Process 3
25th percentile | 19.5 | 24 | 18.5
50th percentile | 32.5 | 41 | 32
75th percentile | 51.5 | 55.5 | 50.25
Fig 1: Box plots showing the three percentiles for the number of errors in the three processes
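The percentile values above can be checked with a short script. This is only a sketch: the text does not say which interpolation rule was used, so the choice of `method="inclusive"` in Python's `statistics.quantiles` is an assumption, made because it reproduces the tabulated values. The names `errors` and `three_percentiles` are illustrative.

```python
from statistics import quantiles

# Error counts per run, copied from Table 1.
errors = {
    "Process 1": [11, 16, 18, 24, 29, 36, 41, 55, 62, 72],
    "Process 2": [7, 14, 23, 27, 39, 43, 54, 56, 62, 64],
    "Process 3": [5, 9, 16, 26, 28, 36, 48, 51, 56, 60],
}


def three_percentiles(values):
    """Return the 25th, 50th and 75th percentiles as a tuple."""
    # Assumption: "inclusive" interpolation, which treats the smallest and
    # largest observations as the 0th and 100th percentiles and
    # interpolates linearly in between.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


for name, values in errors.items():
    print(name, three_percentiles(values))
```

Other percentile conventions exist and can give slightly different hinges for small samples such as these.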
- Now let us discuss the various terms that are involved in box plots.
Basic Terminology:
- Lower hinge: the 25th percentile.
- Upper hinge: the 75th percentile.
- H-spread: the difference between the upper hinge and the lower hinge.
- Step: 1.5 times the H-spread.
- Lower inner fence: lower hinge - 1 step.
- Upper inner fence: upper hinge + 1 step.
- Lower outer fence: lower hinge - 2 steps.
- Upper outer fence: upper hinge + 2 steps.
- Lower adjacent: the smallest value above the lower inner fence.
- Upper adjacent: the largest value below the upper inner fence.
- Outside value: a value that is beyond an inner fence but not beyond an outer fence.
- Far out value: a value that is beyond an outer fence.
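The fence definitions can be turned into a small classification rule. The sketch below is illustrative: the function name `classify` and the test values 120 and 160 are hypothetical, while the hinges 19.5 and 51.5 are those of process 1.

```python
def classify(value, lower_hinge, upper_hinge):
    """Label a value relative to the inner and outer fences."""
    h_spread = upper_hinge - lower_hinge
    step = 1.5 * h_spread
    lower_inner, upper_inner = lower_hinge - step, upper_hinge + step
    lower_outer, upper_outer = lower_hinge - 2 * step, upper_hinge + 2 * step

    # Check the outer fences first, because a far out value is also
    # beyond the inner fence.
    if value < lower_outer or value > upper_outer:
        return "far out"
    if value < lower_inner or value > upper_inner:
        return "outside"
    return "within inner fences"


# Hinges of process 1; 120 and 160 are made-up values for illustration.
for value in (72, 120, 160):
    print(value, classify(value, 19.5, 51.5))
```

With these hinges the inner fences are -28.5 and 99.5 and the outer fences are -76.5 and 147.5, so 72 lies within the inner fences, 120 is an outside value and 160 is a far out value.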
The following table gives the values of the terms defined above for each of the three processes.
Table 2: Values of the box plot terms for the three processes
Term | Process 1 | Process 2 | Process 3
Lower hinge | 19.5 | 24 | 18.5
Upper hinge | 51.5 | 55.5 | 50.25
H-spread | 32 | 31.5 | 31.75
Step | 48 | 47.25 | 47.625
Lower inner fence | -28.5 | -23.25 | -29.125
Upper inner fence | 99.5 | 102.75 | 97.875
Lower outer fence | -76.5 | -70.5 | -76.75
Upper outer fence | 147.5 | 150 | 145.5
Lower adjacent | 11 | 7 | 5
Upper adjacent | 72 | 64 | 60
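The whole of Table 2 can be recomputed from the raw data. The following is a minimal sketch under the same assumption as before, namely that the hinges are obtained by inclusive linear interpolation; `box_summary` and the dictionary keys are illustrative names.

```python
from statistics import quantiles

# Error counts per run, copied from Table 1.
errors = {
    "Process 1": [11, 16, 18, 24, 29, 36, 41, 55, 62, 72],
    "Process 2": [7, 14, 23, 27, 39, 43, 54, 56, 62, 64],
    "Process 3": [5, 9, 16, 26, 28, 36, 48, 51, 56, 60],
}


def box_summary(values):
    """Compute the box plot terms for one sample."""
    data = sorted(values)
    # Assumption: hinges are the inclusive-method quartiles.
    lower_hinge, _median, upper_hinge = quantiles(data, n=4, method="inclusive")
    h_spread = upper_hinge - lower_hinge
    step = 1.5 * h_spread
    lower_inner = lower_hinge - step
    upper_inner = upper_hinge + step
    return {
        "lower hinge": lower_hinge,
        "upper hinge": upper_hinge,
        "H-spread": h_spread,
        "step": step,
        "lower inner fence": lower_inner,
        "upper inner fence": upper_inner,
        "lower outer fence": lower_hinge - 2 * step,
        "upper outer fence": upper_hinge + 2 * step,
        # Adjacent values: the extreme observations inside the inner fences.
        "lower adjacent": min(v for v in data if v > lower_inner),
        "upper adjacent": max(v for v in data if v < upper_inner),
    }


for name, values in errors.items():
    print(name)
    for term, value in box_summary(values).items():
        print(f"  {term}: {value}")
```

Because no observation lies beyond an inner fence in any of the three processes, the adjacent values are simply the minimum and maximum of each sample.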
- Next come the whiskers, which are drawn above and below the box. Whiskers provide information about the spread of the dataset. They are vertical lines ending in a horizontal stroke, and they extend from the lower and upper hinges to the lower and upper adjacent values. The following figure shows the whiskers.
Fig 2: Box-plots with whiskers
- Outside and far out values are not represented by the whiskers. Instead, they are shown by additional marks beyond the whiskers: outside values are denoted by a small circle (o) and far out values by an asterisk (*). If there are any such values, we plot them on our graph. In this example no value lies beyond the inner fences, so there are none to plot.
- The mean can also be represented on the box plot by a plus sign.
Fig 3: Complete box plot with plus sign denoting mean
The above figure summarizes the data. For each of the three processes, half of the error counts lie between the hinges, and the rest lie either below the lower hinge or above the upper hinge. The upper end of the distribution also decreases gradually from process 1 to process 3. From Figure 3 we see that the lower adjacent values for process 1, process 2 and process 3 are 11, 7 and 5, and the upper adjacent values are 72, 64 and 60. The lower inner fence and upper inner fence can also be shown.
Thus box plots are quite useful for depicting the basic information about a distribution. When the whisker is longer in the positive direction than in the negative one, this indicates positive skewness of the distribution. A mean greater than the median also suggests positive skewness. Box plots can therefore show extreme values and differences between distributions. However, some features of a distribution cannot be shown through a box plot; in such cases we use a histogram or a stem and leaf display.
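The two skewness indicators mentioned above, whisker length and the position of the mean relative to the median, can be computed directly. This is a rough sketch rather than a formal skewness test; it assumes inclusive-method hinges, and `skew_indicators` is an illustrative name.

```python
from statistics import mean, median, quantiles

# Error counts per run, copied from Table 1.
errors = {
    "Process 1": [11, 16, 18, 24, 29, 36, 41, 55, 62, 72],
    "Process 2": [7, 14, 23, 27, 39, 43, 54, 56, 62, 64],
    "Process 3": [5, 9, 16, 26, 28, 36, 48, 51, 56, 60],
}


def skew_indicators(values):
    """Return the mean, median and whisker lengths for one sample."""
    data = sorted(values)
    # Assumption: hinges are the inclusive-method quartiles.
    lower_hinge, _, upper_hinge = quantiles(data, n=4, method="inclusive")
    step = 1.5 * (upper_hinge - lower_hinge)
    # Whiskers end at the adjacent values, the extreme observations that
    # still lie inside the inner fences.
    lower_adjacent = min(v for v in data if v > lower_hinge - step)
    upper_adjacent = max(v for v in data if v < upper_hinge + step)
    return {
        "mean": mean(data),
        "median": median(data),
        "lower whisker": lower_hinge - lower_adjacent,
        "upper whisker": upper_adjacent - upper_hinge,
    }


for name, values in errors.items():
    print(name, skew_indicators(values))
```

On the Table 1 data, both indicators suggest positive skewness for process 1 and negative skewness for process 2. For process 3 they disagree: the mean exceeds the median, but the lower whisker is the longer one. This is a reminder that both are rough indicators.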
Different Kinds of Box Plots:
A box plot can be drawn in different ways. For instance, outliers may or may not be marked. The means can be shown by a plus sign or by any other symbol. In some versions, all the individual scores are shown on the box plot. The boxes can also differ in width when the sample sizes differ. The points for individual scores can also be jittered along a horizontal line.
Hence there are various conventions for drawing a box plot. One should choose the one best suited to the situation, so that the most important features of the given dataset are revealed by the box plot.
Fig 4: Box plot showing different values
Fig 5: Box plot showing different scores.
Learning statistics is quite important nowadays, as it has a wide variety of applications in almost every field of study and research. Contact us and we will provide you with online statistics help.