## Statistics

Statistics are everywhere. In this course we will learn methods for analyzing the statistics we encounter every day.

You are bombarded with data and statistics every day. In fact, 90% of the world's data has been created in the last 2 years, and we produce 2.5 quintillion bytes of data every day! Data appears on your television and computer screens, in advertisements, on radio news reports, in newspapers and magazines, and on websites. You have to deal with streams of data at work, and then again when you get home. The ability to assess the accuracy and relevance of data is one of the most important skills to possess in the Information Age. Sources for statistics: https://www.mediapost.com/publications/article/291358/90-of-todays-data-created-in-two-years.html

## Data

What is data? How is it classified?

Data can be defined as facts, which may or may not be numerical. Data can be collected on almost anything. The characteristic you are collecting data about is called a ‘variable’ (since the observed value can vary). For example, the variable being studied could be height, which is numerical or ‘quantitative’. Or the variable could be hair colour, which is not numerical or ‘qualitative’.

### Quantitative vs Qualitative is One Way to Categorize Types of Data

Some examples:

Quantitative (or numerical) data

• the age of a person
• the weight of a person
• the population of a city
• the time it takes to travel to work

Qualitative (or categorical) data

• whether someone is left or right handed
• a person’s favourite colour
• the type of car someone wants to buy
• first language spoken

Qualitative data can be further classified as ‘ordinal’ if it can be ranked (e.g., poor, fair, good, very good), or ‘nominal’ if it cannot be ranked (e.g., eye colour).

Quantitative (i.e., numerical) values can be further classified as discrete or continuous.

Discrete:
whole numbers only (fractions or decimals not possible); example: number of children in a family

Continuous:
whole numbers, fractions or decimals; example: an outside temperature of -10.5 degrees
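The classification rules above can be sketched as a small decision procedure. This is only an illustration: the `Variable` class, its field names, and the `classify` function are hypothetical names chosen for this sketch, and the answers to the three yes/no questions are supplied by hand for examples taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    # Hypothetical helper type for this sketch; not part of any library.
    name: str
    numerical: bool                   # are the values numbers that count or measure something?
    whole_numbers_only: bool = False  # only relevant for numerical variables
    can_be_ranked: bool = False       # only relevant for non-numerical variables


def classify(variable):
    """Return a label such as 'quantitative, discrete' for a Variable."""
    if variable.numerical:
        subtype = "discrete" if variable.whole_numbers_only else "continuous"
        return f"quantitative, {subtype}"
    subtype = "ordinal" if variable.can_be_ranked else "nominal"
    return f"qualitative, {subtype}"


# Examples taken from the text above.
examples = [
    Variable("number of children in a family", numerical=True, whole_numbers_only=True),
    Variable("temperature outside", numerical=True),
    Variable("rating (poor, fair, good, very good)", numerical=False, can_be_ranked=True),
    Variable("eye colour", numerical=False),
]

for v in examples:
    print(f"{v.name}: {classify(v)}")
```

The sketch asks the same questions you would ask by hand: is the value a number, and if so, can it only be a whole number? If it is not a number, can the categories be ranked?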

Quantitative or Qualitative variable? If quantitative, is it a discrete or a continuous variable?

a) height of a bicycle

b) age of a cat to the nearest year

c) volume of juice in a can

d) names of countries a person has visited

e) letter grade in a course

f) eye colour

g) how someone feels about the current government

h) whether someone drinks coffee, tea, both or neither in the morning

i) the number of cars you can see from your window

j) whether you have ever gone fishing

## Sources of Data

Part of working with data involves assessing its source and reliability.

Primary source = data you collect

Secondary source = data someone else collected

The benefit of primary data is that you know exactly how they were collected. The problem with secondary data is that the reliability, accuracy, and integrity of the data are often uncertain. Who collected the data? Are the data biased? How old are the data? Can the data be verified, or do they have to be taken on faith? All of these questions can be difficult to answer if you did not collect the data yourself.

Examples of secondary sources: scientific research papers, news reports, Wikipedia, textbooks, websites, etc.

However, some secondary sources are better than others. For example, academic institutions tend to present original information in a less biased way. Textbooks and some encyclopedias can be good as well. The worst secondary sources tend to be those advocating particular opinions, especially personal blogs. It can be hard to determine whether a bias exists in a secondary source, so it is always good to confirm the information with another source.

Can you think of a reliable secondary source that was mentioned earlier in this learning activity?

If you answered Statistics Canada, then you are correct!

### Examples

Categorize each data source as primary or secondary.

## One Variable vs. Two Variable Data:

| Example: One Variable Studies | Example: Two Variable Studies |
| --- | --- |
| Is an individual colour blind or not? | Relationship between having a pet and a person’s emotional or physical health. |
| Heights of a representative sample of adult Canadians | Relationship between the level of pollution in a city and the average life span of the residents. |
| Favourite colour | Relationship between the proportion of Internet subscribers in a neighbourhood and voting patterns in the neighbourhood. |

The number of variables collected affects the way the data are analyzed.

In this unit we will investigate the way two variable data are analyzed. (One variable data analysis is studied in Unit 2.)
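To make the distinction concrete, here is a minimal sketch of how one variable and two variable data might be stored and summarized. All of the values are made up for illustration, and the names `favourite_colours`, `pet_and_health`, and `mean_score` are hypothetical. Comparing group averages is just one simple way to look at two variables together, not necessarily the method used later in this unit.

```python
from collections import Counter
from statistics import mean

# One variable data: a single value recorded for each person.
# (Made-up responses, for illustration only.)
favourite_colours = ["blue", "red", "blue", "green", "blue", "red"]

# A tally of each response is a natural summary for one variable.
colour_counts = Counter(favourite_colours)
print(colour_counts.most_common())

# Two variable data: a pair of values recorded for each person.
# Each pair is (has_pet, health_score out of 10); values are made up.
pet_and_health = [(True, 8), (False, 6), (True, 7), (False, 7), (True, 9), (False, 5)]


def mean_score(pairs, has_pet):
    """Average the second variable for people with the given first variable."""
    return mean(score for pet, score in pairs if pet == has_pet)


print("pet owners:", mean_score(pet_and_health, True))
print("non-owners:", mean_score(pet_and_health, False))
```

With one variable, each person contributes a single value, so a tally is enough. With two variables, the values must stay paired so that the relationship between them can be examined.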

### Practice Questions:

For each situation select the best answer.

## Variability in Data and Sampling:

In general there is variability in data. Variability can come from two sources:

1. Inherent variability

2. Measurement variability

Inherent Variability: refers to the natural variety of responses possible from the ‘sample’ surveyed (a smaller group representing the larger ‘population’). This variability can be minimized by choosing appropriate sampling methods, but it cannot be avoided entirely.

Measurement Variability: refers to variability arising from minor differences in the procedure used to take measurements (human or mechanical). This variability can be minimized by careful experimental design, but it cannot be avoided entirely.
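A short simulation can illustrate the two sources of variability. This is a sketch under stated assumptions: the population of heights, its average of 170 cm, the spread of 10 cm, and the measurement error of 0.5 cm are all invented for the example, and a fixed seed is used only so that repeated runs give the same numbers.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10 000 adult heights in cm (invented values).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Inherent variability: people genuinely differ, so each sample of 50
# people gives a slightly different average height.
sample_means = [mean(rng.sample(population, 50)) for _ in range(5)]

# Measurement variability: measuring the SAME person several times gives
# slightly different readings because of small errors in the procedure.
true_height = 172.0
measurements = [true_height + rng.gauss(0, 0.5) for _ in range(5)]

print("sample means:", [round(m, 1) for m in sample_means])
print("repeat readings:", [round(m, 1) for m in measurements])
```

Increasing the sample size in this sketch makes the sample means cluster more tightly, and reducing the assumed measurement error tightens the repeat readings, but neither source of variability disappears completely.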

Examples:

For each situation determine

i. the population

ii. the variable being researched

iii. whether the variable is quantitative or qualitative

iv. if quantitative, whether the data are continuous or discrete

a. You are hired by a restaurant to determine how often each customer visits the restaurant each week.

b. The transit commission hires you to record the time between buses at a specific stop.

c. You conduct a study on whether residents in city A have more disposable income than those in city B.

d. You survey members of your community for their opinion on the proposed name of the new community centre.

e. You collect data on the number of cars parked on your street at the top of each hour.

f. You are a marine biologist studying the biodiversity in Long Lake by identifying the species of fish in the lake caught by anglers.

## Ways to Depict 2 Variable Data:

Once data have been collected, they need to be presented effectively to communicate the patterns they contain. The most common ways to show patterns in data are to present them in tables or graphs. Data showing income distribution for an Ontario city are shown below as both a table and graph:

Table:

| Annual income | Percentage |
| --- | --- |
| Less than $10 000 | 22% |
| $10 000 to $25 000 | 29% |
| $25 000 to $50 000 | 30% |
| $50 000 to $100 000 | 16% |
| Greater than $100 000 | 3% |
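As a rough sketch, the income figures from the table above can be stored in a dictionary, checked, and drawn as a simple text bar chart. The names `income_distribution` and `text_bar_chart` are hypothetical, and the text chart is only a stand-in for a proper graph.

```python
# Percentages copied from the income table above.
income_distribution = {
    "Less than $10 000": 22,
    "$10 000 to $25 000": 29,
    "$25 000 to $50 000": 30,
    "$50 000 to $100 000": 16,
    "Greater than $100 000": 3,
}

# A quick sanity check: the categories should account for everyone.
total = sum(income_distribution.values())
print("Total:", total, "%")


def text_bar_chart(data, symbol="#"):
    """Return one line per category, with one symbol per percentage point."""
    width = max(len(label) for label in data)
    lines = []
    for label, percent in data.items():
        lines.append(f"{label:<{width}} | {symbol * percent} {percent}%")
    return "\n".join(lines)


print(text_bar_chart(income_distribution))
```

Even in this crude form, the chart makes the pattern easier to see than the table does: most residents fall in the three lowest income categories, and very few earn more than $100 000.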

Graph:

Graphs are often used to display data so the patterns are observed quickly and easily, and used to draw conclusions. There are many types of graphs to choose from (e.g., pie graph, bar graph, histogram, line graph, scatter plot, etc.).

### Examples of reading other types of graphs:

Another common graph is a bar graph.

Can you interpret the trends in this graph?

Answer these questions to find out.

Another commonly used graph is the pictograph, where symbols represent counts. This graph describes the population in various Ontario towns. The stick person represents 1000 people.

For each question, select the best answer.

## Self-check

In this learning activity we have been introduced to statistics and why they are important in our lives. Since statistics involves data, we need to be able to classify data into different types. We looked at these types of data: quantitative vs qualitative; discrete vs continuous; primary vs secondary; and one vs two variable data. We also looked at the variability that occurs within data, and began looking at how two variable data are displayed.

Complete these practice questions to be sure that you can:

• classify different types of data
• understand why there is variability present within data sets
• read a variety of graphs for trends

### Practice Questions:

1. As an ongoing component of this course you will create a glossary of the many terms we will encounter throughout the units. Each term should have a clear description of the meaning of the term, and an example if appropriate.

Example:

Quantitative data - data that are represented using numbers; also referred to as numerical data (e.g., height).

Begin your glossary of terms with this learning activity and any relevant terms it includes.

2. Determine if these are quantitative or qualitative:

3. For the quantitative items from #2, classify them as discrete or continuous.

4. Provide an example of secondary source data, and explain the difference between it and primary source data.

5. You have been given the task of investigating students’ opinions of their school. You plan to ask them two questions:

1. Do you enjoy going to school? Yes, No, or Undecided
2. What is your overall average in your courses? _______ What is your age? _______

a) Explain why the first question is one variable data, and describe how you might present the results.

b) Explain why the second question is two variable data, and describe how you might present the results.

Note - The final assessment in Unit 4 for this course is based on the material from Units 1 and 2. The first part will be introduced at the end of Unit 1, and the second part at the end of Unit 2. This will allow time for you to collect data and analyze them using the skills from Units 1 and 2, and to prepare a written report and presentation of your findings.

### Reflection:

To determine your understanding of the concepts in this learning activity reflect on these questions:

• Can I classify types of data in a variety of ways?
• Can I describe why there is variation within data sets?
• Can I describe how to present one and two variable data and identify trends in the data?

If you have answered ‘no’ to any of these questions, go back and read through the relevant examples.

If you still have questions, please message your teacher or search for additional examples online.

## Reflection

In this course you are an independent, self-directed learner. Consider this definition of a self-directed learner:

Self-directed learners are aware of how they learn best.  They are confident and know when to ask for support.  Self-directed learners set goals and make realistic plans to meet those goals.  In other words, they make a commitment to their own learning and take responsibility for it. Remember you are an independent learner, but you are not alone. At any time you can reach out to your Academic Officer at the ILC for support.

As a self-directed learner, track your progress on these:

Rate your understanding on a scale of five to one.

Five means “I have a thorough understanding.” One means “I am confused.”

Agree or Disagree statements ranked 5 to 1

| Statement | 5 | 4 | 3 | 2 | 1 |
| --- | --- | --- | --- | --- | --- |
| I'm aware of how I learn best. | | | | | |
| I have confidence in my abilities as a learner. | | | | | |
| I know when and where to ask for support. | | | | | |
| I take responsibility for and make a commitment to my own learning. | | | | | |
| I practise new skills because I want to improve. | | | | | |
| I reflect on my own learning to determine my strengths and where I can improve. | | | | | |
| I use suggested answers and feedback to improve. | | | | | |

You will benefit from keeping an organized notebook, either digital or paper-based. Throughout the course, you will be prompted to reflect on your learning and document evidence of your growth.

Now that you have completed the first learning activity, go back and make notes. Consider adding definitions or terminology. You may even want to start creating your own personal word wall. A word wall is a type of glossary where you add the words you want to remember, including definitions and helpful examples.