Statistics+-+Baseball

=Baseball Statistics with Excel=

Author: Joseph Warfel, 2011

SWBAT:
 * move between worksheets in an Excel file.
 * enter simple formulas – specifically those using arithmetic and SUM.
 * drag the fill handle to use a formula in multiple cells in a single row or column.
 * control the number formats of cells.

Materials: Computers with Excel – shared one computer to two students Classwork handouts, with baseball abbreviations as last page

[|handout.docx] [|baseball.xls]


 * Plan of the lesson:**

Start by reading the list of Excel Vocabulary so that the students will be more aware of those words when they come up in the lesson. I think it would be pointless to define each of the terms at the beginning because there are so many, and the best way to learn what they mean is by using them in Excel anyway.

Before continuing say something like “If you like baseball, you will learn about how some of the most important statistics are calculated. If you don't like baseball, you will still be able to do this because I will tell you how the statistics work; and also, this won't last very long, and we won't do anything about baseball ever again.”

Pitching: Derive the formula for ERA by setting up a proportion and cross-multiplying. Do an example with numbers, then let the class do an problem on their own (if necessary), then express the formula using variables (ER and IP). Mention that a good ERA is close to 3, and smaller numbers are better. The players in the examples are purposely chosen to have silly nicknames.


 * Guided Practice:**

Only at this point is Excel necessary. Download the baseball.xls file from a website or wiki or server or ftp. Open Excel by double-clicking on the downloaded file. Introduce the idea of a workbook composed of worksheets composed of cells; explain how each cell is uniquely identified by a pair of a letter and number.

Navigate to the CWS Pitching worksheet. (CWS is the abbreviation for the Chicago White Sox.)

In column S, label cell S1 “ERA.”

In cell S2, calculate ERA for D. Axelrod.

Use the fill handle to drag the formula to S23. Optional: ERA is traditionally expressed as a decimal with two digits of precision. Select the range S2:S23 and set the formatting so that ERA is displayed in this way, either through the Format menu or by right-clicking.

Now, having a set of ERAs, have a discussion about them. Ask the students if they see anything interesting. Possible leading questions: Which is the worst pitcher on this team? Are there any good pitchers on the team? Does this team appear to be good at pitching?

This conversation should lead to the idea that it is difficult to evaluate the performance of the team as a whole by looking at individual ERAs, since some pitchers pitched hundreds of innings and others less than ten. So, it would be good if we could calculate an ERA for all of the pitchers on the team – which is entirely possible.

In cell A25, enter “Team.”

In cell C25, calculate the sum of the column. The formula should be =SUM(C2:C23), which can be typed in, or can be done by typing “=SUM(“ and then selecting the range C2:C23 with the mouse.

Use the fill handle to drag the formula from C25 over to Q25.

Calculate the team ERA in cell S25, either by typing in the formula, or by dragging from cell S23, or by copying and pasting cell S23. I think the dragging option is the best of these, although you will have to delete a “divide by zero” error code from cell S24 afterwards.

Now, having calculated the Team ERA, see if it answers any of the questions that came up in the discussion.

Let the students carry out the same analysis for the Cubs by using the data in the CHC Pitching worksheet. Direct them, when they finish, to compare the pitching of the White Sox and Cubs on the last page of the classwork. After everyone has finished the CHC Pitching worksheet, discuss this briefly.
 * Independent Practice:**

Then, go back to the first page and review the Excel vocabulary. Call on a student to explain each one, either verbally or by pointing at it on the screen. If there is one about which many students are confused, invent a definition for the students to write on the first page of the handout.
 * Review of Vocabulary:**

Explain the formula for batting average. This is a simpler formula than ERA. Carry out an example and let the students do a problem if it seems necessary.
 * Batting:**

There is also a formula for OBP. This is an alternative statistic that has been shown to have a stronger correlation with the number of runs scored by a team and their number of wins in a season. It is part of the approach to baseball statistics called “sabermetrics” that is depicted in the movie //Moneyball//. A player getting on base due to a walk is very similar to hitting a single, but batting average does not recognize this contribution, while OBP does. If there's not much time left, OBP could be left out; it returns very similar results to batting average for the data here. But, if the students are interested, you can see how the OBP statistic is constructed to overcome the disadvantages of batting average. The players in the examples were both accused of cheating in the 1919 “Black Sox” scandal. Many people believe that Shoeless Joe wasn't cheating, and they use statistical analysis of his performance in the series to support that. Chick Gandil, on the other hand, is generally recognized as the leader of the players that cheated.


 * Guided Practice:**

Navigate to the worksheet CWS Batting. Calculate batting average (BA) and OBP in a similar way to what we did with ERA. There are two options here: you could use the formula =(F2+G2+H2+I2)/D2 for batting average, or you could calculate the intermediate quantity of hits with the formula =F2+G2+H2+I2 and then calculate batting average with =hits/D2. The second option is more convenient if you are including OBP because otherwise the numerator of the expression is long and not very intuitive. When I do this, I will try to elicit instructions from the students for what to do as much as possible, getting them to use words like “drag” and “fill handle” and “cell.”


 * Independent Practice:**

Students carry out the same analysis for Cubs batting, and compare the two teams on the last page of the classwork. Discuss.


 * Potential Homework:**

Have the students analyze the data for team XYZ, and compare it to the Cubs and White Sox. If they only rely on ERA and batting average, they'll probably conclude that XYZ is much better than both of the other teams. However, if they look at the data carefully, they'll notice that XYZ has very few players and that they haven't played many games.

This is team XYZ: http://www.ocvarsity.com/articles/view-27478-ocean-series.html