Saturday, May 21, 2022

Ballpark figures: Analyzing MLB baseball attendance


It’s springtime in the U.S., which means something as American as apple pie is back: baseball. And since there’s all kinds of great data around one of the nation’s great pastimes, we decided for this week’s post to look at Major League Baseball (MLB) attendance statistics from the last 20 years. These are published on many websites, including the one we used to get the data you’ll find in the charts below: ESPN.com.

To collect the attendance data from ESPN, we used Jupyter Workspaces (currently in beta in Domo) and the Python package Beautiful Soup to parse the HTML. And since Domo can now run code in Jupyter Workspaces on a regular schedule, you can be sure that this page will continue to update with the 2022 data.
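As a rough illustration of the parsing step, here is a minimal sketch using Beautiful Soup. The HTML snippet, the column layout, and the `parse_attendance` helper are all made up for this example; the real ESPN markup isn't shown in this post and will differ, so treat the selectors as assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded attendance page.
# The real page structure is an assumption and will differ.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Games</th><th>Average</th></tr>
  <tr><td>Team A</td><td>81</td><td>40,123</td></tr>
  <tr><td>Team B</td><td>80</td><td>18,456</td></tr>
</table>
"""


def parse_attendance(html):
    """Return one dict per data row of the first table in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) != 3:
            continue  # skips the header row (th cells only) and odd rows
        team, games, average = cells
        rows.append({
            "team": team,
            "games": int(games),
            # Strip thousands separators before converting to a number.
            "avg_attendance": int(average.replace(",", "")),
        })
    return rows


records = parse_attendance(SAMPLE_HTML)
```

In a scheduled notebook, the sample string would be replaced by the page content fetched on each run, and `records` would be written back to a dataset.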

The first thing you’ll probably notice when looking at the data is that 2020 is missing. That’s because, due to the pandemic, baseball was played without fans that year. There was a bit of a return to normalcy in 2021, but it wasn’t until this season that all spectating restrictions were lifted, so it will be interesting to watch how attendance rebounds. (In full transparency, we only have the data for full years right now, so we’re not capturing anything related to seasonality, such as how weather or a team’s place in the playoff race affects ticket sales.)

One good way to compare this data is with an old favorite of many data scientists: a box and whisker plot. The chart shows the minimum and maximum average attendance for each team in the whiskers (the top and bottom lines). I’ve sorted this to show the team with the highest peak attendance year on the left, and the lowest on the right:

Where the visualization gets more interesting for me is with the box portions. Each box shows the space between the 25th and 75th percentiles, which is meant to reflect how much a team’s attendance has swung over time. The bigger boxes tell me those teams (such as Philadelphia and Detroit) have had some great years for attendance and some not so great years. Smaller boxes (such as Boston) say that a team has been very consistent in its attendance numbers. We have also filtered the chart for pre-pandemic years only, since 2021 (and to a lesser extent partial 2022 data) skews the data.
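For readers who want to see the numbers behind a box and whisker plot, here is a small sketch of the summary each team’s box is built from. The attendance figures and team names are invented for illustration, and the choice of the "inclusive" quantile method is my assumption; charting tools can use slightly different percentile definitions.

```python
from statistics import quantiles

# Made-up yearly average attendance per team, for illustration only.
yearly_avg = {
    "Team A": [30000, 32000, 34000, 36000, 38000],
    "Team B": [20000, 25000, 30000, 40000, 45000],
}


def box_summary(values):
    """Return the five-number summary plus the box height (IQR)."""
    # n=4 yields the three quartile cut points: 25th, 50th, 75th percentile.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # bottom whisker
        "q1": q1,             # bottom of the box
        "median": median,
        "q3": q3,             # top of the box
        "max": max(values),   # top whisker
        "iqr": q3 - q1,       # bigger value = attendance has swung more
    }


summaries = {team: box_summary(vals) for team, vals in yearly_avg.items()}

# Sort teams by peak attendance year, highest first, as in the chart.
order = sorted(summaries, key=lambda team: summaries[team]["max"], reverse=True)
```

In this toy data, Team B has the larger box (a wider gap between `q1` and `q3`), which is the same signal the chart gives for teams with big swings in attendance.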

An alternative approach to understanding how teams rank in attendance is to create indexes of where a team’s attendance stands relative to the overall MLB average, which is what we’ve done directly below. Dark blue boxes mean that a team is well above the average, while dark orange boxes mean that a team is well below the average. You can use the filters to look at whatever league, division, team(s), or year(s) you’re interested in:

Long-time Domo users may be looking at these indexes and thinking that I did some pre-calculation in a Magic ETL or a Dataset View. It’s true that doing calculations on such total levels typically requires pre-calculation. But if I did that, it would be hard to allow for the year filter. So, the secret is out: With Domo’s new FIXED beast modes (currently in beta), you can do FIXED level of detail functions right in a beast mode. For the above “Index to League Avg”, this is the calculation:

You can see there are two things happening here. First, when I have the SUM FIXED by League, it’s summing across all values with the same league as the row I’m on. That allows me to get the league total we need for the denominator of the index. Second, it’s using FILTER ALLOW to tell Domo that filters on year can affect the FIXED functions. There are options for FILTER ALLOW, FILTER DENY, and FILTER NONE.
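To make those two ideas concrete outside of Domo, here is a plain Python sketch of the same logic. This is not beast mode syntax; the rows, the `index_to_league_avg` helper, and the exact index formula (team value divided by the league average, times 100) are my assumptions for illustration.

```python
from collections import defaultdict

# Invented rows; real figures would come from the attendance dataset.
ROWS = [
    {"year": 2018, "league": "AL", "team": "Team A", "attendance": 30000},
    {"year": 2018, "league": "AL", "team": "Team B", "attendance": 20000},
    {"year": 2018, "league": "NL", "team": "Team C", "attendance": 40000},
    {"year": 2019, "league": "AL", "team": "Team A", "attendance": 36000},
    {"year": 2019, "league": "AL", "team": "Team B", "attendance": 12000},
    {"year": 2019, "league": "NL", "team": "Team C", "attendance": 44000},
]


def index_to_league_avg(rows, years=None):
    """Attach an index (100 = league average) to every visible row."""
    # Analogue of FILTER ALLOW on year: the year filter is applied
    # before the league-level totals are computed.
    visible = [r for r in rows if years is None or r["year"] in years]

    # Analogue of SUM FIXED by League: one total per league, shared by
    # every row in that league regardless of the row's own team.
    totals = defaultdict(int)
    counts = defaultdict(int)
    for r in visible:
        totals[r["league"]] += r["attendance"]
        counts[r["league"]] += 1

    indexed = []
    for r in visible:
        league_avg = totals[r["league"]] / counts[r["league"]]
        indexed.append({**r, "index": round(100 * r["attendance"] / league_avg, 1)})
    return indexed


only_2019 = index_to_league_avg(ROWS, years={2019})
all_years = index_to_league_avg(ROWS)
```

Because the year filter is allowed through, Team A’s index changes between `only_2019` and `all_years`: the league total in the denominator is recomputed from whichever years are selected.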

Here’s one last example of how useful FIXED with FILTER DENY can be. The bar charts below are defaulted to the New York Yankees (my boss’ favorite team). The first chart isn’t using FIXED, so when I filter for the Yankees, the Min, Max, and Median fields become meaningless, since they get filtered down to the same value as the selected team. The second chart uses FIXED and DENY on team name so that the Min, Max, and Median remain as references for the main average, which is for the Yankees.
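The difference between the two charts can be sketched in plain Python as well. Again, this is only an emulation of the behavior described above, not Domo syntax; the team averages and the `reference_stats` helper are hypothetical.

```python
from statistics import median

# Invented per-team average attendance, for illustration only.
TEAM_AVG = {"Team A": 42000, "Team B": 30000, "Team C": 24000, "Team D": 18000}


def reference_stats(team_avg, selected_team, deny_team_filter):
    """Return the selected team's value with min/median/max references."""
    if deny_team_filter:
        # Analogue of FIXED with FILTER DENY on team name: the reference
        # values ignore the team filter and look at every team.
        pool = list(team_avg.values())
    else:
        # Without FIXED, the team filter also narrows the reference pool,
        # so only the selected team's value is left.
        pool = [team_avg[selected_team]]
    return {
        "selected": team_avg[selected_team],
        "min": min(pool),
        "median": median(pool),
        "max": max(pool),
    }


filtered = reference_stats(TEAM_AVG, "Team A", deny_team_filter=False)
fixed = reference_stats(TEAM_AVG, "Team A", deny_team_filter=True)
```

In `filtered`, the min, median, and max all collapse to Team A’s own value, which is the "meaningless" case from the first chart. In `fixed`, they stay league-wide, so the selected team can be compared against them.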

One of the things I love (and also at times find maddening) about exploring new data is that there’s always more to explore. As I worked on this post, I realized that it would be pretty interesting to bring in teams’ win/loss records as well as information on stadium capacity. But then I thought: Let’s maybe save that for a future post.



