{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "pd.set_option('display.max_rows', 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Methods and Descriptive Statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Informational Methods\n", "\n", "Pandas methods that help the user 'peek at the data' in different ways (e.g. look at a few rows at a time, count the number of non-null entries, count the number of distinct entries). These methods are particularly useful when the data is too large to look at in its entirety." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "|Method Name|Description|\n", "|---|---|\n", "|`head`|return the first `n` entries of a Series|\n", "|`tail`|return the last `n` entries of a Series|\n", "|`count`|Count the number of non-null entries of a Series|\n", "|`nunique`|Returns number of unique values of a Series|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Example:* The `DataFrame` named `uswnt` contains information on all soccer players on the US Women's national team from 1991 through 2019." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "uswnt = pd.read_csv('data/world_cups.csv')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(147, 9)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of rows / columns\n", "uswnt.shape" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlayerPosAgeYearAppsStartsMinGlsAst
0Mary HarveyGK2619916654000
1Julie FoudyMF2019916654012
2Carla OverbeckDF2319916650700
3Carin Jennings-GabarraFW2619916649163
4Michelle AkersMFFW25199166491101
5Linda HamiltonDF2219916550700
6Mia HammMFFW1919916549921
\n", "
" ], "text/plain": [ " Player Pos Age Year Apps Starts Min Gls Ast\n", "0 Mary Harvey GK 26 1991 6 6 540 0 0\n", "1 Julie Foudy MF 20 1991 6 6 540 1 2\n", "2 Carla Overbeck DF 23 1991 6 6 507 0 0\n", "3 Carin Jennings-Gabarra FW 26 1991 6 6 491 6 3\n", "4 Michelle Akers MFFW 25 1991 6 6 491 10 1\n", "5 Linda Hamilton DF 22 1991 6 5 507 0 0\n", "6 Mia Hamm MFFW 19 1991 6 5 499 2 1" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# first 7 entries; players from the 90s\n", "uswnt.head(7)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlayerPosAgeYearAppsStartsMinGlsAst
145Allie LongMF312019103100
146Emily SonnettDF25201910800
\n", "
" ], "text/plain": [ " Player Pos Age Year Apps Starts Min Gls Ast\n", "145 Allie Long MF 31 2019 1 0 31 0 0\n", "146 Emily Sonnett DF 25 2019 1 0 8 0 0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# last 2 entries; players from 2019\n", "uswnt.tail(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A look at the Player column:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Mary Harvey\n", "1 Julie Foudy\n", "2 Carla Overbeck\n", "3 Carin Jennings-Gabarra\n", "4 Michelle Akers\n", "Name: Player, dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Look at players column; `head` also a Series method.\n", "players = uswnt['Player']\n", "players.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(147,)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# no duplicates\n", "players.shape" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "147" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "players.count()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "76" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "players.nunique()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlayerPosAgeYearAppsStartsMinGlsAst
4Michelle AkersMFFW25199166491101
132Alex MorganFW2920196649063
3Carin Jennings-GabarraFW2619916649163
110Carli LloydMF3220157763061
136Megan RapinoeFWMF3320195542962
\n", "
" ], "text/plain": [ " Player Pos Age Year Apps Starts Min Gls Ast\n", "4 Michelle Akers MFFW 25 1991 6 6 491 10 1\n", "132 Alex Morgan FW 29 2019 6 6 490 6 3\n", "3 Carin Jennings-Gabarra FW 26 1991 6 6 491 6 3\n", "110 Carli Lloyd MF 32 2015 7 7 630 6 1\n", "136 Megan Rapinoe FWMF 33 2019 5 5 429 6 2" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Top 5: Most goals in a single world-cup tournament; note the index.\n", "uswnt.sort_values(by='Gls', ascending=False).head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlayerPosAgeYearAppsStartsMinGlsAst
4Michelle AkersMFFW25199166491101
19Kristine LillyMFFW2319956651830
41Tiffeny MilbrettFW2619996550930
60Abby WambachFW2320036542630
72Abby WambachFW2720076653660
89Abby WambachFW3120116660041
110Carli LloydMF3220157763061
132Alex MorganFW2920196649063
\n", "
" ], "text/plain": [ " Player Pos Age Year Apps Starts Min Gls Ast\n", "4 Michelle Akers MFFW 25 1991 6 6 491 10 1\n", "19 Kristine Lilly MFFW 23 1995 6 6 518 3 0\n", "41 Tiffeny Milbrett FW 26 1999 6 5 509 3 0\n", "60 Abby Wambach FW 23 2003 6 5 426 3 0\n", "72 Abby Wambach FW 27 2007 6 6 536 6 0\n", "89 Abby Wambach FW 31 2011 6 6 600 4 1\n", "110 Carli Lloyd MF 32 2015 7 7 630 6 1\n", "132 Alex Morgan FW 29 2019 6 6 490 6 3" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Top goal scorer per world cup for USWNT\n", "(\n", " uswnt\n", " .sort_values(by='Gls', ascending=False)\n", " .drop_duplicates(subset=['Year'])\n", " .sort_values(by='Year')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Array arithmetic\n", "* `Series` can use array arithmetic just like Numpy\n", "* Warning: arrays indices are lined up before operation! (More on this later)\n", "\n", "*Example:* Compute (1) the minutes played per appearance and (2) each players year of birth." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 90.000000\n", "1 90.000000\n", "2 84.500000\n", "3 81.833333\n", " ... \n", "143 90.000000\n", "144 45.000000\n", "145 31.000000\n", "146 8.000000\n", "Length: 147, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "minutes = uswnt['Min']\n", "apps = uswnt['Apps']\n", "\n", "(minutes / apps)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1965\n", "1 1971\n", "2 1968\n", "3 1965\n", " ... \n", "143 1993\n", "144 1988\n", "145 1988\n", "146 1994\n", "Length: 147, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "year = uswnt['Year']\n", "ages = uswnt['Age']\n", "\n", "(year - ages)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Descriptive methods\n", "\n", "As noted in the previous section, `Series` and `DataFrame` objects are Numpy arrays with named labels. As such, \n", "* Numpy functions and methods are directly applicable to Pandas objects (particularly `Series`), and\n", "* many Pandas methods are inherited from Numpy, often with tweaks to default arguments that are convenient for data analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Example:* Applying Numpy functions to a `Series` (e.g. a column of a `DataFrame`) results in applying the function to the data in the underlying Numpy array." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26.421768707482993" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# mean age of the players\n", "np.sum(ages) / ages.shape[0]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26.421768707482993" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The mean\n", "np.mean(ages)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26.0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.median(ages)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Example:* Pandas supplies these Numpy function as `Series` methods as well." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26.421768707482993" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ages.mean()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26.0" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ages.median()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 147.000000\n", "mean 26.421769\n", "std 4.298654\n", "min 18.000000\n", "25% 23.000000\n", "50% 26.000000\n", "75% 30.000000\n", "max 39.000000\n", "Name: Age, dtype: float64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ages.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Example:* The *variance* is an example of a method that differs between Numpy and Pandas.\n", "* In Numpy, `np.var` computes the population variance.\n", "* In Pandas, the `var` method computes the sample variance.\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.284007866739518" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(((ages - ages.mean())**2).sum() / ages.shape[0])**(1/2)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.284007866739518" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.std(ages)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.298654090204659" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(((ages - ages.mean())**2).sum() / (ages.shape[0] - 1))**(1/2)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.298654090204659" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ages.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `DataFrame` Methods and the `axis` keyword\n", "\n", "* DataFrames share *many* of the same methods with Series.\n", " - The dataFrame method applies the Series method to every row/column.\n", "* Some of these methods take the `axis` keyword argument:\n", " - `axis=0`: the method is applied to series with index given by **rows**.\n", " - `axis=1`: the method is applied to series with index given by **columns**.\n", "* Default value: `axis=0` (apply method to each column)." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Age 26.421769\n", "Year 2005.612245\n", "Apps 4.605442\n", "Starts 3.741497\n", "Min 342.789116\n", "Gls 0.918367\n", "Ast 0.476190\n", "dtype: float64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.mean() " ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Player Wendy Gebauer\n", "Pos MFFWDF\n", "Age 39\n", "Year 2019\n", " ... \n", "Starts 7\n", "Min 630\n", "Gls 10\n", "Ast 4\n", "Length: 9, dtype: object" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.max()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlayerPosAgeYearAppsStartsMinGlsAst
0Mary HarveyGK2619916654000
1Julie FoudyMF2019916654012
2Carla OverbeckDF2319916650700
3Carin Jennings-GabarraFW2619916649163
4Michelle AkersMFFW25199166491101
\n", "
" ], "text/plain": [ " Player Pos Age Year Apps Starts Min Gls Ast\n", "0 Mary Harvey GK 26 1991 6 6 540 0 0\n", "1 Julie Foudy MF 20 1991 6 6 540 1 2\n", "2 Carla Overbeck DF 23 1991 6 6 507 0 0\n", "3 Carin Jennings-Gabarra FW 26 1991 6 6 491 6 3\n", "4 Michelle Akers MFFW 25 1991 6 6 491 10 1" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.head()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2569\n", "1 2566\n", "2 2533\n", "3 2529\n", "4 2530\n", "dtype: int64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.head().sum(axis=1)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeYearAppsStartsMinGlsAst
count147.000000147.000000147.000000147.000000147.000000147.000000147.000000
mean26.4217692005.6122454.6054423.741497342.7891160.9183670.476190
std4.2986549.2295301.9673982.427266203.7861571.5593740.862724
min18.0000001991.0000001.0000000.0000002.0000000.0000000.000000
25%23.0000001999.0000003.0000001.000000113.5000000.0000000.000000
50%26.0000002007.0000006.0000005.000000429.0000000.0000000.000000
75%30.0000002015.0000006.0000006.000000509.5000001.0000001.000000
max39.0000002019.0000007.0000007.000000630.00000010.0000004.000000
\n", "
" ], "text/plain": [ " Age Year Apps Starts Min \\\n", "count 147.000000 147.000000 147.000000 147.000000 147.000000 \n", "mean 26.421769 2005.612245 4.605442 3.741497 342.789116 \n", "std 4.298654 9.229530 1.967398 2.427266 203.786157 \n", "min 18.000000 1991.000000 1.000000 0.000000 2.000000 \n", "25% 23.000000 1999.000000 3.000000 1.000000 113.500000 \n", "50% 26.000000 2007.000000 6.000000 5.000000 429.000000 \n", "75% 30.000000 2015.000000 6.000000 6.000000 509.500000 \n", "max 39.000000 2019.000000 7.000000 7.000000 630.000000 \n", "\n", " Gls Ast \n", "count 147.000000 147.000000 \n", "mean 0.918367 0.476190 \n", "std 1.559374 0.862724 \n", "min 0.000000 0.000000 \n", "25% 0.000000 0.000000 \n", "50% 0.000000 0.000000 \n", "75% 1.000000 1.000000 \n", "max 10.000000 4.000000 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `apply` method\n", "\n", "The `apply` method is both a Series and a DataFrame method for applying custom functions across data.\n", "\n", "* `ser.apply(func)` applies `func` to the values contained in the Series `ser`,\n", "* `df.apply(func)` applies `func` to the *columns* of the DataFrame `df`,\n", "* `df.apply(func, axis=1)` applies `func` to the *rows* of the DataFrame `df`.\n", "\n", "*Remark:* Notice that, when applied to a DataFrame, `func` should be a function that takes in a *Series*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example:** To create a boolean column that describes if a given player's first name ends in the letter `e`, create a custom function to pass to `apply`:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "def firstname_endswith_e(player):\n", " '''returns True if the first name ends in the letter e'''\n", " fn, _ = player.split(maxsplit=1)\n", " return fn[-1] == 'e'" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 True\n", "2 False\n", "3 False\n", " ... \n", "143 False\n", "144 False\n", "145 True\n", "146 False\n", "Name: Player, Length: 147, dtype: bool" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt['Player'].apply(firstname_endswith_e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `agg` method\n", "\n", "The `agg` method simultaneously applies multiple Series methods to the columns of a DataFrame. Given a DataFrame `df`, \n", "* `df.agg(func)` returns a Series obtained by applying the function to the columns of a `df`,\n", "* `df.agg([f1,...,fN])` returns a DataFrame obtained by applying each function to each column of `df`,\n", "* `df.agg({col1:f1,...,colN:fN})` returns a Series obtained by applying each function to column specified by its corresponding key.\n", "* Analogously, `agg` can also be passed a dictionary, keyed by column name, of lists of functions.\n", "\n", "*Remark 1:* `agg` accepts function/method *names* as well, represented as strings.\n", "\n", "*Remark 2:* `agg` has an `axis` keyword argument that applies functions row-wise instead of column-wise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example:** `uswnt.agg('max')` computes the maximum value for each column. This value is also computable using the method directly -- that is, `uswnt.max()`." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Player Wendy Gebauer\n", "Pos MFFWDF\n", "Age 39\n", "Year 2019\n", " ... \n", "Starts 7\n", "Min 630\n", "Gls 10\n", "Ast 4\n", "Length: 9, dtype: object" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.agg('max')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example:** Passing a list of functions into `agg` results in a DataFrame whose rows contain the results of applying each function to columns of the original DataFrame. If a function throws an exception upon application to a column, the value in the resulting DataFrame is `NaN`." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlayerPosAgeYearAppsStartsMinGlsAst
maxWendy GebauerMFFWDF39.0000002019.0000007.0000007.000000630.00000010.0000004.00000
meanNaNNaN26.4217692005.6122454.6054423.741497342.7891160.9183670.47619
medianNaNNaN26.0000002007.0000006.0000005.000000429.0000000.0000000.00000
\n", "
" ], "text/plain": [ " Player Pos Age Year Apps Starts \\\n", "max Wendy Gebauer MFFWDF 39.000000 2019.000000 7.000000 7.000000 \n", "mean NaN NaN 26.421769 2005.612245 4.605442 3.741497 \n", "median NaN NaN 26.000000 2007.000000 6.000000 5.000000 \n", "\n", " Min Gls Ast \n", "max 630.000000 10.000000 4.00000 \n", "mean 342.789116 0.918367 0.47619 \n", "median 429.000000 0.000000 0.00000 " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.agg(['mean', np.median, 'max'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example:** Similarly, passing a dictionary of functions keyed by column name applies the function only to the specified columns." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Player Wendy Gebauer\n", "Pos 0\n", "Age 26.4218\n", "Ast 0\n", "dtype: object" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.agg({'Player': 'max', 'Pos': 'min', 'Age': 'mean', 'Ast': 'min'})" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PlayerAge
maxWendy Gebauer39.000000
meanNaN26.421769
medianNaN26.000000
minAbby DahlkemperNaN
\n", "
" ], "text/plain": [ " Player Age\n", "max Wendy Gebauer 39.000000\n", "mean NaN 26.421769\n", "median NaN 26.000000\n", "min Abby Dahlkemper NaN" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uswnt.agg({'Player': ['min', 'max'], 'Age': ['mean', np.median, 'max']})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }