Using List collection functions and calculating summary statistics.
Developed with Davide Costa
You should now feel comfortable with the footballer dataset and with tuples, records, and anonymous records. You should also know how to perform simple transformations. With a large and heterogeneous dataset, it's useful to understand how to sort, group, and filter the data, and to know the other List functions covered below.
It is a good idea to browse the documentation for lists at the F# language reference and the F# core library documentation sites before you start. For further discussion of collection functions, the related F# for fun and profit page is also useful.
Reference needed nuget packages and open namespaces
#r "nuget: FSharp.Data, 5.0.2"
#r "nuget: FSharp.Stats, 0.5.0"
open FSharp.Data
open FSharp.Stats
open FSharp.Stats.Correlation
Load the Csv file.
let [<Literal>] CsvPath = __SOURCE_DIRECTORY__ + "/FootballPlayers.csv"
type FootballPlayersCsv = CsvProvider<CsvPath>
let playerStatsTable =
    FootballPlayersCsv.GetSample().Rows
    |> Seq.toList
EXERCISES - PART 2
List Functions.
1 List.take
List.take 5 takes the first 5 rows.
List.take 2 takes the first 2 rows
Example: Take the first 4 rows from playerStatsTable with List.take.
playerStatsTable
|> List.take 4
- Take the first 7 rows from playerStatsTable with List.take.
Answer:
playerStatsTable
|> List.take 7
val it: CsvProvider<...>.Row list =
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35);
("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Karim Benzema", "fr FRA", "FW", "Real Madrid", "esLa Liga", 33, 32, 27);
("Ciro Immobile", "it ITA", "FW", "Lazio", "itSerie A", 31, 31, 27);
("Wissam Ben Yedder", "fr FRA", "FW", "Monaco", "frLigue 1", 30, 37, 25);
("Patrik Schick", "cz CZE", "FW", "Leverkusen", "deBundesliga", 25, 27, 24);
("Son Heung-min", "kr KOR", "MF,FW", "Tottenham", "engPremier League", 29,
35, 23)]
2 List.truncate
List.truncate 5 takes the first 5 rows.
List.truncate 2 takes the first 2 rows
You may have noticed that List.take and List.truncate return similar outputs, but they are not exactly the same.
List.take returns exactly the number of items that you specify,
while List.truncate returns at most the number of items that you specify.
Thus, in most cases both give the same output, except when you ask for more items than the List contains (its length).
In that scenario List.truncate returns as many elements as it can (all the elements in the List),
while List.take throws an exception, since it is supposed to take the exact number of elements you asked for, which is impossible in this case.
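The difference is easiest to see on a list that is shorter than the requested count. The following is a minimal sketch on a toy list (shortList is a made-up name, not part of the footballer dataset); the exception is caught only so that the script keeps running.

```fsharp
// Toy list with 3 elements; not part of the footballer dataset.
let shortList = [1; 2; 3]

// List.truncate returns at most 5 elements, so here it returns the whole list.
let truncated = shortList |> List.truncate 5

// List.take needs exactly 5 elements and throws when the list is shorter.
// We turn the exception into an Error value so the script keeps running.
let taken =
    try
        Ok (shortList |> List.take 5)
    with ex ->
        Error ex.Message

printfn "truncate: %A" truncated
printfn "take: %A" taken
```

If you are not sure that the List is long enough, List.truncate is the safer choice.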
Example: Take the first 4 rows from playerStatsTable with List.truncate.
playerStatsTable
|> List.truncate 4
- Take the first 7 rows from playerStatsTable with List.truncate.
Answer:
playerStatsTable
|> List.truncate 7
val it: CsvProvider<...>.Row list =
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35);
("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Karim Benzema", "fr FRA", "FW", "Real Madrid", "esLa Liga", 33, 32, 27);
("Ciro Immobile", "it ITA", "FW", "Lazio", "itSerie A", 31, 31, 27);
("Wissam Ben Yedder", "fr FRA", "FW", "Monaco", "frLigue 1", 30, 37, 25);
("Patrik Schick", "cz CZE", "FW", "Leverkusen", "deBundesliga", 25, 27, 24);
("Son Heung-min", "kr KOR", "MF,FW", "Tottenham", "engPremier League", 29,
35, 23)]
3 List.distinct
List.distinct returns the unique elements from the List.
["hello"; "world"; "hello"; "hi"] |> List.distinct returns ["hello"; "world"; "hi"]
Example: From playerStatsTable Nation field find the unique elements with List.distinct.
playerStatsTable
|> List.map(fun x -> x.Nation)
|> List.distinct
- From playerStatsTable League field find the unique elements with List.distinct.
Answer:
playerStatsTable
|> List.map(fun x -> x.League)
|> List.distinct
val it: string list =
["deBundesliga"; "frLigue 1"; "esLa Liga"; "itSerie A"; "engPremier League"]
4 List.countBy
List.countBy returns a list of paired tuples with the unique elements and their counts.
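As a small illustration on a toy list (the positions list below is invented for this sketch), List.countBy pairs each distinct key with the number of elements that produced it. Here the key function is id, so the elements themselves are counted.

```fsharp
// Toy list of positions; not taken from the dataset.
let positions = ["FW"; "GK"; "FW"; "MF"; "FW"; "GK"]

// id returns each element unchanged, so we count the elements themselves.
let positionCounts = positions |> List.countBy id

// "FW" occurs 3 times, "GK" twice and "MF" once.
printfn "%A" positionCounts
```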
Example: From playerStatsTable Team field find the unique elements and their counts with List.countBy.
playerStatsTable
|> List.countBy(fun x -> x.Team)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- From playerStatsTable League field find the unique elements and their counts with List.countBy.
Answer:
playerStatsTable
|> List.countBy(fun x -> x.League)
val it: (string * int) list =
[("deBundesliga", 36); ("frLigue 1", 46); ("esLa Liga", 30);
("itSerie A", 52); ("engPremier League", 36)]
5 List.filter
List.filter allows you to extract a subset of the dataset based on one or multiple conditions.
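To show what filtering on multiple conditions can look like, here is a sketch on a small hand-made list. The ToyPlayer record and its values are hypothetical and only mimic the shape of the dataset; conditions are combined with && (and) or || (or).

```fsharp
// Hypothetical record that mimics a few fields of the dataset.
type ToyPlayer = { Name : string; Nation : string; Age : int }

let toyPlayers =
    [ { Name = "A"; Nation = "pt POR"; Age = 24 }
      { Name = "B"; Nation = "pt POR"; Age = 31 }
      { Name = "C"; Nation = "fr FRA"; Age = 22 } ]

// Keep only the rows that satisfy both conditions.
let youngPortuguese =
    toyPlayers
    |> List.filter (fun x -> x.Nation = "pt POR" && x.Age < 30)

printfn "%A" youngPortuguese
```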
Example: Filter the playerStatsTable to get only Portuguese players (Nation = "pt POR").
Remember that we have to look at the dataset to find the string that corresponds to Portuguese players,
which in this case is "pt POR".
playerStatsTable
|> List.filter(fun x -> x.Nation = "pt POR")
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Filter the playerStatsTable to get only 16 year-old players. (Age = 16).
Answer:
playerStatsTable
|> List.filter(fun x -> x.Age = 16)
val it: CsvProvider<...>.Row list = []
6 List.sort and List.sortDescending
[1; 4; 5; 3; 6] |> List.sort returns [1; 3; 4; 5; 6] (ascending sort).
[1; 4; 5; 3; 6] |> List.sortDescending returns [6; 5; 4; 3; 1] (descending sort).
Example: map playerStatsTable to get a list of Age and sort it (ascending).
Since we want to sort the age List we first use List.map to get only that List.
Then we use List.sort to sort it.
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.sort
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
- Map playerStatsTable to get a list of GoalsScored and sort it (ascending).
Hint: To sort the GoalsScored List you first need to use List.map to get only that List. Then use List.sort to sort it.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.sort
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
val it: int list =
[0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2; 2; 2;
2; 2; 2; 2; 2; 2; 2; 2; 2; 2]
Example: Map playerStatsTable to get a list of Age and sort it (descending).
Since we want to sort the age List we first use List.map to get only that List.
Then we use List.sortDescending to sort it.
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.sortDescending
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
- Map playerStatsTable to get a list of GoalsScored and sort it (descending).
Hint: To sort the GoalsScored List you first need to use List.map to get only that List. Then use List.sortDescending to sort it.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.sortDescending
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
val it: int list =
[35; 28; 27; 27; 25; 24; 23; 23; 22; 21; 21; 21; 20; 20; 18; 18; 17; 17; 17;
17; 17; 17; 16; 16; 16; 16; 16; 15; 15; 13; 13; 13; 13; 13; 12; 12; 12; 12;
12; 12; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 10; 10; 10; 10; 10; 10;
10; 10; 10]
7 List.sortBy and List.sortByDescending
List.sortBy is very useful for sorting the dataset according to a particular field.
Example: sort (ascending) playerStatsTable by Age (List.sortBy).
playerStatsTable
|> List.sortBy(fun x -> x.Age)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Sort (ascending) playerStatsTable by GoalsScored (List.sortBy).
Answer:
playerStatsTable
|> List.sortBy(fun x -> x.GoalsScored)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
val it: CsvProvider<...>.Row list =
[("Stefan Ortega", "de GER", "GK", "Arminia", "deBundesliga", 28, 33, 0);
("Rui Patrício", "pt POR", "GK", "Roma", "itSerie A", 33, 38, 0);
("Philipp Pentke", "de GER", "GK", "Hoffenheim", "deBundesliga", 36, 1, 0);
("Pavao Pervan", "at AUT", "GK", "Wolfsburg", "deBundesliga", 33, 6, 0);
("Nick Pope", "eng ENG", "GK", "Burnley", "engPremier League", 29, 36, 0)]
Example: sort (descending) playerStatsTable by Age (List.sortByDescending).
playerStatsTable
|> List.sortByDescending(fun x -> x.Age)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Sort (descending) playerStatsTable by GoalsScored (List.sortByDescending).
Answer:
playerStatsTable
|> List.sortByDescending(fun x -> x.GoalsScored)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
val it: CsvProvider<...>.Row list =
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35);
("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Karim Benzema", "fr FRA", "FW", "Real Madrid", "esLa Liga", 33, 32, 27);
("Ciro Immobile", "it ITA", "FW", "Lazio", "itSerie A", 31, 31, 27);
("Wissam Ben Yedder", "fr FRA", "FW", "Monaco", "frLigue 1", 30, 37, 25)]
8 List.splitInto
List.splitInto is very useful for splitting your dataset into multiple subsets.
This function is commonly used to generate quantiles by splitting a sorted List.
For instance, for investment strategies financial assets are usually sorted by a certain signal
and then split into quantiles. If the signal has a positive sign, the long strategy consists of going long
on the first quantile stocks, and the long-short strategy consists of going long on the first quantile stocks and short on the last quantile stocks.
Note: Besides the List itself, List.splitInto takes one parameter: the number of groups you want to create out of the dataset.
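The sort-then-split idea can be sketched on a toy list of numbers (the signals list is invented for illustration). When the List cannot be divided evenly, the earlier groups receive the extra elements.

```fsharp
// Toy signal values; not taken from the dataset.
let signals = [5; 1; 4; 2; 3]

// Sort first, then split the sorted list into 3 groups.
let groups = signals |> List.sort |> List.splitInto 3
// groups is [[1; 2]; [3; 4]; [5]]

// The first and last groups are the two extremes of the sorted list.
let firstGroup = groups |> List.head
let lastGroup = groups |> List.last

printfn "groups: %A" groups
printfn "first: %A, last: %A" firstGroup lastGroup
```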
Example: Sort the playerStatsTable by GoalsScored and then split the dataset into 4 groups using List.sortBy and List.splitInto.
playerStatsTable
|> List.sortBy(fun x -> x.GoalsScored)
|> List.splitInto 4
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
|> List.map(fun x -> x |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
- Sort the playerStatsTable by Age and then split the dataset into 5 groups using List.sortBy and List.splitInto.
Answer:
playerStatsTable
|> List.sortBy(fun x -> x.Age)
|> List.splitInto 5
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
|> List.map(fun x -> x |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
val it: CsvProvider<...>.Row list list =
[[("Giorgio Scalvini", "it ITA", "DF,MF", "Atalanta", "itSerie A", 17, 18, 1);
("Alejandro Primo", "es ESP", "GK", "Levante", "esLa Liga", 17, 1, 0);
("Florian Wirtz", "de GER", "MF,FW", "Leverkusen", "deBundesliga", 18, 24,
7); ("Destiny Udogie", "it ITA", "DF", "Udinese", "itSerie A", 18, 35, 5);
("Bukayo Saka", "eng ENG", "FW,MF", "Arsenal", "engPremier League", 19, 38,
11)];
[("Lautaro Martínez", "ar ARG", "FW", "Inter", "itSerie A", 23, 35, 21);
("Christopher Nkunku", "fr FRA", "FW,MF", "RB Leipzig", "deBundesliga", 23,
34, 20);
("Tammy Abraham", "eng ENG", "FW", "Roma", "itSerie A", 23, 37, 17);
("Ludovic Blas", "fr FRA", "MF,FW", "Nantes", "frLigue 1", 23, 35, 10);
("Emmanuel Dennis", "ng NGA", "FW,MF", "Watford", "engPremier League", 23,
33, 10)]]
9 List.groupBy
List.groupBy allows you to group elements of a list.
It takes a key-generating function and a list as inputs.
The function is executed on each element of the List, returning a list of tuples
where the first element of each tuple is the key and the second is a list of the elements for which the function produced that key.
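A minimal sketch of that key and group structure on a toy list of integers follows; the key function n % 2 is chosen only for illustration.

```fsharp
// Toy list of integers; not taken from the dataset.
let numbers = [1; 2; 3; 4; 5]

// The key is the remainder after dividing by 2: 1 for odd, 0 for even.
let byParity = numbers |> List.groupBy (fun n -> n % 2)
// byParity is [(1, [1; 3; 5]); (0, [2; 4])]

// Each group is an ordinary list, so it can be summarised with List.map.
let groupSizes = byParity |> List.map (fun (key, xs) -> key, List.length xs)

printfn "%A" byParity
printfn "%A" groupSizes
```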
Example: Group the playerStatsTable by Nation using List.groupBy.
playerStatsTable
|> List.groupBy(fun x -> x.Nation)
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
|> List.map(fun (x, xs) -> x, xs |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
- Group the playerStatsTable by Age using List.groupBy.
Answer:
playerStatsTable
|> List.groupBy(fun x -> x.Age)
|> List.map(fun (x, xs) -> x, xs |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
val it: (int * CsvProvider<...>.Row list) list =
[(32,
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga",
32, 34, 35);
("Marco Reus", "de GER", "MF,FW", "Dortmund", "deBundesliga", 32, 29, 9);
("Ivan Perišić", "hr CRO", "DF", "Inter", "itSerie A", 32, 35, 8);
("Axel Witsel", "be BEL", "MF,DF", "Dortmund", "deBundesliga", 32, 29, 2);
("Ivan Radovanović", "rs SRB", "DF,MF", "Salernitana", "itSerie A", 32,
14, 1)]);
(22,
[("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Gianluca Scamacca", "it ITA", "FW", "Sassuolo", "itSerie A", 22, 36, 16);
("Moussa Diaby", "fr FRA", "FW,MF", "Leverkusen", "deBundesliga", 22, 32,
13);
("Randal Kolo Muani", "fr FRA", "FW,MF", "Nantes", "frLigue 1", 22, 36,
12);
("Mason Mount", "eng ENG", "MF", "Chelsea", "engPremier League", 22, 32,
11)])]
Statistics List Functions
1 List.max
[1; 4; 5; 3; 6] |> List.max returns 6 (the highest value in the List).
Example: Map playerStatsTable to get the Age List, and find the maximum (List.max).
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.max
- Map playerStatsTable to get the GoalsScored List, and find the maximum (List.max).
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.max
val it: int = 35
2 List.min
[1; 4; 5; 3; 6] |> List.min returns 1 (the lowest value in the List).
Example: Map playerStatsTable to get the Age List, and find the minimum (List.min).
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.min
- Map playerStatsTable to get the GoalsScored List, and find the minimum (List.min).
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.min
val it: int = 0
3 List.maxBy
Sometimes you want the element with the "maximum y" where "y" is the result of applying a particular function to a list element. This is what List.maxBy is for. This function is best understood by seeing an example.
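As a quick toy illustration (the words list is invented for this sketch), compare List.maxBy with mapping first and then calling List.max. The former returns the original element, the latter only the transformed value.

```fsharp
// Toy list of strings; not taken from the dataset.
let words = ["fig"; "banana"; "kiwi"]

// List.maxBy returns the element whose length is the largest.
let longestWord = words |> List.maxBy String.length

// Mapping first and then taking List.max returns only the length itself.
let longestLength = words |> List.map String.length |> List.max

printfn "%s has %i characters" longestWord longestLength
```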
Example: Find the player in playerStatsTable with the maximum Age using maxBy. To do this, we write a function that takes a player as input and outputs the player's age. List.maxBy then returns the player for which this function gives the maximum value.
playerStatsTable
|> List.maxBy(fun x -> x.Age)
- Find the maximum playerStatsTable row by GoalsScored using maxBy.
Answer:
playerStatsTable
|> List.maxBy(fun x -> x.GoalsScored)
val it: CsvProvider<...>.Row =
("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35)
4 List.minBy
Sometimes you want the element with the "minimum y" where "y" is the result of applying a particular function to a list element. This is what List.minBy is for.
Example: Find the player in playerStatsTable with the minimum Age using minBy.
playerStatsTable
|> List.minBy(fun x -> x.Age)
- Find the minimum playerStatsTable row by GoalsScored using minBy.
Answer:
playerStatsTable
|> List.minBy(fun x -> x.GoalsScored)
val it: CsvProvider<...>.Row =
("Stefan Ortega", "de GER", "GK", "Arminia", "deBundesliga", 28, 33, 0)
5 List.sum
[1; 4; 5; 3; 6] |> List.sum returns 19 (sum of the List elements).
Example: Calculate the total number of years lived by all players. Hint: transform (List.map) each element of playerStatsTable into an integer representing the player's Age and then get the sum (List.sum) of all the players' ages (the result should be an int).
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.sum
- Calculate the total goals scored (GoalsScored) by all players in playerStatsTable.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.sum
val it: int = 1470
6 List.sumBy
We are using a dataset that has multiple fields per List element. If you want to get the sum for a particular field, it is convenient to use List.sumBy.
It takes a function, transforms each element using that function, and then sums all the transformed elements. It is like List.map and List.sum combined into one function.
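The equivalence can be checked on a toy list of tuples. The values below are made up, and fst simply picks the first item of each tuple.

```fsharp
// Toy (goals, age) pairs; not taken from the dataset.
let goalsAndAges = [ (35, 32); (28, 22); (27, 33) ]

// Two steps: extract the goals, then sum them.
let viaMapAndSum = goalsAndAges |> List.map fst |> List.sum

// One step: List.sumBy extracts and sums at the same time.
let viaSumBy = goalsAndAges |> List.sumBy fst

printfn "%i %i" viaMapAndSum viaSumBy
```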
Example: Use List.sumBy to calculate the total number of years lived by all players in playerStatsTable. Remember that each player has lived Age years.
playerStatsTable
|> List.sumBy(fun x -> x.Age)
- Find the sum of the GoalsScored by all players in playerStatsTable using List.sumBy.
Answer:
playerStatsTable
|> List.sumBy(fun x -> x.GoalsScored)
val it: int = 1470
7 List.average
[1.0; 2.0; 5.0; 2.0] |> List.average returns 2.5 (the average of all the List elements).
Example: Transform playerStatsTable into a list of the players' ages (Age) and find the average Age (List.average).
The field x.Age needs to be transformed from int to float because List.average only works with floats or decimals.
playerStatsTable
|> List.map(fun x -> float x.Age)
|> List.average
- Use List.map to transform playerStatsTable into a list of the players' GoalsScored and find the average GoalsScored (List.average).
Hint: The variable x.GoalsScored needs to be transformed from int to float since List.average only works with floats or decimals.
Answer:
playerStatsTable
|> List.map(fun x -> float x.GoalsScored)
|> List.average
val it: float = 7.35
8 List.averageBy
We are using a dataset that has multiple fields per List element. If you want to get the average for a particular field, it is convenient to use List.averageBy.
It takes a function, transforms each element using that function, and then averages all the transformed elements. It is like List.map and List.average combined into one function.
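The sketch below uses a toy list of integer ages (invented values) to show both routes to the same average, and why the conversion to float is needed.

```fsharp
// Toy ages as integers; not taken from the dataset.
let ages = [22; 31; 33]

// ages |> List.average
// The line above would not compile, because int lacks the division
// operation that List.average requires.

// Two steps: convert every age to float, then average.
let viaMapAndAverage = ages |> List.map float |> List.average

// One step: List.averageBy converts and averages at the same time.
let viaAverageBy = ages |> List.averageBy float

printfn "%f %f" viaMapAndAverage viaAverageBy
```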
Example: Find the average Age using List.averageBy.
The Age needs to be transformed from int to float since List.averageBy only works with floats or decimals.
playerStatsTable
|> List.averageBy(fun x -> float x.Age)
- Find the average GoalsScored using List.averageBy.
Hint: The GoalsScored needs to be transformed from int to float since List.averageBy only works with floats or decimals.
Answer:
playerStatsTable
|> List.averageBy(fun x -> float x.GoalsScored)
val it: float = 7.35
9 Seq.stDev
For Seq.stDev to work, we loaded the FSharp.Stats NuGet package (#r "nuget: FSharp.Stats, 0.5.0").
This package contains the standard deviation function.
We also opened the FSharp.Stats namespace (open FSharp.Stats).
FSharp.Stats documentation
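To make clear what is being computed, here is a hand-written sketch of the sample standard deviation (dividing by N - 1), which is how the FSharp.Stats documentation describes Seq.stDev. The function sampleStDev and the toy values are assumptions made for illustration; this is not the library's implementation.

```fsharp
// Hand-written sample standard deviation (divides by N - 1).
// A sketch for illustration, not the FSharp.Stats implementation.
let sampleStDev (xs : float list) =
    let n = float (List.length xs)
    let mean = List.average xs
    let sumOfSquares = xs |> List.sumBy (fun x -> (x - mean) ** 2.0)
    sqrt (sumOfSquares / (n - 1.0))

// Toy values with mean 5.0 and squared deviations that sum to 32.0.
let sd = sampleStDev [2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0]

// sqrt (32.0 / 7.0), roughly 2.14
printfn "%f" sd
```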
Example: Use List.map to transform playerStatsTable by GoalsScored and find the standard deviation. (Seq.stDev).
Note that for Seq.stDev to work the values need to be floats or decimals, so we need to transform the GoalsScored from int to float.
playerStatsTable
|> List.map(fun x -> float x.GoalsScored)
|> Seq.stDev
- Transform playerStatsTable into a list of the players' Ages and find the standard deviation. (Seq.stDev).
Hint: You need to transform Age values from int to floats.
Answer:
playerStatsTable
|> List.map(fun x -> float x.Age)
|> Seq.stDev
val it: float = 4.343018426
10 Seq.pearsonOfPairs
In order to compute correlations we have to load FSharp.Stats and open the FSharp.Stats namespace.
We also open FSharp.Stats.Correlation for easier access to the correlation functions.
It will be helpful to check the FSharp.Stats.Correlation documentation before starting the exercises.
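For intuition about what a Pearson correlation measures, the following sketch computes it by hand from a list of (x, y) pairs. The pearson function and the toy data are assumptions for illustration only; in the exercises you should use Seq.pearsonOfPairs from FSharp.Stats.

```fsharp
// Hand-written Pearson correlation for a list of (x, y) pairs.
// A sketch for illustration, not the FSharp.Stats implementation.
let pearson (pairs : (float * float) list) =
    let meanX = pairs |> List.averageBy fst
    let meanY = pairs |> List.averageBy snd
    let covariance = pairs |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
    let varianceX = pairs |> List.sumBy (fun (x, _) -> (x - meanX) ** 2.0)
    let varianceY = pairs |> List.sumBy (fun (_, y) -> (y - meanY) ** 2.0)
    covariance / sqrt (varianceX * varianceY)

// y is exactly 2 * x, so the correlation is 1.0 (up to rounding).
let r = pearson [(1.0, 2.0); (2.0, 4.0); (3.0, 6.0)]

printfn "%f" r
```

A value close to 1 or -1 indicates a strong linear relationship, while a value close to 0 indicates a weak one.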
Example: Test the correlation between MatchesPlayed and GoalsScored using pearsonOfPairs.
Seq.pearsonOfPairs expects a list of tuples (x1 * x2), computing the correlation between x1 and x2.
So we use List.map to get a list of tuples with (MatchesPlayed, GoalsScored).
Then we only need to pipe (|>) to Seq.pearsonOfPairs.
playerStatsTable
|> List.map(fun x -> x.MatchesPlayed, x.GoalsScored)
|> Seq.pearsonOfPairs
- Test the correlation between MatchesPlayed and Age using pearsonOfPairs.
Hints: Seq.pearsonOfPairs expects a list of tuples (x1 * x2). Use List.map to get a list of tuples with (MatchesPlayed, Age). Then you only need to pipe (|>) to Seq.pearsonOfPairs.
Answer:
playerStatsTable
|> List.map(fun x -> x.MatchesPlayed, x.Age)
|> Seq.pearsonOfPairs
val it: float = -0.07750635099
- Test the correlation between GoalsScored and Age using pearsonOfPairs.
Hints: Seq.pearsonOfPairs expects a list of tuples (x1 * x2). Use List.map to get a list of tuples with (GoalsScored, Age). Then you only need to pipe (|>) to Seq.pearsonOfPairs.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored, x.Age)
|> Seq.pearsonOfPairs
val it: float = 0.01881518088
Further Statistics practice
Now that you should feel comfortable with List.filter, List.groupBy, List.splitInto
and some F# statistics functions, let's combine those concepts.
1 List.countBy, List.filter and List.averageBy
Example: Find the average age of Portuguese players.
To restrict the average to Portuguese players we need to use List.filter.
But first we need to know which string corresponds to Portuguese players!
Using List.distinct or List.countBy we can observe all the Nation strings, which shows that the Portuguese Nation string is "pt POR".
playerStatsTable
|> List.countBy(fun x -> x.Nation)
Now that we know the Portuguese string, we can filter on x.Nation = "pt POR" to get only the Portuguese players' rows!
Then we can pipe it (|>) to List.averageBy (fun x -> float x.Age) to get the average age of Portuguese players.
playerStatsTable
|> List.filter(fun x -> x.Nation = "pt POR")
|> List.averageBy(fun x -> float x.Age)
- Find the average age of players playing in the Premier League.
Hint: You'll first need to use List.filter to get only players from the Premier League (x.League = "engPremier League"). Then use averageBy to compute the average age; don't forget to use float x.Age to convert the age values to float.
Answer:
playerStatsTable
|> List.filter(fun x -> x.League = "engPremier League")
|> List.averageBy(fun x -> float x.Age)
val it: float = 25.58333333
2 List.groupBy, List.map and transformations
Example: Group playerStatsTable by Team and compute the average number of GoalsScored.
//example using record:
type TeamAndAvgGls =
    { Team : string
      AvgGoalsScored : float }
playerStatsTable
|> List.groupBy(fun x -> x.Team)
|> List.map(fun (team, playerStats) ->
    { Team = team
      AvgGoalsScored = playerStats |> List.averageBy(fun playerStats -> float playerStats.GoalsScored)})
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
or
//example using tuple:
playerStatsTable
|> List.groupBy(fun x -> x.Team)
|> List.map(fun (team, playerStats) -> team, playerStats |> List.averageBy(fun playerStats -> float playerStats.GoalsScored))
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Group playerStatsTable by League and then compute the average Age by group.
Hint: Use groupBy to group by league (League). Then pipe the groups (|>) to List.map and use averageBy to compute the average age (Age) of each group, organizing the data in a record or tuple with League (League) and Average Age.
Answer:
//solution using record:
type LeagueAndAvgAge =
    { League : string
      AverageAge : float }
playerStatsTable
|> List.groupBy(fun x -> x.League)
|> List.map(fun (leagues, playerStats) ->
    { League = leagues
      AverageAge = playerStats |> List.averageBy(fun playerStats -> float playerStats.Age) })
//solution using tuples:
playerStatsTable
|> List.groupBy(fun x -> x.League)
|> List.map(fun (leagues, playerStats) ->
    leagues,
    playerStats |> List.averageBy(fun playerStats -> float playerStats.Age) )
type LeagueAndAvgAge =
{
League: string
AverageAge: float
}
val it: (string * float) list =
[("deBundesliga", 27.11111111); ("frLigue 1", 25.7173913);
("esLa Liga", 26.53333333); ("itSerie A", 26.80769231);
("engPremier League", 25.58333333)]
3 List.sortDescending, List.splitInto, List.map and Seq.stDev
- From playerStatsTable sort the players' Age (descending), split the dataset into quartiles (4-quantiles) and compute the standard deviation for each quantile.
Hint: You only need the Age field from the dataset, so you can use map straight away to get the Age List. Sort that List with List.sortDescending, and then split it into 4 parts using List.splitInto. Finally use List.map to iterate through each quantile and apply the function Seq.stDev.
Answer:
playerStatsTable
|> List.map(fun x -> float x.Age)
|> List.sortDescending
|> List.splitInto 4
|> List.map(fun x -> x |> Seq.stDev)
val it: float list = [2.294714424; 0.9082389329; 0.9171829097; 1.59604102]