Using List collection functions and calculating summary statistics.
Developed with Davide Costa
You should now feel comfortable with the footballer dataset and with tuples, records, and anonymous records, and you should know how to perform simple transformations. With a large and heterogeneous dataset, it is also useful to know how to sort, group, and filter the data, along with many other List functions.
It is a good idea to browse the documentation for lists at the F# language reference and the F# core library documentation sites before you start. For further discussion of collection functions, the related F# for fun and profit page is also useful.
Reference the required NuGet packages and open the namespaces.
#r "nuget: FSharp.Data, 5.0.2"
#r "nuget: FSharp.Stats, 0.5.0"
open FSharp.Data
open FSharp.Stats
open FSharp.Stats.Correlation
Load the Csv file.
let [<Literal>] CsvPath = __SOURCE_DIRECTORY__ + "/FootballPlayers.csv"
type FootballPlayersCsv = CsvProvider<CsvPath>
let playerStatsTable =
    FootballPlayersCsv.GetSample().Rows
    |> Seq.toList
EXERCISES - PART 2
List Functions.
1 List.take
List.take 5 takes the first 5 rows.
List.take 2 takes the first 2 rows.
Example: Take the first 4 rows from playerStatsTable with List.take.
playerStatsTable
|> List.take 4
- Take the first 7 rows from playerStatsTable with List.take.
Answer:
playerStatsTable
|> List.take 7
val it: CsvProvider<...>.Row list =
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35);
("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Karim Benzema", "fr FRA", "FW", "Real Madrid", "esLa Liga", 33, 32, 27);
("Ciro Immobile", "it ITA", "FW", "Lazio", "itSerie A", 31, 31, 27);
("Wissam Ben Yedder", "fr FRA", "FW", "Monaco", "frLigue 1", 30, 37, 25);
("Patrik Schick", "cz CZE", "FW", "Leverkusen", "deBundesliga", 25, 27, 24);
("Son Heung-min", "kr KOR", "MF,FW", "Tottenham", "engPremier League", 29,
35, 23)]
2 List.truncate
List.truncate 5 takes at most the first 5 rows.
List.truncate 2 takes at most the first 2 rows.
You may have noticed that List.take and List.truncate return similar outputs, but they are not exactly the same.
List.take returns exactly the number of items you specify, while List.truncate returns at most that number of items.
In most cases both give the same output; they differ only when you ask for more items than the List contains (its length).
In that case List.truncate returns all the elements in the List, while List.take throws an exception, because it is supposed to return exactly the number of elements you asked for, which is impossible.
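The difference is easy to check on a short list. This is a minimal sketch that uses a hypothetical three-element list of ages in place of playerStatsTable; the exception thrown by List.take is caught so the script keeps running:

```fsharp
// Hypothetical stand-in for playerStatsTable: only 3 elements.
let ages = [ 32; 22; 33 ]

// truncate returns at most 5 elements, so here it returns all 3: [32; 22; 33]
let truncated = ages |> List.truncate 5

// take requires exactly 5 elements, so it throws on a 3-element list.
let taken =
    try
        Some(ages |> List.take 5)
    with _ ->
        None
```

When the list is long enough, both functions agree: `ages |> List.take 2` and `ages |> List.truncate 2` both give `[32; 22]`.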
Example: Take the first 4 rows from playerStatsTable with List.truncate.
playerStatsTable
|> List.truncate 4
- Take the first 7 rows from playerStatsTable with List.truncate.
Answer:
playerStatsTable
|> List.truncate 7
val it: CsvProvider<...>.Row list =
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35);
("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Karim Benzema", "fr FRA", "FW", "Real Madrid", "esLa Liga", 33, 32, 27);
("Ciro Immobile", "it ITA", "FW", "Lazio", "itSerie A", 31, 31, 27);
("Wissam Ben Yedder", "fr FRA", "FW", "Monaco", "frLigue 1", 30, 37, 25);
("Patrik Schick", "cz CZE", "FW", "Leverkusen", "deBundesliga", 25, 27, 24);
("Son Heung-min", "kr KOR", "MF,FW", "Tottenham", "engPremier League", 29,
35, 23)]
3 List.distinct
List.distinct returns the unique elements of a List, keeping the first occurrence of each.
["hello"; "world"; "hello"; "hi"] |> List.distinct
returns ["hello"; "world"; "hi"].
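A small sketch of List.distinct on a few Nation strings copied from the outputs in this exercise. It also counts the unique values with List.length, which is a quick way to see how many different nations a column contains:

```fsharp
// A short sample of Nation strings as they appear in the dataset.
let nations = [ "pl POL"; "fr FRA"; "fr FRA"; "it ITA"; "fr FRA"; "cz CZE" ]

// distinct keeps the first occurrence of each value, in the original order:
// ["pl POL"; "fr FRA"; "it ITA"; "cz CZE"]
let uniqueNations = nations |> List.distinct

// Number of different nations in the sample: 4
let nationCount = uniqueNations |> List.length
```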
Example: From the playerStatsTable Nation field, find the unique elements with List.distinct.
playerStatsTable
|> List.map(fun x -> x.Nation)
|> List.distinct
- From the playerStatsTable League field, find the unique elements with List.distinct.
Answer:
playerStatsTable
|> List.map(fun x -> x.League)
|> List.distinct
val it: string list =
["deBundesliga"; "frLigue 1"; "esLa Liga"; "itSerie A"; "engPremier League"]
4 List.countBy
List.countBy applies a key function to each element and returns a list of tuples pairing each unique key with its count.
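A minimal sketch on a hypothetical list of League strings. The key function here is `id` (the value itself), so each distinct string is counted; the counts always add up to the number of elements. The order of the pairs is not relied upon here, so the comment only lists what the result contains:

```fsharp
// League strings for a short, hypothetical sample of rows.
let leagues =
    [ "deBundesliga"; "frLigue 1"; "deBundesliga"; "itSerie A"; "frLigue 1"; "deBundesliga" ]

// countBy applies the key function (id) and counts each key.
// Result contains ("deBundesliga", 3), ("frLigue 1", 2) and ("itSerie A", 1).
let leagueCounts = leagues |> List.countBy id

// The counts add back up to the list length: 6
let total = leagueCounts |> List.sumBy snd
```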
Example: From the playerStatsTable Team field, find the unique elements and their counts with List.countBy.
playerStatsTable
|> List.countBy(fun x -> x.Team)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- From the playerStatsTable League field, find the unique elements and their counts with List.countBy.
Answer:
playerStatsTable
|> List.countBy(fun x -> x.League)
val it: (string * int) list =
[("deBundesliga", 36); ("frLigue 1", 46); ("esLa Liga", 30);
("itSerie A", 52); ("engPremier League", 36)]
5 List.filter
List.filter allows you to extract a subset of the dataset based on one or more conditions.
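Multiple conditions are just boolean expressions combined with `&&` or `||` inside the filter function. This sketch assumes a minimal record type with only the fields it needs (not the CsvProvider row type), filled with a few rows from the outputs in this exercise:

```fsharp
// A minimal row type (an assumption standing in for the CSV row type).
type Player = { Name: string; Nation: string; Age: int }

let players =
    [ { Name = "Rui Patrício"; Nation = "pt POR"; Age = 33 }
      { Name = "Kylian Mbappé"; Nation = "fr FRA"; Age = 22 }
      { Name = "Karim Benzema"; Nation = "fr FRA"; Age = 33 } ]

// Both conditions must hold (&&): French players younger than 30.
let youngFrench =
    players |> List.filter (fun p -> p.Nation = "fr FRA" && p.Age < 30)

// Either condition may hold (||): Portuguese players or players older than 30.
let portugueseOrOver30 =
    players |> List.filter (fun p -> p.Nation = "pt POR" || p.Age > 30)
```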
Example: Filter the playerStatsTable to get only Portuguese players (Nation = "pt POR").
Remember that we have to look at the dataset to find the string corresponding to Portuguese players, which in this case is "pt POR".
playerStatsTable
|> List.filter(fun x -> x.Nation = "pt POR")
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Filter the playerStatsTable to get only 16-year-old players (Age = 16).
Answer:
playerStatsTable
|> List.filter(fun x -> x.Age = 16)
val it: CsvProvider<...>.Row list = []
6 List.sort and List.sortDescending
[1; 4; 5; 3; 6] |> List.sort returns [1; 3; 4; 5; 6] (ascending sort).
[1; 4; 5; 3; 6] |> List.sortDescending returns [6; 5; 4; 3; 1] (descending sort).
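List.sort is not limited to numbers: it works on any type that supports comparison. A short sketch with team names from the dataset and with (age, name) tuples, which are compared element by element:

```fsharp
// Strings sort alphabetically: ["Arsenal"; "Monaco"; "Roma"]
let teams = [ "Roma"; "Arsenal"; "Monaco" ] |> List.sort

// Tuples compare by their first element, then by the second to break ties:
// [(22, "Mason Mount"); (23, "Ludovic Blas"); (23, "Tammy Abraham")]
let byAgeThenName =
    [ (23, "Tammy Abraham"); (22, "Mason Mount"); (23, "Ludovic Blas") ] |> List.sort

// Descending order: [6; 5; 4; 3; 1]
let descending = [ 1; 4; 5; 3; 6 ] |> List.sortDescending
```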
Example: Map playerStatsTable to get a list of Age and sort it (ascending).
Since we want to sort the ages, we first use List.map to get only the Age List.
Then we use List.sort to sort it.
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.sort
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
- Map playerStatsTable to get a list of GoalsScored and sort it (ascending).
Hint: To sort the GoalsScored List you first need to use List.map to get only that List. Then use List.sort to sort it.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.sort
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
val it: int list =
[0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2; 2; 2;
2; 2; 2; 2; 2; 2; 2; 2; 2; 2]
Example: Map playerStatsTable to get a list of Age and sort it (descending).
Since we want to sort the ages, we first use List.map to get only the Age List.
Then we use List.sortDescending to sort it.
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.sortDescending
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
- Map playerStatsTable to get a list of GoalsScored and sort it (descending).
Hint: To sort the GoalsScored List you first need to use List.map to get only that List. Then use List.sortDescending to sort it.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.sortDescending
|> List.truncate 60 //just to observe the first 60 values, not a part of the exercise.
val it: int list =
[35; 28; 27; 27; 25; 24; 23; 23; 22; 21; 21; 21; 20; 20; 18; 18; 17; 17; 17;
17; 17; 17; 16; 16; 16; 16; 16; 15; 15; 13; 13; 13; 13; 13; 12; 12; 12; 12;
12; 12; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 10; 10; 10; 10; 10; 10;
10; 10; 10]
7 List.sortBy and List.sortByDescending
List.sortBy is very useful for sorting the dataset according to a given field.
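Unlike List.sort, List.sortBy keeps the whole row and only uses the projected field as the sort key. A sketch with a minimal record type (an assumption, not the CSV row type) and rows from the outputs in this exercise. List.sortBy is a stable sort, so tied rows keep their original order; a tuple key adds a tie-breaker:

```fsharp
type Player = { Name: string; Age: int; GoalsScored: int }

let players =
    [ { Name = "Stefan Ortega"; Age = 28; GoalsScored = 0 }
      { Name = "Robert Lewandowski"; Age = 32; GoalsScored = 35 }
      { Name = "Nick Pope"; Age = 29; GoalsScored = 0 } ]

// Ascending by goals; the two 0-goal rows keep their original order.
// Stefan Ortega, Nick Pope, Robert Lewandowski
let byGoals = players |> List.sortBy (fun p -> p.GoalsScored)

// Descending by goals, then by age to break ties.
// Robert Lewandowski, Nick Pope, Stefan Ortega
let byGoalsThenAgeDesc =
    players |> List.sortByDescending (fun p -> p.GoalsScored, p.Age)
```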
Example: Sort (ascending) playerStatsTable by Age (List.sortBy).
playerStatsTable
|> List.sortBy(fun x -> x.Age)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Sort (ascending) playerStatsTable by GoalsScored (List.sortBy).
Answer:
playerStatsTable
|> List.sortBy(fun x -> x.GoalsScored)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
val it: CsvProvider<...>.Row list =
[("Stefan Ortega", "de GER", "GK", "Arminia", "deBundesliga", 28, 33, 0);
("Rui Patrício", "pt POR", "GK", "Roma", "itSerie A", 33, 38, 0);
("Philipp Pentke", "de GER", "GK", "Hoffenheim", "deBundesliga", 36, 1, 0);
("Pavao Pervan", "at AUT", "GK", "Wolfsburg", "deBundesliga", 33, 6, 0);
("Nick Pope", "eng ENG", "GK", "Burnley", "engPremier League", 29, 36, 0)]
Example: Sort (descending) playerStatsTable by Age (List.sortByDescending).
playerStatsTable
|> List.sortByDescending(fun x -> x.Age)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Sort (descending) playerStatsTable by GoalsScored (List.sortByDescending).
Answer:
playerStatsTable
|> List.sortByDescending(fun x -> x.GoalsScored)
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
val it: CsvProvider<...>.Row list =
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35);
("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Karim Benzema", "fr FRA", "FW", "Real Madrid", "esLa Liga", 33, 32, 27);
("Ciro Immobile", "it ITA", "FW", "Lazio", "itSerie A", 31, 31, 27);
("Wissam Ben Yedder", "fr FRA", "FW", "Monaco", "frLigue 1", 30, 37, 25)]
8 List.splitInto
List.splitInto is very useful for splitting your dataset into multiple subsets.
This function is commonly used to generate quantiles by splitting a sorted List.
For instance, in investment strategies financial assets are usually sorted by a certain signal and then split into quantiles. If the signal has a positive sign, the long strategy consists of going long on the first-quantile stocks, and the long-short strategy consists of going long on the first-quantile stocks and short on the last-quantile stocks.
Note: List.splitInto takes one parameter, the number of groups you want to create out of the dataset. It produces at most that many groups, whose sizes differ by at most one element.
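A sketch of the quantile idea on hypothetical signal values for ten assets, assuming they are already sorted so that the strongest signal comes first. Ten elements do not divide evenly into three groups, so the first group receives the extra element:

```fsharp
// Hypothetical signal values, sorted from strongest to weakest: [10; 9; ...; 1]
let sortedSignals = [ 10 .. -1 .. 1 ]

// [[10; 9; 8; 7]; [6; 5; 4]; [3; 2; 1]]
let groups = sortedSignals |> List.splitInto 3

// Long-short idea from the text: long the first group, short the last one.
let longLeg = List.head groups  // [10; 9; 8; 7]
let shortLeg = List.last groups // [3; 2; 1]
```

Asking for more groups than there are elements simply yields fewer groups: `[1; 2] |> List.splitInto 4` gives two single-element groups.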
Example: Sort the playerStatsTable by GoalsScored and then split the dataset into 4 groups using List.sortBy and List.splitInto.
playerStatsTable
|> List.sortBy(fun x -> x.GoalsScored)
|> List.splitInto 4
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
|> List.map(fun x -> x |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
- Sort the playerStatsTable by Age and then split the dataset into 5 groups using List.sortBy and List.splitInto.
Answer:
playerStatsTable
|> List.sortBy(fun x -> x.Age)
|> List.splitInto 5
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
|> List.map(fun x -> x |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
val it: CsvProvider<...>.Row list list =
[[("Giorgio Scalvini", "it ITA", "DF,MF", "Atalanta", "itSerie A", 17, 18, 1);
("Alejandro Primo", "es ESP", "GK", "Levante", "esLa Liga", 17, 1, 0);
("Florian Wirtz", "de GER", "MF,FW", "Leverkusen", "deBundesliga", 18, 24,
7); ("Destiny Udogie", "it ITA", "DF", "Udinese", "itSerie A", 18, 35, 5);
("Bukayo Saka", "eng ENG", "FW,MF", "Arsenal", "engPremier League", 19, 38,
11)];
[("Lautaro Martínez", "ar ARG", "FW", "Inter", "itSerie A", 23, 35, 21);
("Christopher Nkunku", "fr FRA", "FW,MF", "RB Leipzig", "deBundesliga", 23,
34, 20);
("Tammy Abraham", "eng ENG", "FW", "Roma", "itSerie A", 23, 37, 17);
("Ludovic Blas", "fr FRA", "MF,FW", "Nantes", "frLigue 1", 23, 35, 10);
("Emmanuel Dennis", "ng NGA", "FW,MF", "Watford", "engPremier League", 23,
33, 10)]]
9 List.groupBy
List.groupBy allows you to group the elements of a list.
It takes a key-generating function and a list as inputs.
The function is applied to each element of the List, and the result is a list of tuples where the first element of each tuple is a key and the second is a list of the elements for which the function produced that key.
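A small sketch on (name, age) pairs taken from the groupBy output in this exercise. The key function is `snd`, so rows are grouped by age; mapping each group to its length gives the same numbers that List.countBy would return. The group order is not relied upon here:

```fsharp
// (name, age) pairs from the outputs in this exercise.
let players =
    [ ("Robert Lewandowski", 32); ("Kylian Mbappé", 22); ("Marco Reus", 32)
      ("Mason Mount", 22); ("Bukayo Saka", 19) ]

// Each element of byAge is (age, rows with that age).
let byAge = players |> List.groupBy snd

// Group sizes: contains (32, 2), (22, 2) and (19, 1).
let groupSizes = byAge |> List.map (fun (age, rows) -> age, List.length rows)
```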
Example: Group the playerStatsTable by Nation using List.groupBy.
playerStatsTable
|> List.groupBy(fun x -> x.Nation)
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
|> List.map(fun (x, xs) -> x, xs |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
- Group the playerStatsTable by Age using List.groupBy.
Answer:
playerStatsTable
|> List.groupBy(fun x -> x.Age)
|> List.map(fun (x, xs) -> x, xs |> List.truncate 5) //just to observe the first 5 rows of each group List, not a part of the exercise.
|> List.truncate 2 //just to observe the first 2 groups Lists, not a part of the exercise.
val it: (int * CsvProvider<...>.Row list) list =
[(32,
[("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga",
32, 34, 35);
("Marco Reus", "de GER", "MF,FW", "Dortmund", "deBundesliga", 32, 29, 9);
("Ivan Perišić", "hr CRO", "DF", "Inter", "itSerie A", 32, 35, 8);
("Axel Witsel", "be BEL", "MF,DF", "Dortmund", "deBundesliga", 32, 29, 2);
("Ivan Radovanović", "rs SRB", "DF,MF", "Salernitana", "itSerie A", 32,
14, 1)]);
(22,
[("Kylian Mbappé", "fr FRA", "FW", "Paris S-G", "frLigue 1", 22, 35, 28);
("Gianluca Scamacca", "it ITA", "FW", "Sassuolo", "itSerie A", 22, 36, 16);
("Moussa Diaby", "fr FRA", "FW,MF", "Leverkusen", "deBundesliga", 22, 32,
13);
("Randal Kolo Muani", "fr FRA", "FW,MF", "Nantes", "frLigue 1", 22, 36,
12);
("Mason Mount", "eng ENG", "MF", "Chelsea", "engPremier League", 22, 32,
11)])]
Statistics List Functions
1 List.max
[1; 4; 5; 3; 6] |> List.max returns 6 (the highest value in the List).
Example: Map playerStatsTable to get the Age List, and find the maximum (List.max).
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.max
- Map playerStatsTable to get the GoalsScored List, and find the maximum (List.max).
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.max
val it: int = 35
2 List.min
[1; 4; 5; 3; 6] |> List.min returns 1 (the lowest value in the List).
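List.max and List.min both throw an exception on an empty list, which can happen after a filter that matches nothing (as with the 16-year-old filter earlier). A minimal sketch, where `tryMax` is a hypothetical helper that returns an option instead:

```fsharp
let goals = [ 1; 4; 5; 3; 6 ]
let highest = List.max goals // 6
let lowest = List.min goals  // 1

// Hypothetical helper: None for an empty list instead of an exception.
let tryMax (xs: int list) =
    if List.isEmpty xs then None else Some(List.max xs)
```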
Example: Map playerStatsTable to get the Age List, and find the minimum (List.min).
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.min
- Map playerStatsTable to get the GoalsScored List, and find the minimum (List.min).
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.min
val it: int = 0
3 List.maxBy
Sometimes you want the element with the "maximum y", where "y" is the result of applying a particular function to a list element. This is what List.maxBy is for. This function is best understood through an example.
Example: Find the player in playerStatsTable with the maximum Age using List.maxBy. We need to write a function that takes a player as input and returns the player's age. List.maxBy then returns the player for whom this function gives the maximum value.
playerStatsTable
|> List.maxBy(fun x -> x.Age)
- Find the playerStatsTable row with the maximum GoalsScored using List.maxBy.
Answer:
playerStatsTable
|> List.maxBy(fun x -> x.GoalsScored)
val it: CsvProvider<...>.Row =
("Robert Lewandowski", "pl POL", "FW", "Bayern Munich", "deBundesliga", 32,
34, 35)
4 List.minBy
Sometimes you want the element with the "minimum y", where "y" is the result of applying a particular function to a list element. This is what List.minBy is for.
Example: Find the player in playerStatsTable with the minimum Age using List.minBy.
playerStatsTable
|> List.minBy(fun x -> x.Age)
- Find the playerStatsTable row with the minimum GoalsScored using List.minBy.
Answer:
playerStatsTable
|> List.minBy(fun x -> x.GoalsScored)
val it: CsvProvider<...>.Row =
("Stefan Ortega", "de GER", "GK", "Arminia", "deBundesliga", 28, 33, 0)
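The key difference between List.maxBy/List.minBy and List.map followed by List.max is what comes back: the whole row versus only the projected value. A sketch with a minimal record type (an assumption) and ages from the outputs in this exercise:

```fsharp
type Player = { Name: string; Age: int }

let players =
    [ { Name = "Giorgio Scalvini"; Age = 17 }
      { Name = "Philipp Pentke"; Age = 36 }
      { Name = "Marco Reus"; Age = 32 } ]

// maxBy / minBy return the whole row with the largest / smallest Age.
let oldest = players |> List.maxBy (fun p -> p.Age)   // Philipp Pentke
let youngest = players |> List.minBy (fun p -> p.Age) // Giorgio Scalvini

// map + max returns only the Age value itself: 36
let oldestAge = players |> List.map (fun p -> p.Age) |> List.max
```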
5 List.sum
[1; 4; 5; 3; 6] |> List.sum returns 19 (the sum of the List elements).
Example: Calculate the total number of years lived by all players. Hint: transform (List.map) each element of playerStatsTable into an integer representing the player's Age and then get the sum (List.sum) of all the players' ages (the result should be an int).
playerStatsTable
|> List.map(fun x -> x.Age)
|> List.sum
- Calculate the total goals scored (GoalsScored) by all players in playerStatsTable.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored)
|> List.sum
val it: int = 1470
6 List.sumBy
We are using a dataset that has multiple fields per List element. If you want the sum of a particular field, it is convenient to use List.sumBy.
It transforms each element using the given function and then sums the transformed elements. It is like List.map and List.sum combined into one function.
Example: Use List.sumBy to calculate the total number of years lived by all players in playerStatsTable. Remember that each player has lived Age years.
playerStatsTable
|> List.sumBy(fun x -> x.Age)
- Find the sum of the GoalsScored by all players in playerStatsTable using List.sumBy.
Answer:
playerStatsTable
|> List.sumBy(fun x -> x.GoalsScored)
val it: int = 1470
7 List.average
[1.0; 2.0; 5.0; 2.0] |> List.average returns 2.5 (the average of all the List elements).
Example: Transform playerStatsTable into a list of the players' ages (Age) and find the average Age (List.average).
The field x.Age needs to be converted from int to float because List.average only works with non-integer numeric types such as float or decimal, not with int.
playerStatsTable
|> List.map(fun x -> float x.Age)
|> List.average
- Use List.map to transform playerStatsTable into a list of the players' GoalsScored and find the average GoalsScored (List.average).
Hint: The field x.GoalsScored needs to be converted from int to float since List.average only works with floats or decimals.
Answer:
playerStatsTable
|> List.map(fun x -> float x.GoalsScored)
|> List.average
val it: float = 7.35
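The int-to-float conversion is needed because List.average divides the total by the element count, which requires a type that supports division by an integer; float and decimal do, int does not. A short sketch with four ages from the dataset:

```fsharp
let ages = [ 32; 22; 33; 31 ]

// `ages |> List.average` would not compile, because int lacks the required division.
// Converting to float first works: 29.5
let averageAge = ages |> List.map float |> List.average

// Converting to decimal works as well: 29.5M
let averageAgeDecimal = ages |> List.map decimal |> List.average
```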
8 List.averageBy
We are using a dataset that has multiple fields per List element. If you want the average of a particular field, it is convenient to use List.averageBy.
It transforms each element using the given function and then averages the transformed elements. It is like List.map and List.average combined into one function.
Example: Find the average Age using List.averageBy.
The Age needs to be converted from int to float since List.averageBy only works with floats or decimals.
playerStatsTable
|> List.averageBy(fun x -> float x.Age)
- Find the average GoalsScored using List.averageBy.
Hint: The GoalsScored needs to be converted from int to float since List.averageBy only works with floats or decimals.
Answer:
playerStatsTable
|> List.averageBy(fun x -> float x.GoalsScored)
val it: float = 7.35
9 Seq.stDev
For Seq.stDev to work, we loaded the FSharp.Stats NuGet package (#r "nuget: FSharp.Stats, 0.5.0"), which contains the standard deviation function.
We also opened the FSharp.Stats namespace (open FSharp.Stats).
See the FSharp.Stats documentation for more details.
Example: Use List.map to transform playerStatsTable into a list of GoalsScored and find the standard deviation (Seq.stDev).
Note that for Seq.stDev to work the values need to be floats or decimals, so we need to convert GoalsScored from int to float.
playerStatsTable
|> List.map(fun x -> float x.GoalsScored)
|> Seq.stDev
- Transform playerStatsTable into a list of the players' ages (Age) and find the standard deviation (Seq.stDev).
Hint: You need to convert the Age values from int to float.
Answer:
playerStatsTable
|> List.map(fun x -> float x.Age)
|> Seq.stDev
val it: float = 4.343018426
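According to the FSharp.Stats documentation, Seq.stDev computes the sample standard deviation, using Bessel's correction (dividing by N - 1). To make that formula concrete, here is a hand-rolled sketch using only FSharp.Core list functions; `sampleStDev` is a hypothetical name and this is not the FSharp.Stats implementation:

```fsharp
// Sample standard deviation: sqrt (sum of squared deviations / (n - 1)).
let sampleStDev (xs: float list) =
    let n = float (List.length xs)
    let mean = List.average xs
    let sumOfSquares = xs |> List.sumBy (fun x -> (x - mean) ** 2.0)
    sqrt (sumOfSquares / (n - 1.0))

// Mean is 5 and the squared deviations sum to 32, so this is sqrt (32 / 7), about 2.138.
let sd = sampleStDev [ 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 ]
```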
10 Seq.pearsonOfPairs
In order to compute correlations we have to load the FSharp.Stats package and open the FSharp.Stats namespace.
We also open FSharp.Stats.Correlation to allow easier access to the correlation functions.
It will be helpful to check the FSharp.Stats.Correlation documentation before starting the exercises.
Example: Test the correlation between MatchesPlayed and GoalsScored using Seq.pearsonOfPairs.
Seq.pearsonOfPairs expects a list of tuples (x1 * x2) and computes the correlation between x1 and x2.
So we use List.map to get a list of tuples (MatchesPlayed, GoalsScored).
Then we only need to pipe (|>) the result to Seq.pearsonOfPairs.
playerStatsTable
|> List.map(fun x -> x.MatchesPlayed, x.GoalsScored)
|> Seq.pearsonOfPairs
- Test the correlation between MatchesPlayed and Age using Seq.pearsonOfPairs.
Hints: Seq.pearsonOfPairs expects a list of tuples (x1 * x2). Use List.map to get a list of tuples (MatchesPlayed, Age). Then you only need to pipe (|>) the result to Seq.pearsonOfPairs.
Answer:
playerStatsTable
|> List.map(fun x -> x.MatchesPlayed, x.Age)
|> Seq.pearsonOfPairs
val it: float = -0.07750635099
- Test the correlation between GoalsScored and Age using Seq.pearsonOfPairs.
Hints: Seq.pearsonOfPairs expects a list of tuples (x1 * x2). Use List.map to get a list of tuples (GoalsScored, Age). Then you only need to pipe (|>) the result to Seq.pearsonOfPairs.
Answer:
playerStatsTable
|> List.map(fun x -> x.GoalsScored, x.Age)
|> Seq.pearsonOfPairs
val it: float = 0.01881518088
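To see what a Pearson correlation of pairs actually measures, here is a hand-rolled sketch built from the list functions covered above. `pearson` is a hypothetical helper, not the FSharp.Stats implementation: it divides the covariance of x and y by the product of their spreads (the 1/(n - 1) factors cancel, so plain sums are enough):

```fsharp
// Pearson correlation of (x, y) pairs.
let pearson (pairs: (float * float) list) =
    let xs = pairs |> List.map fst
    let ys = pairs |> List.map snd
    let meanX = List.average xs
    let meanY = List.average ys
    let cov = pairs |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
    let spreadX = xs |> List.sumBy (fun x -> (x - meanX) ** 2.0) |> sqrt
    let spreadY = ys |> List.sumBy (fun y -> (y - meanY) ** 2.0) |> sqrt
    cov / (spreadX * spreadY)

// (MatchesPlayed, GoalsScored) pairs from the outputs in this exercise, as floats.
let r = pearson [ (34.0, 35.0); (35.0, 28.0); (32.0, 27.0); (1.0, 0.0) ]
```

Perfectly increasing pairs such as `[(1.0, 2.0); (2.0, 4.0); (3.0, 6.0)]` give 1.0, and perfectly decreasing pairs give -1.0.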
Further Statistics practice
Now that you should feel comfortable with List.filter, List.groupBy, List.splitInto and some F# statistics functions, let's combine those concepts.
1 List.countBy, List.filter and List.averageBy
Example: Find the average age of Portuguese players.
To find the average age of Portuguese players we know that we need to use List.filter.
But first we need to know which string corresponds to Portuguese players!
Using List.distinct or List.countBy we can observe all the Nation strings, which shows that the Portuguese Nation string is "pt POR".
playerStatsTable
|> List.countBy(fun x -> x.Nation)
Now that we know the Portuguese string, we can filter on x.Nation = "pt POR" to get only the Portuguese players' rows.
Then we can pipe (|>) the result to List.averageBy (fun x -> float x.Age) to get the average age of Portuguese players.
playerStatsTable
|> List.filter(fun x -> x.Nation = "pt POR")
|> List.averageBy(fun x -> float x.Age)
- Find the average age of players in the Premier League.
Hint: You first need to use List.filter to get only players from the Premier League (x.League = "engPremier League"). Then use List.averageBy to compute the average age; don't forget to use float x.Age to convert the ages to float.
Answer:
playerStatsTable
|> List.filter(fun x -> x.League = "engPremier League")
|> List.averageBy(fun x -> float x.Age)
val it: float = 25.58333333
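One caveat when combining List.filter with List.averageBy: if the filter matches nothing (like the 16-year-old filter earlier, which returned an empty list), List.averageBy throws an exception. A sketch with a minimal record type, where `tryAverageAge` is a hypothetical helper that returns None instead:

```fsharp
type Player = { Name: string; League: string; Age: int }

let players =
    [ { Name = "Bukayo Saka"; League = "engPremier League"; Age = 19 }
      { Name = "Mason Mount"; League = "engPremier League"; Age = 22 }
      { Name = "Marco Reus"; League = "deBundesliga"; Age = 32 } ]

// Average age of the rows matching the predicate, or None when no row matches.
let tryAverageAge (predicate: Player -> bool) =
    match players |> List.filter predicate with
    | [] -> None
    | matched -> Some(matched |> List.averageBy (fun p -> float p.Age))

let premierLeague = tryAverageAge (fun p -> p.League = "engPremier League") // Some 20.5
let sixteenYearOlds = tryAverageAge (fun p -> p.Age = 16)                   // None
```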
2 List.groupBy, List.map and transformations
Example: Group playerStatsTable by Team and compute the average number of GoalsScored.
//example using a record:
type TeamAndAvgGls =
    { Team : string
      AvgGoalsScored : float }
playerStatsTable
|> List.groupBy(fun x -> x.Team)
|> List.map(fun (team, playerStats) ->
    { Team = team
      AvgGoalsScored = playerStats |> List.averageBy(fun p -> float p.GoalsScored) })
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
or
//example using a tuple:
playerStatsTable
|> List.groupBy(fun x -> x.Team)
|> List.map(fun (team, playerStats) -> team, playerStats |> List.averageBy(fun p -> float p.GoalsScored))
|> List.truncate 5 //just to observe the first 5 rows, not a part of the exercise.
- Group playerStatsTable by League and then compute the average Age by group.
Hint: Use List.groupBy to group by league (League). Then use List.map to organize each group into a record or tuple with the League and its average age, computing the average with List.averageBy over the group's ages (Age).
Answer:
//solution using a record:
type LeagueAndAvgAge =
    { League : string
      AverageAge : float }
playerStatsTable
|> List.groupBy(fun x -> x.League)
|> List.map(fun (league, playerStats) ->
    { League = league
      AverageAge = playerStats |> List.averageBy(fun p -> float p.Age) })
//solution using tuples:
playerStatsTable
|> List.groupBy(fun x -> x.League)
|> List.map(fun (league, playerStats) ->
    league,
    playerStats |> List.averageBy(fun p -> float p.Age))
type LeagueAndAvgAge =
  { League: string
    AverageAge: float }
val it: (string * float) list =
[("deBundesliga", 27.11111111); ("frLigue 1", 25.7173913);
("esLa Liga", 26.53333333); ("itSerie A", 26.80769231);
("engPremier League", 25.58333333)]
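A natural next step is to rank the groups, which only needs one more List function from earlier in this page. This sketch uses a minimal record type and hypothetical ages (not the real dataset) so the expected result is easy to check by hand:

```fsharp
type Player = { League: string; Age: int }

// Hypothetical rows: Bundesliga ages 32 and 28, Ligue 1 ages 22 and 30.
let players =
    [ { League = "deBundesliga"; Age = 32 }
      { League = "frLigue 1"; Age = 22 }
      { League = "deBundesliga"; Age = 28 }
      { League = "frLigue 1"; Age = 30 } ]

// Group by league, average each group's ages, then rank from oldest to youngest.
// [("deBundesliga", 30.0); ("frLigue 1", 26.0)]
let leaguesByAverageAge =
    players
    |> List.groupBy (fun p -> p.League)
    |> List.map (fun (league, rows) -> league, rows |> List.averageBy (fun p -> float p.Age))
    |> List.sortByDescending snd
```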
3 List.sortDescending, List.splitInto, List.map and Seq.stDev
- From playerStatsTable, sort the players' ages (Age) in descending order, split the dataset into quartiles (4-quantiles) and compute the standard deviation for each quantile.
Hint: You only need the Age field from the dataset, so you can use List.map straight away to get the Age List (converted to float). Sort that List with List.sortDescending, and then split it into 4 parts using List.splitInto. Finally use List.map to iterate through each quantile and apply Seq.stDev.
Answer:
playerStatsTable
|> List.map(fun x -> float x.Age)
|> List.sortDescending
|> List.splitInto 4
|> List.map(fun x -> x |> Seq.stDev)
val it: float list = [2.294714424; 0.9082389329; 0.9171829097; 1.59604102]