Overview

Dataset statistics

Number of variables 10
Number of observations 16819
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 1.4 MiB
Average record size in memory 88.0 B
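
The missing-cell and duplicate-row figures above can be illustrated with a minimal sketch. It uses only the Python standard library and a small hypothetical table (`toy_rows`), not the profiled dataset, and the profiler's exact counting convention for duplicates may differ from the one assumed here.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[str]]


def dataset_statistics(rows: Sequence[Row]) -> dict:
    """Summarise a table given as a sequence of equal-length rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    # A cell is treated as missing when it is None (an assumption of this sketch).
    missing = sum(1 for row in rows for cell in row if cell is None)
    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Hypothetical toy table: three variables, four observations.
toy_rows = [
    ("Curling", "GOLD", "ITA"),
    ("Curling", "SILVER", "NOR"),
    ("Curling", "SILVER", "NOR"),  # repeat of the previous row
    ("Curling", "BRONZE", None),   # one missing cell
]
stats = dataset_statistics(toy_rows)
```

Applied to the profiled data, the same counts would give 0 missing cells and 0 duplicate rows, as reported above.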

Variable types

Categorical 10

Alerts

discipline_title has a high cardinality: 67 distinct values [High cardinality]
slug_game has a high cardinality: 53 distinct values [High cardinality]
event_title has a high cardinality: 1192 distinct values [High cardinality]
athlete_full_name has a high cardinality: 12074 distinct values [High cardinality]
country_name has a high cardinality: 141 distinct values [High cardinality]
country_code has a high cardinality: 140 distinct values [High cardinality]
country_3_letter_code has a high cardinality: 141 distinct values [High cardinality]
discipline_title is highly overall correlated with event_gender and participant_type [High correlation]
event_gender is highly overall correlated with discipline_title [High correlation]
participant_type is highly overall correlated with discipline_title [High correlation]
athlete_full_name is uniformly distributed [Uniform]
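
The high-cardinality alerts can be reproduced with a simple threshold check. The sketch below uses the distinct counts reported for each variable in this report; the threshold of 50 is an assumption (named `ASSUMED_THRESHOLD` here), chosen because it is consistent with slug_game (53 distinct values) being flagged, and the profiler's configured value may differ.

```python
# Distinct-value counts as listed in the per-variable sections of this report.
distinct_counts = {
    "discipline_title": 67,
    "slug_game": 53,
    "event_title": 1192,
    "event_gender": 4,
    "medal_type": 3,
    "participant_type": 2,
    "athlete_full_name": 12074,
    "country_name": 141,
    "country_code": 140,
    "country_3_letter_code": 141,
}

# Assumed cut-off, not read from the report's configuration.
ASSUMED_THRESHOLD = 50


def high_cardinality(counts: dict, threshold: int) -> list:
    """Return the variables whose distinct count exceeds the threshold."""
    return [name for name, n in counts.items() if n > threshold]


flagged = high_cardinality(distinct_counts, ASSUMED_THRESHOLD)
for name in flagged:
    print(f"{name} has a high cardinality: {distinct_counts[name]} distinct values")
```

Under that assumption the check flags the same seven variables as the alert list above.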

Reproduction

Analysis started 2023-07-24 03:59:27.149715
Analysis finished 2023-07-24 03:59:30.139575
Duration 2.99 seconds
Software version pandas-profiling v3.6.6

Variables

discipline_title
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct 67
Distinct (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
Athletics 2583
Swimming 1386
Wrestling 1220
Boxing 941
Shooting 722
Other values (62) 9967

Length

Max length 25
Median length 20
Mean length 10.075688
Min length 4

Characters and Unicode

Total characters 169463
Distinct characters 45
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
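
As an illustration of the character properties mentioned above, the following sketch counts Unicode general categories with Python's standard `unicodedata` module. The three input strings are discipline names that appear in this report; the helper name `category_counts` is hypothetical, and scripts and blocks are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll', 'Zs')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# 'Lu' = Uppercase Letter, 'Ll' = Lowercase Letter, 'Zs' = Space Separator.
sample = ["Curling", "Freestyle Skiing", "Canoe Sprint"]
counts = category_counts(sample)
for code, n in counts.most_common():
    print(code, n)
```

For these three strings the categories found are the same three the report lists for discipline_title: lowercase letters, uppercase letters, and space separators.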

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Curling
2nd row Curling
3rd row Curling
4th row Curling
5th row Curling

Common Values

Value Count Frequency (%)
Athletics 2583 15.4%
Swimming 1386 8.2%
Wrestling 1220 7.3%
Boxing 941 5.6%
Shooting 722 4.3%
Canoe Sprint 650 3.9%
Gymnastics Artistic 642 3.8%
Rowing 618 3.7%
Weightlifting 606 3.6%
Sailing 577 3.4%
Other values (57) 6874 40.9%
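
A frequency table of this shape (the most common values followed by an aggregated "Other values (n)" row) can be sketched as follows. The function name `common_values` and the toy medal list are hypothetical; this is an illustration of the layout, not the profiler's own code.

```python
from collections import Counter


def common_values(values, top=10):
    """Return (value, count, percent) rows for the top values plus an 'Other' row."""
    counts = Counter(values)
    total = len(values)
    ranked = counts.most_common()
    rows = [(value, count, 100.0 * count / total) for value, count in ranked[:top]]
    rest = ranked[top:]
    if rest:
        rest_count = sum(count for _, count in rest)
        label = f"Other values ({len(rest)})"
        rows.append((label, rest_count, 100.0 * rest_count / total))
    return rows


# Hypothetical toy column with three distinct values.
medals = ["GOLD", "SILVER", "BRONZE", "BRONZE", "GOLD", "BRONZE"]
table = common_values(medals, top=2)
for value, count, pct in table:
    print(f"{value} {count} {pct:.1f}%")
```

With `top=10` and 67 distinct values, the last row aggregates the remaining 57 values, which matches the "Other values (57)" row in the table above.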

Length

[Figure: histogram of lengths of the category]

Words

Value Count Frequency (%)
athletics 2583 11.4%
swimming 1473 6.5%
wrestling 1220 5.4%
skiing 1042 4.6%
skating 982 4.4%
boxing 941 4.2%
canoe 815 3.6%
shooting 722 3.2%
gymnastics 711 3.2%
artistic 685 3.0%
Other values (64) 11392 50.5%

Most occurring characters

Value Count Frequency (%)
i 22620
 
13.3%
n 16174
 
9.5%
t 13839
 
8.2%
g 11324
 
6.7%
e 10555
 
6.2%
s 8696
 
5.1%
l 7638
 
4.5%
o 7539
 
4.4%
c 6002
 
3.5%
S 5899
 
3.5%
Other values (35) 59177
34.9%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 141952
83.8%
Uppercase Letter 21761
 
12.8%
Space Separator 5750
 
3.4%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
i 22620
15.9%
n 16174
11.4%
t 13839
9.7%
g 11324
 
8.0%
e 10555
 
7.4%
s 8696
 
6.1%
l 7638
 
5.4%
o 7539
 
5.3%
c 6002
 
4.2%
r 5711
 
4.0%
Other values (15) 31854
22.4%
Uppercase Letter
Value Count Frequency (%)
S 5899
27.1%
A 3884
17.8%
C 2385
11.0%
W 1826
 
8.4%
B 1641
 
7.5%
T 1427
 
6.6%
F 862
 
4.0%
R 807
 
3.7%
J 784
 
3.6%
G 733
 
3.4%
Other values (9) 1513
 
7.0%
Space Separator
Value Count Frequency (%)
5750
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 163713
96.6%
Common 5750
 
3.4%

Most frequent character per script

Latin
Value Count Frequency (%)
i 22620
13.8%
n 16174
 
9.9%
t 13839
 
8.5%
g 11324
 
6.9%
e 10555
 
6.4%
s 8696
 
5.3%
l 7638
 
4.7%
o 7539
 
4.6%
c 6002
 
3.7%
S 5899
 
3.6%
Other values (34) 53427
32.6%
Common
Value Count Frequency (%)
5750
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 169463
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
i 22620
 
13.3%
n 16174
 
9.5%
t 13839
 
8.2%
g 11324
 
6.7%
e 10555
 
6.2%
s 8696
 
5.1%
l 7638
 
4.5%
o 7539
 
4.4%
c 6002
 
3.5%
S 5899
 
3.5%
Other values (35) 59177
34.9%

slug_game
Categorical

Distinct 53
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
tokyo-2020 1006
rio-2016 912
beijing-2008 896
london-2012 895
athens-2004 878
Other values (48) 12232

Length

Max length 27
Median length 19
Mean length 12.043403
Min length 8

Characters and Unicode

Total characters 202558
Distinct characters 36
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row beijing-2022
2nd row beijing-2022
3rd row beijing-2022
4th row beijing-2022
5th row beijing-2022

Common Values

Value Count Frequency (%)
tokyo-2020 1006 6.0%
rio-2016 912 5.4%
beijing-2008 896 5.3%
london-2012 895 5.3%
athens-2004 878 5.2%
sydney-2000 876 5.2%
atlanta-1996 772 4.6%
barcelona-1992 654 3.9%
los-angeles-1984 604 3.6%
seoul-1988 548 3.3%
Other values (43) 8778 52.2%

Length

[Figure: histogram of lengths of the category]

Words

Value Count Frequency (%)
tokyo-2020 1006 6.0%
rio-2016 912 5.4%
beijing-2008 896 5.3%
london-2012 895 5.3%
athens-2004 878 5.2%
sydney-2000 876 5.2%
atlanta-1996 772 4.6%
barcelona-1992 654 3.9%
los-angeles-1984 604 3.6%
seoul-1988 548 3.3%
Other values (43) 8778 52.2%

Most occurring characters

Value Count Frequency (%)
- 19216
 
9.5%
0 14222
 
7.0%
o 13281
 
6.6%
2 12847
 
6.3%
1 12657
 
6.2%
n 12467
 
6.2%
9 11772
 
5.8%
e 10879
 
5.4%
a 9944
 
4.9%
l 9069
 
4.5%
Other values (26) 76204
37.6%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 116066
57.3%
Decimal Number 67276
33.2%
Dash Punctuation 19216
 
9.5%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
o 13281
11.4%
n 12467
10.7%
e 10879
 
9.4%
a 9944
 
8.6%
l 9069
 
7.8%
i 7755
 
6.7%
s 7508
 
6.5%
t 6831
 
5.9%
r 5581
 
4.8%
y 4102
 
3.5%
Other values (15) 28649
24.7%
Decimal Number
Value Count Frequency (%)
0 14222
21.1%
2 12847
19.1%
1 12657
18.8%
9 11772
17.5%
8 5361
 
8.0%
6 4499
 
6.7%
4 3450
 
5.1%
7 1031
 
1.5%
5 743
 
1.1%
3 694
 
1.0%
Dash Punctuation
Value Count Frequency (%)
- 19216
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 116066
57.3%
Common 86492
42.7%

Most frequent character per script

Latin
Value Count Frequency (%)
o 13281
11.4%
n 12467
10.7%
e 10879
 
9.4%
a 9944
 
8.6%
l 9069
 
7.8%
i 7755
 
6.7%
s 7508
 
6.5%
t 6831
 
5.9%
r 5581
 
4.8%
y 4102
 
3.5%
Other values (15) 28649
24.7%
Common
Value Count Frequency (%)
- 19216
22.2%
0 14222
16.4%
2 12847
14.9%
1 12657
14.6%
9 11772
13.6%
8 5361
 
6.2%
6 4499
 
5.2%
4 3450
 
4.0%
7 1031
 
1.2%
5 743
 
0.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 202558
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 19216
 
9.5%
0 14222
 
7.0%
o 13281
 
6.6%
2 12847
 
6.3%
1 12657
 
6.2%
n 12467
 
6.2%
9 11772
 
5.8%
e 10879
 
5.4%
a 9944
 
4.9%
l 9069
 
4.5%
Other values (26) 76204
37.6%

event_title
Categorical

Distinct 1192
Distinct (%) 7.1%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
Individual men 208
individual mixed 183
doubles men 162
1500m men 154
doubles women 138
Other values (1187) 15974

Length

Max length 52
Median length 43
Mean length 20.328795
Min length 3

Characters and Unicode

Total characters 341910
Distinct characters 75
Distinct categories 11
Distinct scripts 2
Distinct blocks 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2
Unique (%) < 0.1%

Sample

1st row Mixed Doubles
2nd row Mixed Doubles
3rd row Mixed Doubles
4th row Mixed Doubles
5th row Mixed Doubles

Common Values

Value Count Frequency (%)
Individual men 208 1.2%
individual mixed 183 1.1%
doubles men 162 1.0%
1500m men 154 0.9%
doubles women 138 0.8%
5000m men 136 0.8%
Singles men 135 0.8%
10000m men 129 0.8%
Singles women 128 0.8%
Individual women 125 0.7%
Other values (1182) 15321 91.1%

Length

[Figure: histogram of lengths of the category]

Words

Value Count Frequency (%)
men 10209 18.4%
women 4409 7.9%
individual 1774 3.2%
freestyle 1259 2.3%
kilograms 974 1.8%
mixed 969 1.7%
double 801 1.4%
751 1.4%
100m 681 1.2%
200m 653 1.2%
Other values (638) 32991 59.5%

Most occurring characters

Value Count Frequency (%)
39230
 
11.5%
e 35649
 
10.4%
m 25327
 
7.4%
n 24356
 
7.1%
i 17486
 
5.1%
o 15594
 
4.6%
l 14683
 
4.3%
a 13019
 
3.8%
t 12252
 
3.6%
0 12116
 
3.5%
Other values (65) 132198
38.7%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 249937
73.1%
Space Separator 39230
 
11.5%
Decimal Number 31558
 
9.2%
Uppercase Letter 11810
 
3.5%
Dash Punctuation 2789
 
0.8%
Other Punctuation 2361
 
0.7%
Close Punctuation 1450
 
0.4%
Open Punctuation 1450
 
0.4%
Math Symbol 1105
 
0.3%
Final Punctuation 215
 
0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 35649
14.3%
m 25327
 
10.1%
n 24356
 
9.7%
i 17486
 
7.0%
o 15594
 
6.2%
l 14683
 
5.9%
a 13019
 
5.2%
t 12252
 
4.9%
s 11663
 
4.7%
r 10565
 
4.2%
Other values (17) 69343
27.7%
Uppercase Letter
Value Count Frequency (%)
M 2190
18.5%
F 1144
9.7%
S 1081
 
9.2%
I 836
 
7.1%
W 792
 
6.7%
G 679
 
5.7%
R 667
 
5.6%
H 591
 
5.0%
K 557
 
4.7%
P 514
 
4.4%
Other values (14) 2759
23.4%
Decimal Number
Value Count Frequency (%)
0 12116
38.4%
1 4169
 
13.2%
5 3925
 
12.4%
2 2682
 
8.5%
6 2092
 
6.6%
7 1949
 
6.2%
4 1389
 
4.4%
8 1384
 
4.4%
3 1220
 
3.9%
9 632
 
2.0%
Other Punctuation
Value Count Frequency (%)
, 1042
44.1%
' 979
41.5%
. 307
 
13.0%
/ 21
 
0.9%
: 12
 
0.5%
Math Symbol
Value Count Frequency (%)
≤ 908
82.2%
+ 104
 
9.4%
> 93
 
8.4%
Space Separator
Value Count Frequency (%)
39230
100.0%
Dash Punctuation
Value Count Frequency (%)
- 2789
100.0%
Close Punctuation
Value Count Frequency (%)
) 1450
100.0%
Open Punctuation
Value Count Frequency (%)
( 1450
100.0%
Final Punctuation
Value Count Frequency (%)
’ 215
100.0%
Other Number
Value Count Frequency (%)
½ 5
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 261747
76.6%
Common 80163
 
23.4%

Most frequent character per script

Latin
Value Count Frequency (%)
e 35649
13.6%
m 25327
 
9.7%
n 24356
 
9.3%
i 17486
 
6.7%
o 15594
 
6.0%
l 14683
 
5.6%
a 13019
 
5.0%
t 12252
 
4.7%
s 11663
 
4.5%
r 10565
 
4.0%
Other values (41) 81153
31.0%
Common
Value Count Frequency (%)
39230
48.9%
0 12116
 
15.1%
1 4169
 
5.2%
5 3925
 
4.9%
- 2789
 
3.5%
2 2682
 
3.3%
6 2092
 
2.6%
7 1949
 
2.4%
) 1450
 
1.8%
( 1450
 
1.8%
Other values (14) 8311
 
10.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 340584
99.6%
Math Operators 908
 
0.3%
Punctuation 215
 
0.1%
None 203
 
0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
39230
 
11.5%
e 35649
 
10.5%
m 25327
 
7.4%
n 24356
 
7.2%
i 17486
 
5.1%
o 15594
 
4.6%
l 14683
 
4.3%
a 13019
 
3.8%
t 12252
 
3.6%
0 12116
 
3.6%
Other values (59) 130872
38.4%
Math Operators
Value Count Frequency (%)
≤ 908
100.0%
Punctuation
Value Count Frequency (%)
’ 215
100.0%
None
Value Count Frequency (%)
é 186
91.6%
à 6
 
3.0%
É 6
 
3.0%
½ 5
 
2.5%

event_gender
Categorical

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
Men 10842
Women 5003
Open 638
Mixed 336

Length

Max length 5
Median length 3
Mean length 3.6728105
Min length 3

Characters and Unicode

Total characters 61773
Distinct characters 11
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Mixed
2nd row Mixed
3rd row Mixed
4th row Mixed
5th row Mixed

Common Values

Value Count Frequency (%)
Men 10842 64.5%
Women 5003 29.7%
Open 638 3.8%
Mixed 336 2.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value Count Frequency (%)
men 10842 64.5%
women 5003 29.7%
open 638 3.8%
mixed 336 2.0%

Most occurring characters

Value Count Frequency (%)
e 16819
27.2%
n 16483
26.7%
M 11178
18.1%
W 5003
 
8.1%
o 5003
 
8.1%
m 5003
 
8.1%
O 638
 
1.0%
p 638
 
1.0%
i 336
 
0.5%
x 336
 
0.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 44954
72.8%
Uppercase Letter 16819
 
27.2%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 16819
37.4%
n 16483
36.7%
o 5003
 
11.1%
m 5003
 
11.1%
p 638
 
1.4%
i 336
 
0.7%
x 336
 
0.7%
d 336
 
0.7%
Uppercase Letter
Value Count Frequency (%)
M 11178
66.5%
W 5003
29.7%
O 638
 
3.8%

Most occurring scripts

Value Count Frequency (%)
Latin 61773
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 16819
27.2%
n 16483
26.7%
M 11178
18.1%
W 5003
 
8.1%
o 5003
 
8.1%
m 5003
 
8.1%
O 638
 
1.0%
p 638
 
1.0%
i 336
 
0.5%
x 336
 
0.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 61773
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 16819
27.2%
n 16483
26.7%
M 11178
18.1%
W 5003
 
8.1%
o 5003
 
8.1%
m 5003
 
8.1%
O 638
 
1.0%
p 638
 
1.0%
i 336
 
0.5%
x 336
 
0.5%

medal_type
Categorical

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
BRONZE 5959
SILVER 5451
GOLD 5409

Length

Max length 6
Median length 6
Mean length 5.3567989
Min length 4

Characters and Unicode

Total characters 90096
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row GOLD
2nd row GOLD
3rd row SILVER
4th row SILVER
5th row BRONZE

Common Values

Value Count Frequency (%)
BRONZE 5959 35.4%
SILVER 5451 32.4%
GOLD 5409 32.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value Count Frequency (%)
bronze 5959 35.4%
silver 5451 32.4%
gold 5409 32.2%

Most occurring characters

Value Count Frequency (%)
R 11410
12.7%
E 11410
12.7%
O 11368
12.6%
L 10860
12.1%
B 5959
6.6%
N 5959
6.6%
Z 5959
6.6%
S 5451
6.1%
I 5451
6.1%
V 5451
6.1%
Other values (2) 10818
12.0%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 90096
100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
R 11410
12.7%
E 11410
12.7%
O 11368
12.6%
L 10860
12.1%
B 5959
6.6%
N 5959
6.6%
Z 5959
6.6%
S 5451
6.1%
I 5451
6.1%
V 5451
6.1%
Other values (2) 10818
12.0%

Most occurring scripts

Value Count Frequency (%)
Latin 90096
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
R 11410
12.7%
E 11410
12.7%
O 11368
12.6%
L 10860
12.1%
B 5959
6.6%
N 5959
6.6%
Z 5959
6.6%
S 5451
6.1%
I 5451
6.1%
V 5451
6.1%
Other values (2) 10818
12.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 90096
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
R 11410
12.7%
E 11410
12.7%
O 11368
12.6%
L 10860
12.1%
B 5959
6.6%
N 5959
6.6%
Z 5959
6.6%
S 5451
6.1%
I 5451
6.1%
V 5451
6.1%
Other values (2) 10818
12.0%

participant_type
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
Athlete 14021
GameTeam 2798

Length

Max length 8
Median length 7
Mean length 7.1663595
Min length 7

Characters and Unicode

Total characters 120531
Distinct characters 9
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row GameTeam
2nd row GameTeam
3rd row GameTeam
4th row GameTeam
5th row GameTeam

Common Values

Value Count Frequency (%)
Athlete 14021 83.4%
GameTeam 2798 16.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value Count Frequency (%)
athlete 14021 83.4%
gameteam 2798 16.6%

Most occurring characters

Value Count Frequency (%)
e 33638
27.9%
t 28042
23.3%
A 14021
11.6%
h 14021
11.6%
l 14021
11.6%
a 5596
 
4.6%
m 5596
 
4.6%
G 2798
 
2.3%
T 2798
 
2.3%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 100914
83.7%
Uppercase Letter 19617
 
16.3%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 33638
33.3%
t 28042
27.8%
h 14021
13.9%
l 14021
13.9%
a 5596
 
5.5%
m 5596
 
5.5%
Uppercase Letter
Value Count Frequency (%)
A 14021
71.5%
G 2798
 
14.3%
T 2798
 
14.3%

Most occurring scripts

Value Count Frequency (%)
Latin 120531
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 33638
27.9%
t 28042
23.3%
A 14021
11.6%
h 14021
11.6%
l 14021
11.6%
a 5596
 
4.6%
m 5596
 
4.6%
G 2798
 
2.3%
T 2798
 
2.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 120531
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 33638
27.9%
t 28042
23.3%
A 14021
11.6%
h 14021
11.6%
l 14021
11.6%
a 5596
 
4.6%
m 5596
 
4.6%
G 2798
 
2.3%
T 2798
 
2.3%

athlete_full_name
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct 12074
Distinct (%) 71.8%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
Michael PHELPS 16
Marit BJOERGEN 12
Ireen WÜST 10
Takashi ONO 10
Alexei NEMOV 10
Other values (12069) 16761

Length

Max length 38
Median length 34
Mean length 15.055651
Min length 3

Characters and Unicode

Total characters 253221
Distinct characters 103
Distinct categories 7
Distinct scripts 2
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 9028
Unique (%) 53.7%

Sample

1st row Stefania CONSTANTINI
2nd row Amos MOSANER
3rd row Kristin SKASLIEN
4th row Magnus NEDREGOTTEN
5th row Almida DE VAL

Common Values

Value Count Frequency (%)
Michael PHELPS 16 0.1%
Marit BJOERGEN 12 0.1%
Ireen WÜST 10 0.1%
Takashi ONO 10 0.1%
Alexei NEMOV 10 0.1%
Björn DAEHLIE 9 0.1%
Paavo NURMI 9 0.1%
Sawao KATO 9 0.1%
Ole Einar BJØRNDALEN 9 0.1%
Ray EWRY 8 < 0.1%
Other values (12064) 16717 99.4%

Length

[Figure: histogram of lengths of the category]

Words

Value Count Frequency (%)
john 207 0.6%
van 122 0.3%
thomas 121 0.3%
robert 116 0.3%
michael 114 0.3%
david 107 0.3%
peter 104 0.3%
charles 103 0.3%
william 102 0.3%
kim 99 0.3%
Other values (14704) 36006 96.8%

Most occurring characters

Value Count Frequency (%)
20391
 
8.1%
a 13444
 
5.3%
A 13097
 
5.2%
E 12598
 
5.0%
e 11110
 
4.4%
i 9692
 
3.8%
n 9682
 
3.8%
N 9481
 
3.7%
R 9448
 
3.7%
r 8601
 
3.4%
Other values (93) 135677
53.6%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 132821
52.5%
Lowercase Letter 98619
38.9%
Space Separator 20391
 
8.1%
Dash Punctuation 936
 
0.4%
Other Punctuation 388
 
0.2%
Open Punctuation 33
 
< 0.1%
Close Punctuation 33
 
< 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 13444
13.6%
e 11110
11.3%
i 9692
9.8%
n 9682
9.8%
r 8601
8.7%
o 6675
 
6.8%
l 6205
 
6.3%
t 4326
 
4.4%
s 4309
 
4.4%
h 3362
 
3.4%
Other values (40) 21213
21.5%
Uppercase Letter
Value Count Frequency (%)
A 13097
 
9.9%
E 12598
 
9.5%
N 9481
 
7.1%
R 9448
 
7.1%
S 8318
 
6.3%
I 8318
 
6.3%
O 8135
 
6.1%
L 7243
 
5.5%
T 5679
 
4.3%
M 5495
 
4.1%
Other values (36) 45009
33.9%
Other Punctuation
Value Count Frequency (%)
. 311
80.2%
' 47
 
12.1%
, 30
 
7.7%
Space Separator
Value Count Frequency (%)
20391
100.0%
Dash Punctuation
Value Count Frequency (%)
- 936
100.0%
Open Punctuation
Value Count Frequency (%)
( 33
100.0%
Close Punctuation
Value Count Frequency (%)
) 33
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 231440
91.4%
Common 21781
 
8.6%

Most frequent character per script

Latin
Value Count Frequency (%)
a 13444
 
5.8%
A 13097
 
5.7%
E 12598
 
5.4%
e 11110
 
4.8%
i 9692
 
4.2%
n 9682
 
4.2%
N 9481
 
4.1%
R 9448
 
4.1%
r 8601
 
3.7%
S 8318
 
3.6%
Other values (86) 125969
54.4%
Common
Value Count Frequency (%)
20391
93.6%
- 936
 
4.3%
. 311
 
1.4%
' 47
 
0.2%
( 33
 
0.2%
) 33
 
0.2%
, 30
 
0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 252374
99.7%
None 824
 
0.3%
IPA Ext 23
 
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
20391
 
8.1%
a 13444
 
5.3%
A 13097
 
5.2%
E 12598
 
5.0%
e 11110
 
4.4%
i 9692
 
3.8%
n 9682
 
3.8%
N 9481
 
3.8%
R 9448
 
3.7%
r 8601
 
3.4%
Other values (49) 134830
53.4%
None
Value Count Frequency (%)
Ö 198
24.0%
ö 116
14.1%
Ä 102
12.4%
Ü 78
 
9.5%
é 63
 
7.6%
ü 53
 
6.4%
Ø 21
 
2.5%
ä 20
 
2.4%
á 19
 
2.3%
ç 18
 
2.2%
Other values (33) 136
16.5%
IPA Ext
Value Count Frequency (%)
ə 23 100.0%

country_name
Categorical

Distinct 141
Distinct (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
United States of America 2616
Germany 923
Great Britain 812
France 746
People's Republic of China 734
Other values (136) 10988

Length

Max length 37
Median length 28
Mean length 12.656341
Min length 3

Characters and Unicode

Total characters 212867
Distinct characters 56
Distinct categories 6
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 20
Unique (%) 0.1%

Sample

1st row Italy
2nd row Italy
3rd row Norway
4th row Norway
5th row Sweden

Common Values

Value Count Frequency (%)
United States of America 2616 15.6%
Germany 923 5.5%
Great Britain 812 4.8%
France 746 4.4%
People's Republic of China 734 4.4%
Italy 618 3.7%
Sweden 555 3.3%
Russian Federation 511 3.0%
Japan 508 3.0%
Australia 495 2.9%
Other values (131) 8301 49.4%

Length

[Figure: histogram of lengths of the category]

Words

Value Count Frequency (%)
of 4029 12.7%
united 2620 8.2%
america 2616 8.2%
states 2616 8.2%
republic 1996 6.3%
germany 1591 5.0%
britain 812 2.6%
great 812 2.6%
people's 791 2.5%
france 746 2.3%
Other values (158) 13192 41.5%

Most occurring characters

Value Count Frequency (%)
a 24209
 
11.4%
e 22086
 
10.4%
i 15512
 
7.3%
15002
 
7.0%
n 13874
 
6.5%
t 13336
 
6.3%
r 12839
 
6.0%
o 8509
 
4.0%
c 7166
 
3.4%
l 6947
 
3.3%
Other values (46) 73387
34.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 168152
79.0%
Uppercase Letter 27976
 
13.1%
Space Separator 15002
 
7.0%
Other Punctuation 805
 
0.4%
Close Punctuation 466
 
0.2%
Open Punctuation 466
 
0.2%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 24209
14.4%
e 22086
13.1%
i 15512
9.2%
n 13874
 
8.3%
t 13336
 
7.9%
r 12839
 
7.6%
o 8509
 
5.1%
c 7166
 
4.3%
l 6947
 
4.1%
s 6415
 
3.8%
Other values (17) 37259
22.2%
Uppercase Letter
Value Count Frequency (%)
S 3889
13.9%
A 3709
13.3%
G 3040
10.9%
R 2901
10.4%
U 2818
10.1%
C 1913
6.8%
F 1879
6.7%
B 1430
 
5.1%
P 1142
 
4.1%
N 1047
 
3.7%
Other values (14) 4208
15.0%
Other Punctuation
Value Count Frequency (%)
' 795
98.8%
, 10
 
1.2%
Space Separator
Value Count Frequency (%)
15002
100.0%
Close Punctuation
Value Count Frequency (%)
) 466
100.0%
Open Punctuation
Value Count Frequency (%)
( 466
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 196128
92.1%
Common 16739
 
7.9%

Most frequent character per script

Latin
Value Count Frequency (%)
a 24209
 
12.3%
e 22086
 
11.3%
i 15512
 
7.9%
n 13874
 
7.1%
t 13336
 
6.8%
r 12839
 
6.5%
o 8509
 
4.3%
c 7166
 
3.7%
l 6947
 
3.5%
s 6415
 
3.3%
Other values (41) 65235
33.3%
Common
Value Count Frequency (%)
15002
89.6%
' 795
 
4.7%
) 466
 
2.8%
( 466
 
2.8%
, 10
 
0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 212863
> 99.9%
None 4
 
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 24209
 
11.4%
e 22086
 
10.4%
i 15512
 
7.3%
15002
 
7.0%
n 13874
 
6.5%
t 13336
 
6.3%
r 12839
 
6.0%
o 8509
 
4.0%
c 7166
 
3.4%
l 6947
 
3.3%
Other values (45) 73383
34.5%
None
Value Count Frequency (%)
ô 4
100.0%

country_code
Categorical

Distinct 140
Distinct (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
US 2616
DE 1125
GB 812
FR 746
CN 734
Other values (135) 10786

Length

Max length 4
Median length 2
Mean length 2.0869255
Min length 2

Characters and Unicode

Total characters 35100
Distinct characters 26
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 20
Unique (%) 0.1%

Sample

1st row IT
2nd row IT
3rd row NO
4th row NO
5th row SE

Common Values

Value Count Frequency (%)
US 2616 15.6%
DE 1125 6.7%
GB 812 4.8%
FR 746 4.4%
CN 734 4.4%
IT 618 3.7%
SE 555 3.3%
RU 511 3.0%
JP 508 3.0%
AU 495 2.9%
Other values (130) 8099 48.2%

Length

[Figure: histogram of lengths of the category]

Words

Value Count Frequency (%)
us 2616 15.6%
de 1125 6.7%
gb 812 4.8%
fr 746 4.4%
cn 734 4.4%
it 618 3.7%
se 555 3.3%
ru 511 3.0%
jp 508 3.0%
au 495 2.9%
Other values (130) 8099 48.2%

Most occurring characters

Value Count Frequency (%)
U 4564
13.0%
S 3699
 
10.5%
E 2837
 
8.1%
D 2782
 
7.9%
R 2568
 
7.3%
C 2221
 
6.3%
N 1852
 
5.3%
A 1714
 
4.9%
B 1431
 
4.1%
I 1309
 
3.7%
Other values (16) 10123
28.8%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 35100
100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
U 4564
13.0%
S 3699
 
10.5%
E 2837
 
8.1%
D 2782
 
7.9%
R 2568
 
7.3%
C 2221
 
6.3%
N 1852
 
5.3%
A 1714
 
4.9%
B 1431
 
4.1%
I 1309
 
3.7%
Other values (16) 10123
28.8%

Most occurring scripts

Value Count Frequency (%)
Latin 35100
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
U 4564
13.0%
S 3699
 
10.5%
E 2837
 
8.1%
D 2782
 
7.9%
R 2568
 
7.3%
C 2221
 
6.3%
N 1852
 
5.3%
A 1714
 
4.9%
B 1431
 
4.1%
I 1309
 
3.7%
Other values (16) 10123
28.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 35100
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
U 4564
13.0%
S 3699
 
10.5%
E 2837
 
8.1%
D 2782
 
7.9%
R 2568
 
7.3%
C 2221
 
6.3%
N 1852
 
5.3%
A 1714
 
4.9%
B 1431
 
4.1%
I 1309
 
3.7%
Other values (16) 10123
28.8%

country_3_letter_code
Categorical

Distinct 141
Distinct (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory size 262.8 KiB
USA 2616
GER 923
GBR 812
FRA 746
CHN 734
Other values (136) 10988

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 50457
Distinct characters 26
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 20
Unique (%) 0.1%

Sample

1st row ITA
2nd row ITA
3rd row NOR
4th row NOR
5th row SWE

Common Values

Value Count Frequency (%)
USA 2616 15.6%
GER 923 5.5%
GBR 812 4.8%
FRA 746 4.4%
CHN 734 4.4%
ITA 618 3.7%
SWE 555 3.3%
RUS 511 3.0%
JPN 508 3.0%
AUS 495 2.9%
Other values (131) 8301 49.4%

Length

[Figure: histogram of lengths of the category]

Words

Value Count Frequency (%)
usa 2616 15.6%
ger 923 5.5%
gbr 812 4.8%
fra 746 4.4%
chn 734 4.4%
ita 618 3.7%
swe 555 3.3%
rus 511 3.0%
jpn 508 3.0%
aus 495 2.9%
Other values (131) 8301 49.4%

Most occurring characters

Value Count Frequency (%)
A 6037
12.0%
R 6013
11.9%
U 5917
11.7%
S 4975
9.9%
N 4022
 
8.0%
E 3053
 
6.1%
G 2846
 
5.6%
C 1884
 
3.7%
O 1719
 
3.4%
I 1704
 
3.4%
Other values (16) 12287
24.4%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 50457
100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
A 6037
12.0%
R 6013
11.9%
U 5917
11.7%
S 4975
9.9%
N 4022
 
8.0%
E 3053
 
6.1%
G 2846
 
5.6%
C 1884
 
3.7%
O 1719
 
3.4%
I 1704
 
3.4%
Other values (16) 12287
24.4%

Most occurring scripts

Value Count Frequency (%)
Latin 50457
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
A 6037
12.0%
R 6013
11.9%
U 5917
11.7%
S 4975
9.9%
N 4022
 
8.0%
E 3053
 
6.1%
G 2846
 
5.6%
C 1884
 
3.7%
O 1719
 
3.4%
I 1704
 
3.4%
Other values (16) 12287
24.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 50457
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
A 6037
12.0%
R 6013
11.9%
U 5917
11.7%
S 4975
9.9%
N 4022
 
8.0%
E 3053
 
6.1%
G 2846
 
5.6%
C 1884
 
3.7%
O 1719
 
3.4%
I 1704
 
3.4%
Other values (16) 12287
24.4%

Correlations

discipline_title slug_game event_gender medal_type participant_type
discipline_title 1.000 0.183 0.568 0.033 0.768
slug_game 0.183 1.000 0.222 0.000 0.108
event_gender 0.568 0.222 1.000 0.000 0.386
medal_type 0.033 0.000 0.000 1.000 0.000
participant_type 0.768 0.108 0.386 0.000 1.000
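
The report does not name the coefficient in this matrix. Assuming it is Cramér's V, a common association measure for pairs of categorical variables, the calculation can be sketched with the standard library as below. The function name `cramers_v` and the toy columns are hypothetical, and no bias correction is applied, so values computed this way may differ slightly from the profiler's.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Hypothetical toy columns, not the profiled data.
discipline = ["Curling", "Curling", "Boxing", "Boxing"]
participant = ["GameTeam", "GameTeam", "Athlete", "Athlete"]
medal = ["GOLD", "SILVER", "GOLD", "SILVER"]

v_strong = cramers_v(discipline, participant)  # discipline fully determines participant type
v_none = cramers_v(discipline, medal)          # medal is spread evenly across disciplines
```

In the toy data the first pair is perfectly associated and the second is independent, mirroring the contrast in the matrix between discipline_title and participant_type (0.768) and discipline_title and medal_type (0.033).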

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

index | discipline_title | slug_game | event_title | event_gender | medal_type | participant_type | athlete_full_name | country_name | country_code | country_3_letter_code
0 | Curling | beijing-2022 | Mixed Doubles | Mixed | GOLD | GameTeam | Stefania CONSTANTINI | Italy | IT | ITA
1 | Curling | beijing-2022 | Mixed Doubles | Mixed | GOLD | GameTeam | Amos MOSANER | Italy | IT | ITA
2 | Curling | beijing-2022 | Mixed Doubles | Mixed | SILVER | GameTeam | Kristin SKASLIEN | Norway | NO | NOR
3 | Curling | beijing-2022 | Mixed Doubles | Mixed | SILVER | GameTeam | Magnus NEDREGOTTEN | Norway | NO | NOR
4 | Curling | beijing-2022 | Mixed Doubles | Mixed | BRONZE | GameTeam | Almida DE VAL | Sweden | SE | SWE
5 | Curling | beijing-2022 | Mixed Doubles | Mixed | BRONZE | GameTeam | Oskar ERIKSSON | Sweden | SE | SWE
12 | Freestyle Skiing | beijing-2022 | Men's Moguls | Men | SILVER | Athlete | Mikael KINGSBURY | Canada | CA | CAN
13 | Freestyle Skiing | beijing-2022 | Men's Moguls | Men | GOLD | Athlete | Walter WALLBERG | Sweden | SE | SWE
14 | Freestyle Skiing | beijing-2022 | Men's Moguls | Men | BRONZE | Athlete | Ikuma HORISHIMA | Japan | JP | JPN
15 | Freestyle Skiing | beijing-2022 | Men's Freeski Halfpipe | Men | GOLD | Athlete | Nico PORTEOUS | New Zealand | NZ | NZL

Last rows

index | discipline_title | slug_game | event_title | event_gender | medal_type | participant_type | athlete_full_name | country_name | country_code | country_3_letter_code
21685 | Tennis | athens-1896 | doubles men | Men | SILVER | GameTeam | Dimitrios PETROKOKKINOS | Greece | GR | GRE
21688 | Wrestling | athens-1896 | Unlimited Class, Greco-Roman Men | Men | GOLD | Athlete | Carl SCHUHMANN | Germany | DE | GER
21689 | Wrestling | athens-1896 | Unlimited Class, Greco-Roman Men | Men | SILVER | Athlete | Georgios TSITAS | Greece | GR | GRE
21690 | Wrestling | athens-1896 | Unlimited Class, Greco-Roman Men | Men | BRONZE | Athlete | Stefanos Khristopoulos | Greece | GR | GRE
21691 | Weightlifting | athens-1896 | heavyweight - one hand lift men | Men | GOLD | Athlete | Launceston ELLIOT | Great Britain | GB | GBR
21692 | Weightlifting | athens-1896 | heavyweight - one hand lift men | Men | SILVER | Athlete | Viggo JENSEN | Denmark | DK | DEN
21693 | Weightlifting | athens-1896 | heavyweight - one hand lift men | Men | BRONZE | Athlete | Alexandros Nikolopoulos | Greece | GR | GRE
21694 | Weightlifting | athens-1896 | heavyweight - two hand lift men | Men | GOLD | Athlete | Viggo JENSEN | Denmark | DK | DEN
21695 | Weightlifting | athens-1896 | heavyweight - two hand lift men | Men | SILVER | Athlete | Launceston ELLIOT | Great Britain | GB | GBR
21696 | Weightlifting | athens-1896 | heavyweight - two hand lift men | Men | BRONZE | Athlete | Sotirios VERSIS | Greece | GR | GRE