Dataset statistics
Number of variables | 10 |
---|---|
Number of observations | 16819 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 1.4 MiB |
Average record size in memory | 88.0 B |
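The memory figures above agree with each other: 16819 records at an average of 88.0 B come to roughly 1.4 MiB. A minimal check in Python, using only numbers copied from the table (the variable names are illustrative, not taken from the report):

```python
# Figures copied from the "Dataset statistics" table.
n_observations = 16819
n_variables = 10
missing_cells = 0
avg_record_size_bytes = 88.0

# Total size implied by the average record size (1 MiB = 1024**2 bytes).
total_mib = n_observations * avg_record_size_bytes / 1024**2

# Share of missing cells over all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_observations * n_variables)

print(f"{total_mib:.1f} MiB, {missing_pct:.1f}% missing")
```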
Variable types
Categorical | 10 |
---|---|
Alerts
discipline_title has a high cardinality: 67 distinct values | High cardinality |
slug_game has a high cardinality: 53 distinct values | High cardinality |
event_title has a high cardinality: 1192 distinct values | High cardinality |
athlete_full_name has a high cardinality: 12074 distinct values | High cardinality |
country_name has a high cardinality: 141 distinct values | High cardinality |
country_code has a high cardinality: 140 distinct values | High cardinality |
country_3_letter_code has a high cardinality: 141 distinct values | High cardinality |
discipline_title is highly overall correlated with event_gender and 1 other fields | High correlation |
event_gender is highly overall correlated with discipline_title | High correlation |
participant_type is highly overall correlated with discipline_title | High correlation |
athlete_full_name is uniformly distributed | Uniform |
Reproduction
Analysis started | 2023-07-24 03:59:27.149715 |
---|---|
Analysis finished | 2023-07-24 03:59:30.139575 |
Duration | 2.99 seconds |
Software version | pandas-profiling v3.6.6 |
Download configuration | config.json |
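The reported duration follows from the two timestamps above. A small sketch that recomputes it with the standard library (the format string is an assumption about how the timestamps are laid out, based on the values shown):

```python
from datetime import datetime

# Timestamp layout assumed from the values in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-07-24 03:59:27.149715", FMT)
finished = datetime.strptime("2023-07-24 03:59:30.139575", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```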
discipline_title
Categorical
HIGH CARDINALITY, HIGH CORRELATION
Distinct | 67 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
Athletics | 2583 |
---|---|
Swimming | 1386 |
Wrestling | 1220 |
Boxing | 941 |
Shooting | 722 |
Other values (62) | 9967 |
Length
Max length | 25 |
---|---|
Median length | 20 |
Mean length | 10.075688 |
Min length | 4 |
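Length statistics of this kind can be sketched by measuring each string in the column. The example below uses a handful of discipline names that appear in this report as stand-in data, so its output describes only those five values, not the full column. It also checks the reported mean length against the total character count (169463) and the number of observations (16819) given elsewhere in the report.

```python
from statistics import mean, median

# Stand-in values: five discipline names that appear in this report.
values = ["Curling", "Athletics", "Swimming", "Canoe Sprint", "Gymnastics Artistic"]

lengths = [len(v) for v in values]
stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(stats)

# Mean length of the full column = total characters / number of observations.
report_mean = 169463 / 16819
print(f"{report_mean:.6f}")
```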
Characters and Unicode
Total characters | 169463 |
---|---|
Distinct characters | 45 |
Distinct categories | 3 ? |
Distinct scripts | 2 ? |
Distinct blocks | 1 ? |
Unique
Unique | 0 ? |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Curling |
---|---|
2nd row | Curling |
3rd row | Curling |
4th row | Curling |
5th row | Curling |
Common Values
Value | Count | Frequency (%) |
Athletics | 2583 | 15.4% |
Swimming | 1386 | 8.2% |
Wrestling | 1220 | 7.3% |
Boxing | 941 | 5.6% |
Shooting | 722 | 4.3% |
Canoe Sprint | 650 | 3.9% |
Gymnastics Artistic | 642 | 3.8% |
Rowing | 618 | 3.7% |
Weightlifting | 606 | 3.6% |
Sailing | 577 | 3.4% |
Other values (57) | 6874 |
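A table like the one above can be approximated with a counter: keep the most frequent values, express each count as a share of all rows, and fold the remainder into an "Other values" row. This is a hedged sketch with a hypothetical helper name (`common_values`), not the profiling tool's own code.

```python
from collections import Counter

def common_values(values, top_n=10):
    """Return (value, count, percent) rows plus an 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    rows = [(v, c, round(100 * c / total, 1)) for v, c in counts.most_common(top_n)]
    n_other = len(counts) - len(rows)
    if n_other > 0:
        other = total - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({n_other})", other, round(100 * other / total, 1)))
    return rows

# Tiny made-up column, only to show the shape of the result.
demo = ["Athletics"] * 3 + ["Swimming"] * 2 + ["Boxing"]
table = common_values(demo, top_n=2)
for row in table:
    print(row)

# The same rounding reproduces the Athletics share shown in the report.
athletics_pct = round(100 * 2583 / 16819, 1)
```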
Length (histogram not shown)
Words
Value | Count | Frequency (%) |
athletics | 2583 | 11.4% |
swimming | 1473 | 6.5% |
wrestling | 1220 | 5.4% |
skiing | 1042 | 4.6% |
skating | 982 | 4.4% |
boxing | 941 | 4.2% |
canoe | 815 | 3.6% |
shooting | 722 | 3.2% |
gymnastics | 711 | 3.2% |
artistic | 685 | 3.0% |
Other values (64) | 11392 |
Most occurring characters
Value | Count | Frequency (%) |
i | 22620 | 13.3% |
n | 16174 | 9.5% |
t | 13839 | 8.2% |
g | 11324 | 6.7% |
e | 10555 | 6.2% |
s | 8696 | 5.1% |
l | 7638 | 4.5% |
o | 7539 | 4.4% |
c | 6002 | 3.5% |
S | 5899 | 3.5% |
Other values (35) | 59177 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 141952 | |
Uppercase Letter | 21761 | 12.8% |
Space Separator | 5750 | 3.4% |
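The category names above correspond to Unicode general categories (Lowercase Letter is `Ll`, Uppercase Letter is `Lu`, Space Separator is `Zs`). A minimal sketch of such a count using Python's `unicodedata` module follows; the two input strings are discipline names from this report, and the helper name `category_counts` is hypothetical. The profiling tool's own implementation may differ.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

counts = category_counts(["Canoe Sprint", "Gymnastics Artistic"])
print(counts.most_common())
```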
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
i | 22620 | |
n | 16174 | |
t | 13839 | |
g | 11324 | 8.0% |
e | 10555 | 7.4% |
s | 8696 | 6.1% |
l | 7638 | 5.4% |
o | 7539 | 5.3% |
c | 6002 | 4.2% |
r | 5711 | 4.0% |
Other values (15) | 31854 |
Uppercase Letter
Value | Count | Frequency (%) |
S | 5899 | |
A | 3884 | |
C | 2385 | |
W | 1826 | 8.4% |
B | 1641 | 7.5% |
T | 1427 | 6.6% |
F | 862 | 4.0% |
R | 807 | 3.7% |
J | 784 | 3.6% |
G | 733 | 3.4% |
Other values (9) | 1513 | 7.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 5750 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 163713 | |
Common | 5750 | 3.4% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
i | 22620 | |
n | 16174 | 9.9% |
t | 13839 | 8.5% |
g | 11324 | 6.9% |
e | 10555 | 6.4% |
s | 8696 | 5.3% |
l | 7638 | 4.7% |
o | 7539 | 4.6% |
c | 6002 | 3.7% |
S | 5899 | 3.6% |
Other values (34) | 53427 |
Common
Value | Count | Frequency (%) |
(space) | 5750 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 169463 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
i | 22620 | 13.3% |
n | 16174 | 9.5% |
t | 13839 | 8.2% |
g | 11324 | 6.7% |
e | 10555 | 6.2% |
s | 8696 | 5.1% |
l | 7638 | 4.5% |
o | 7539 | 4.4% |
c | 6002 | 3.5% |
S | 5899 | 3.5% |
Other values (35) | 59177 |
slug_game
Categorical
Distinct | 53 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
tokyo-2020 | 1006 |
---|---|
rio-2016 | 912 |
beijing-2008 | 896 |
london-2012 | 895 |
athens-2004 | 878 |
Other values (48) | 12232 |
Length
Max length | 27 |
---|---|
Median length | 19 |
Mean length | 12.043403 |
Min length | 8 |
Characters and Unicode
Total characters | 202558 |
---|---|
Distinct characters | 36 |
Distinct categories | 3 ? |
Distinct scripts | 2 ? |
Distinct blocks | 1 ? |
Unique
Unique | 0 ? |
---|---|
Unique (%) | 0.0% |
Sample
1st row | beijing-2022 |
---|---|
2nd row | beijing-2022 |
3rd row | beijing-2022 |
4th row | beijing-2022 |
5th row | beijing-2022 |
Common Values
Value | Count | Frequency (%) |
tokyo-2020 | 1006 | 6.0% |
rio-2016 | 912 | 5.4% |
beijing-2008 | 896 | 5.3% |
london-2012 | 895 | 5.3% |
athens-2004 | 878 | 5.2% |
sydney-2000 | 876 | 5.2% |
atlanta-1996 | 772 | 4.6% |
barcelona-1992 | 654 | 3.9% |
los-angeles-1984 | 604 | 3.6% |
seoul-1988 | 548 | 3.3% |
Other values (43) | 8778 |
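Every slug listed above ends in a four-digit year, which suggests splitting on the last hyphen to separate the host city from the year. The sketch below assumes that pattern holds for all 53 values, which this report does not confirm; `split_slug` is a hypothetical helper name.

```python
def split_slug(slug):
    """Split a games slug such as 'los-angeles-1984' into (city, year)."""
    # rsplit on the last hyphen keeps multi-word cities intact.
    city, year = slug.rsplit("-", 1)
    return city, int(year)

print(split_slug("tokyo-2020"))
print(split_slug("los-angeles-1984"))
```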
Length (histogram not shown)
Words
Value | Count | Frequency (%) |
tokyo-2020 | 1006 | 6.0% |
rio-2016 | 912 | 5.4% |
beijing-2008 | 896 | 5.3% |
london-2012 | 895 | 5.3% |
athens-2004 | 878 | 5.2% |
sydney-2000 | 876 | 5.2% |
atlanta-1996 | 772 | 4.6% |
barcelona-1992 | 654 | 3.9% |
los-angeles-1984 | 604 | 3.6% |
seoul-1988 | 548 | 3.3% |
Other values (43) | 8778 |
Most occurring characters
Value | Count | Frequency (%) |
- | 19216 | 9.5% |
0 | 14222 | 7.0% |
o | 13281 | 6.6% |
2 | 12847 | 6.3% |
1 | 12657 | 6.2% |
n | 12467 | 6.2% |
9 | 11772 | 5.8% |
e | 10879 | 5.4% |
a | 9944 | 4.9% |
l | 9069 | 4.5% |
Other values (26) | 76204 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 116066 | |
Decimal Number | 67276 | |
Dash Punctuation | 19216 | 9.5% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
o | 13281 | |
n | 12467 | |
e | 10879 | 9.4% |
a | 9944 | 8.6% |
l | 9069 | 7.8% |
i | 7755 | 6.7% |
s | 7508 | 6.5% |
t | 6831 | 5.9% |
r | 5581 | 4.8% |
y | 4102 | 3.5% |
Other values (15) | 28649 |
Decimal Number
Value | Count | Frequency (%) |
0 | 14222 | |
2 | 12847 | |
1 | 12657 | |
9 | 11772 | |
8 | 5361 | 8.0% |
6 | 4499 | 6.7% |
4 | 3450 | 5.1% |
7 | 1031 | 1.5% |
5 | 743 | 1.1% |
3 | 694 | 1.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 19216 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 116066 | |
Common | 86492 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
o | 13281 | |
n | 12467 | |
e | 10879 | 9.4% |
a | 9944 | 8.6% |
l | 9069 | 7.8% |
i | 7755 | 6.7% |
s | 7508 | 6.5% |
t | 6831 | 5.9% |
r | 5581 | 4.8% |
y | 4102 | 3.5% |
Other values (15) | 28649 |
Common
Value | Count | Frequency (%) |
- | 19216 | |
0 | 14222 | |
2 | 12847 | |
1 | 12657 | |
9 | 11772 | |
8 | 5361 | 6.2% |
6 | 4499 | 5.2% |
4 | 3450 | 4.0% |
7 | 1031 | 1.2% |
5 | 743 | 0.9% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 202558 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
- | 19216 | 9.5% |
0 | 14222 | 7.0% |
o | 13281 | 6.6% |
2 | 12847 | 6.3% |
1 | 12657 | 6.2% |
n | 12467 | 6.2% |
9 | 11772 | 5.8% |
e | 10879 | 5.4% |
a | 9944 | 4.9% |
l | 9069 | 4.5% |
Other values (26) | 76204 |
event_title
Categorical
Distinct | 1192 |
---|---|
Distinct (%) | 7.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
Individual men | 208 |
---|---|
individual mixed | 183 |
doubles men | 162 |
1500m men | 154 |
doubles women | 138 |
Other values (1187) | 15974 |
Length
Max length | 52 |
---|---|
Median length | 43 |
Mean length | 20.328795 |
Min length | 3 |
Characters and Unicode
Total characters | 341910 |
---|---|
Distinct characters | 75 |
Distinct categories | 11 ? |
Distinct scripts | 2 ? |
Distinct blocks | 4 ? |
Unique
Unique | 2 ? |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | Mixed Doubles |
---|---|
2nd row | Mixed Doubles |
3rd row | Mixed Doubles |
4th row | Mixed Doubles |
5th row | Mixed Doubles |
Common Values
Value | Count | Frequency (%) |
Individual men | 208 | 1.2% |
individual mixed | 183 | 1.1% |
doubles men | 162 | 1.0% |
1500m men | 154 | 0.9% |
doubles women | 138 | 0.8% |
5000m men | 136 | 0.8% |
Singles men | 135 | 0.8% |
10000m men | 129 | 0.8% |
Singles women | 128 | 0.8% |
Individual women | 125 | 0.7% |
Other values (1182) | 15321 |
Length (histogram not shown)
Words
Value | Count | Frequency (%) |
men | 10209 | 18.4% |
women | 4409 | 7.9% |
individual | 1774 | 3.2% |
freestyle | 1259 | 2.3% |
kilograms | 974 | 1.8% |
mixed | 969 | 1.7% |
double | 801 | 1.4% |
751 | 1.4% | |
100m | 681 | 1.2% |
200m | 653 | 1.2% |
Other values (638) | 32991 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 39230 | 11.5% |
e | 35649 | 10.4% |
m | 25327 | 7.4% |
n | 24356 | 7.1% |
i | 17486 | 5.1% |
o | 15594 | 4.6% |
l | 14683 | 4.3% |
a | 13019 | 3.8% |
t | 12252 | 3.6% |
0 | 12116 | 3.5% |
Other values (65) | 132198 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 249937 | |
Space Separator | 39230 | 11.5% |
Decimal Number | 31558 | 9.2% |
Uppercase Letter | 11810 | 3.5% |
Dash Punctuation | 2789 | 0.8% |
Other Punctuation | 2361 | 0.7% |
Close Punctuation | 1450 | 0.4% |
Open Punctuation | 1450 | 0.4% |
Math Symbol | 1105 | 0.3% |
Final Punctuation | 215 | 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 35649 | |
m | 25327 | 10.1% |
n | 24356 | 9.7% |
i | 17486 | 7.0% |
o | 15594 | 6.2% |
l | 14683 | 5.9% |
a | 13019 | 5.2% |
t | 12252 | 4.9% |
s | 11663 | 4.7% |
r | 10565 | 4.2% |
Other values (17) | 69343 |
Uppercase Letter
Value | Count | Frequency (%) |
M | 2190 | |
F | 1144 | |
S | 1081 | 9.2% |
I | 836 | 7.1% |
W | 792 | 6.7% |
G | 679 | 5.7% |
R | 667 | 5.6% |
H | 591 | 5.0% |
K | 557 | 4.7% |
P | 514 | 4.4% |
Other values (14) | 2759 |
Decimal Number
Value | Count | Frequency (%) |
0 | 12116 | |
1 | 4169 | 13.2% |
5 | 3925 | 12.4% |
2 | 2682 | 8.5% |
6 | 2092 | 6.6% |
7 | 1949 | 6.2% |
4 | 1389 | 4.4% |
8 | 1384 | 4.4% |
3 | 1220 | 3.9% |
9 | 632 | 2.0% |
Other Punctuation
Value | Count | Frequency (%) |
, | 1042 | |
' | 979 | |
. | 307 | 13.0% |
/ | 21 | 0.9% |
: | 12 | 0.5% |
Math Symbol
Value | Count | Frequency (%) |
≤ | 908 | |
+ | 104 | 9.4% |
> | 93 | 8.4% |
Space Separator
Value | Count | Frequency (%) |
(space) | 39230 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2789 |
Close Punctuation
Value | Count | Frequency (%) |
) | 1450 |
Open Punctuation
Value | Count | Frequency (%) |
( | 1450 |
Final Punctuation
Value | Count | Frequency (%) |
’ | 215 |
Other Number
Value | Count | Frequency (%) |
½ | 5 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 261747 | |
Common | 80163 | 23.4% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 35649 | |
m | 25327 | 9.7% |
n | 24356 | 9.3% |
i | 17486 | 6.7% |
o | 15594 | 6.0% |
l | 14683 | 5.6% |
a | 13019 | 5.0% |
t | 12252 | 4.7% |
s | 11663 | 4.5% |
r | 10565 | 4.0% |
Other values (41) | 81153 |
Common
Value | Count | Frequency (%) |
(space) | 39230 | |
0 | 12116 | 15.1% |
1 | 4169 | 5.2% |
5 | 3925 | 4.9% |
- | 2789 | 3.5% |
2 | 2682 | 3.3% |
6 | 2092 | 2.6% |
7 | 1949 | 2.4% |
) | 1450 | 1.8% |
( | 1450 | 1.8% |
Other values (14) | 8311 | 10.4% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 340584 | |
Math Operators | 908 | 0.3% |
Punctuation | 215 | 0.1% |
None | 203 | 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 39230 | 11.5% |
e | 35649 | 10.5% |
m | 25327 | 7.4% |
n | 24356 | 7.2% |
i | 17486 | 5.1% |
o | 15594 | 4.6% |
l | 14683 | 4.3% |
a | 13019 | 3.8% |
t | 12252 | 3.6% |
0 | 12116 | 3.6% |
Other values (59) | 130872 |
Math Operators
Value | Count | Frequency (%) |
≤ | 908 |
Punctuation
Value | Count | Frequency (%) |
’ | 215 |
None
Value | Count | Frequency (%) |
é | 186 | |
à | 6 | 3.0% |
É | 6 | 3.0% |
½ | 5 | 2.5% |
event_gender
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
Men | 10842 |
---|---|
Women | 5003 |
Open | 638 |
Mixed | 336 |
Length
Max length | 5 |
---|---|
Median length | 3 |
Mean length | 3.6728105 |
Min length | 3 |
Characters and Unicode
Total characters | 61773 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 ? |
Distinct scripts | 1 ? |
Distinct blocks | 1 ? |
Unique
Unique | 0 ? |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Mixed |
---|---|
2nd row | Mixed |
3rd row | Mixed |
4th row | Mixed |
5th row | Mixed |
Common Values
Value | Count | Frequency (%) |
Men | 10842 | 64.5% |
Women | 5003 | 29.7% |
Open | 638 | 3.8% |
Mixed | 336 | 2.0% |
Length (histogram not shown)
Common Values (plot not shown)
Words
Value | Count | Frequency (%) |
men | 10842 | 64.5% |
women | 5003 | 29.7% |
open | 638 | 3.8% |
mixed | 336 | 2.0% |
Most occurring characters
Value | Count | Frequency (%) |
e | 16819 | |
n | 16483 | |
M | 11178 | |
W | 5003 | 8.1% |
o | 5003 | 8.1% |
m | 5003 | 8.1% |
O | 638 | 1.0% |
p | 638 | 1.0% |
i | 336 | 0.5% |
x | 336 | 0.5% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 44954 | |
Uppercase Letter | 16819 | 27.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 16819 | |
n | 16483 | |
o | 5003 | 11.1% |
m | 5003 | 11.1% |
p | 638 | 1.4% |
i | 336 | 0.7% |
x | 336 | 0.7% |
d | 336 | 0.7% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 11178 | |
W | 5003 | |
O | 638 | 3.8% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 61773 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 16819 | |
n | 16483 | |
M | 11178 | |
W | 5003 | 8.1% |
o | 5003 | 8.1% |
m | 5003 | 8.1% |
O | 638 | 1.0% |
p | 638 | 1.0% |
i | 336 | 0.5% |
x | 336 | 0.5% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 61773 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 16819 | |
n | 16483 | |
M | 11178 | |
W | 5003 | 8.1% |
o | 5003 | 8.1% |
m | 5003 | 8.1% |
O | 638 | 1.0% |
p | 638 | 1.0% |
i | 336 | 0.5% |
x | 336 | 0.5% |
medal_type
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
BRONZE | 5959 |
---|---|
SILVER | 5451 |
GOLD | 5409 |
Length
Max length | 6 |
---|---|
Median length | 6 |
Mean length | 5.3567989 |
Min length | 4 |
Characters and Unicode
Total characters | 90096 |
---|---|
Distinct characters | 12 |
Distinct categories | 1 ? |
Distinct scripts | 1 ? |
Distinct blocks | 1 ? |
Unique
Unique | 0 ? |
---|---|
Unique (%) | 0.0% |
Sample
1st row | GOLD |
---|---|
2nd row | GOLD |
3rd row | SILVER |
4th row | SILVER |
5th row | BRONZE |
Common Values
Value | Count | Frequency (%) |
BRONZE | 5959 | 35.4% |
SILVER | 5451 | 32.4% |
GOLD | 5409 | 32.2% |
Length (histogram not shown)
Common Values (plot not shown)
Words
Value | Count | Frequency (%) |
bronze | 5959 | 35.4% |
silver | 5451 | 32.4% |
gold | 5409 | 32.2% |
Most occurring characters
Value | Count | Frequency (%) |
R | 11410 | |
E | 11410 | |
O | 11368 | |
L | 10860 | |
B | 5959 | |
N | 5959 | |
Z | 5959 | |
S | 5451 | |
I | 5451 | |
V | 5451 | |
Other values (2) | 10818 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 90096 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
R | 11410 | |
E | 11410 | |
O | 11368 | |
L | 10860 | |
B | 5959 | |
N | 5959 | |
Z | 5959 | |
S | 5451 | |
I | 5451 | |
V | 5451 | |
Other values (2) | 10818 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 90096 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
R | 11410 | |
E | 11410 | |
O | 11368 | |
L | 10860 | |
B | 5959 | |
N | 5959 | |
Z | 5959 | |
S | 5451 | |
I | 5451 | |
V | 5451 | |
Other values (2) | 10818 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 90096 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
R | 11410 | |
E | 11410 | |
O | 11368 | |
L | 10860 | |
B | 5959 | |
N | 5959 | |
Z | 5959 | |
S | 5451 | |
I | 5451 | |
V | 5451 | |
Other values (2) | 10818 |
participant_type
Categorical
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
Athlete | 14021 |
---|---|
GameTeam | 2798 |
Length
Max length | 8 |
---|---|
Median length | 7 |
Mean length | 7.1663595 |
Min length | 7 |
Characters and Unicode
Total characters | 120531 |
---|---|
Distinct characters | 9 |
Distinct categories | 2 ? |
Distinct scripts | 1 ? |
Distinct blocks | 1 ? |
Unique
Unique | 0 ? |
---|---|
Unique (%) | 0.0% |
Sample
1st row | GameTeam |
---|---|
2nd row | GameTeam |
3rd row | GameTeam |
4th row | GameTeam |
5th row | GameTeam |
Common Values
Value | Count | Frequency (%) |
Athlete | 14021 | 83.4% |
GameTeam | 2798 | 16.6% |
Length (histogram not shown)
Common Values (plot not shown)
Words
Value | Count | Frequency (%) |
athlete | 14021 | 83.4% |
gameteam | 2798 | 16.6% |
Most occurring characters
Value | Count | Frequency (%) |
e | 33638 | |
t | 28042 | |
A | 14021 | |
h | 14021 | |
l | 14021 | |
a | 5596 | 4.6% |
m | 5596 | 4.6% |
G | 2798 | 2.3% |
T | 2798 | 2.3% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 100914 | |
Uppercase Letter | 19617 | 16.3% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 33638 | |
t | 28042 | |
h | 14021 | |
l | 14021 | |
a | 5596 | 5.5% |
m | 5596 | 5.5% |
Uppercase Letter
Value | Count | Frequency (%) |
A | 14021 | |
G | 2798 | 14.3% |
T | 2798 | 14.3% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 120531 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 33638 | |
t | 28042 | |
A | 14021 | |
h | 14021 | |
l | 14021 | |
a | 5596 | 4.6% |
m | 5596 | 4.6% |
G | 2798 | 2.3% |
T | 2798 | 2.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 120531 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 33638 | |
t | 28042 | |
A | 14021 | |
h | 14021 | |
l | 14021 | |
a | 5596 | 4.6% |
m | 5596 | 4.6% |
G | 2798 | 2.3% |
T | 2798 | 2.3% |
athlete_full_name
Categorical
HIGH CARDINALITY, UNIFORM
Distinct | 12074 |
---|---|
Distinct (%) | 71.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
Michael PHELPS | 16 |
---|---|
Marit BJOERGEN | 12 |
Ireen WÜST | 10 |
Takashi ONO | 10 |
Alexei NEMOV | 10 |
Other values (12069) | 16761 |
Length
Max length | 38 |
---|---|
Median length | 34 |
Mean length | 15.055651 |
Min length | 3 |
Characters and Unicode
Total characters | 253221 |
---|---|
Distinct characters | 103 |
Distinct categories | 7 ? |
Distinct scripts | 2 ? |
Distinct blocks | 3 ? |
Unique
Unique | 9028 ? |
---|---|
Unique (%) | 53.7% |
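"Distinct" and "Unique" measure different things here: distinct counts how many different values occur, while unique counts values that occur exactly once. A hedged sketch of both, with a hypothetical helper name and athlete names taken from the sample rows of this report:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

names = [
    "Viggo JENSEN", "Launceston ELLIOT",
    "Viggo JENSEN", "Launceston ELLIOT",
    "Sotirios VERSIS",
]
print(distinct_and_unique(names))

# Percentages as reported for athlete_full_name (16819 observations).
distinct_pct = round(100 * 12074 / 16819, 1)
unique_pct = round(100 * 9028 / 16819, 1)
```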
Sample
1st row | Stefania CONSTANTINI |
---|---|
2nd row | Amos MOSANER |
3rd row | Kristin SKASLIEN |
4th row | Magnus NEDREGOTTEN |
5th row | Almida DE VAL |
Common Values
Value | Count | Frequency (%) |
Michael PHELPS | 16 | 0.1% |
Marit BJOERGEN | 12 | 0.1% |
Ireen WÜST | 10 | 0.1% |
Takashi ONO | 10 | 0.1% |
Alexei NEMOV | 10 | 0.1% |
Björn DAEHLIE | 9 | 0.1% |
Paavo NURMI | 9 | 0.1% |
Sawao KATO | 9 | 0.1% |
Ole Einar BJØRNDALEN | 9 | 0.1% |
Ray EWRY | 8 | < 0.1% |
Other values (12064) | 16717 |
Length (histogram not shown)
Words
Value | Count | Frequency (%) |
john | 207 | 0.6% |
van | 122 | 0.3% |
thomas | 121 | 0.3% |
robert | 116 | 0.3% |
michael | 114 | 0.3% |
david | 107 | 0.3% |
peter | 104 | 0.3% |
charles | 103 | 0.3% |
william | 102 | 0.3% |
kim | 99 | 0.3% |
Other values (14704) | 36006 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 20391 | 8.1% |
a | 13444 | 5.3% |
A | 13097 | 5.2% |
E | 12598 | 5.0% |
e | 11110 | 4.4% |
i | 9692 | 3.8% |
n | 9682 | 3.8% |
N | 9481 | 3.7% |
R | 9448 | 3.7% |
r | 8601 | 3.4% |
Other values (93) | 135677 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 132821 | |
Lowercase Letter | 98619 | |
Space Separator | 20391 | 8.1% |
Dash Punctuation | 936 | 0.4% |
Other Punctuation | 388 | 0.2% |
Open Punctuation | 33 | < 0.1% |
Close Punctuation | 33 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 13444 | |
e | 11110 | |
i | 9692 | |
n | 9682 | |
r | 8601 | |
o | 6675 | 6.8% |
l | 6205 | 6.3% |
t | 4326 | 4.4% |
s | 4309 | 4.4% |
h | 3362 | 3.4% |
Other values (40) | 21213 |
Uppercase Letter
Value | Count | Frequency (%) |
A | 13097 | 9.9% |
E | 12598 | 9.5% |
N | 9481 | 7.1% |
R | 9448 | 7.1% |
S | 8318 | 6.3% |
I | 8318 | 6.3% |
O | 8135 | 6.1% |
L | 7243 | 5.5% |
T | 5679 | 4.3% |
M | 5495 | 4.1% |
Other values (36) | 45009 |
Other Punctuation
Value | Count | Frequency (%) |
. | 311 | |
' | 47 | 12.1% |
, | 30 | 7.7% |
Space Separator
Value | Count | Frequency (%) |
(space) | 20391 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 936 |
Open Punctuation
Value | Count | Frequency (%) |
( | 33 |
Close Punctuation
Value | Count | Frequency (%) |
) | 33 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 231440 | |
Common | 21781 | 8.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 13444 | 5.8% |
A | 13097 | 5.7% |
E | 12598 | 5.4% |
e | 11110 | 4.8% |
i | 9692 | 4.2% |
n | 9682 | 4.2% |
N | 9481 | 4.1% |
R | 9448 | 4.1% |
r | 8601 | 3.7% |
S | 8318 | 3.6% |
Other values (86) | 125969 |
Common
Value | Count | Frequency (%) |
(space) | 20391 | |
- | 936 | 4.3% |
. | 311 | 1.4% |
' | 47 | 0.2% |
( | 33 | 0.2% |
) | 33 | 0.2% |
, | 30 | 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 252374 | |
None | 824 | 0.3% |
IPA Ext | 23 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 20391 | 8.1% |
a | 13444 | 5.3% |
A | 13097 | 5.2% |
E | 12598 | 5.0% |
e | 11110 | 4.4% |
i | 9692 | 3.8% |
n | 9682 | 3.8% |
N | 9481 | 3.8% |
R | 9448 | 3.7% |
r | 8601 | 3.4% |
Other values (49) | 134830 |
None
Value | Count | Frequency (%) |
Ö | 198 | |
ö | 116 | |
Ä | 102 | |
Ü | 78 | 9.5% |
é | 63 | 7.6% |
ü | 53 | 6.4% |
Ø | 21 | 2.5% |
ä | 20 | 2.4% |
á | 19 | 2.3% |
ç | 18 | 2.2% |
Other values (33) | 136 |
IPA Ext
Value | Count | Frequency (%) |
ə | 23 |
country_name
Categorical
Distinct | 141 |
---|---|
Distinct (%) | 0.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
United States of America | 2616 |
---|---|
Germany | 923 |
Great Britain | 812 |
France | 746 |
People's Republic of China | 734 |
Other values (136) | 10988 |
Length
Max length | 37 |
---|---|
Median length | 28 |
Mean length | 12.656341 |
Min length | 3 |
Characters and Unicode
Total characters | 212867 |
---|---|
Distinct characters | 56 |
Distinct categories | 6 ? |
Distinct scripts | 2 ? |
Distinct blocks | 2 ? |
Unique
Unique | 20 ? |
---|---|
Unique (%) | 0.1% |
Sample
1st row | Italy |
---|---|
2nd row | Italy |
3rd row | Norway |
4th row | Norway |
5th row | Sweden |
Common Values
Value | Count | Frequency (%) |
United States of America | 2616 | 15.6% |
Germany | 923 | 5.5% |
Great Britain | 812 | 4.8% |
France | 746 | 4.4% |
People's Republic of China | 734 | 4.4% |
Italy | 618 | 3.7% |
Sweden | 555 | 3.3% |
Russian Federation | 511 | 3.0% |
Japan | 508 | 3.0% |
Australia | 495 | 2.9% |
Other values (131) | 8301 |
Length (histogram not shown)
Words
Value | Count | Frequency (%) |
of | 4029 | 12.7% |
united | 2620 | 8.2% |
america | 2616 | 8.2% |
states | 2616 | 8.2% |
republic | 1996 | 6.3% |
germany | 1591 | 5.0% |
britain | 812 | 2.6% |
great | 812 | 2.6% |
people's | 791 | 2.5% |
france | 746 | 2.3% |
Other values (158) | 13192 |
Most occurring characters
Value | Count | Frequency (%) |
a | 24209 | 11.4% |
e | 22086 | 10.4% |
i | 15512 | 7.3% |
(space) | 15002 | 7.0% |
n | 13874 | 6.5% |
t | 13336 | 6.3% |
r | 12839 | 6.0% |
o | 8509 | 4.0% |
c | 7166 | 3.4% |
l | 6947 | 3.3% |
Other values (46) | 73387 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 168152 | |
Uppercase Letter | 27976 | 13.1% |
Space Separator | 15002 | 7.0% |
Other Punctuation | 805 | 0.4% |
Close Punctuation | 466 | 0.2% |
Open Punctuation | 466 | 0.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 24209 | |
e | 22086 | |
i | 15512 | |
n | 13874 | 8.3% |
t | 13336 | 7.9% |
r | 12839 | 7.6% |
o | 8509 | 5.1% |
c | 7166 | 4.3% |
l | 6947 | 4.1% |
s | 6415 | 3.8% |
Other values (17) | 37259 |
Uppercase Letter
Value | Count | Frequency (%) |
S | 3889 | |
A | 3709 | |
G | 3040 | |
R | 2901 | |
U | 2818 | |
C | 1913 | |
F | 1879 | |
B | 1430 | 5.1% |
P | 1142 | 4.1% |
N | 1047 | 3.7% |
Other values (14) | 4208 |
Other Punctuation
Value | Count | Frequency (%) |
' | 795 | |
, | 10 | 1.2% |
Space Separator
Value | Count | Frequency (%) |
(space) | 15002 |
Close Punctuation
Value | Count | Frequency (%) |
) | 466 |
Open Punctuation
Value | Count | Frequency (%) |
( | 466 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 196128 | |
Common | 16739 | 7.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 24209 | 12.3% |
e | 22086 | 11.3% |
i | 15512 | 7.9% |
n | 13874 | 7.1% |
t | 13336 | 6.8% |
r | 12839 | 6.5% |
o | 8509 | 4.3% |
c | 7166 | 3.7% |
l | 6947 | 3.5% |
s | 6415 | 3.3% |
Other values (41) | 65235 |
Common
Value | Count | Frequency (%) |
(space) | 15002 | |
' | 795 | 4.7% |
) | 466 | 2.8% |
( | 466 | 2.8% |
, | 10 | 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 212863 | |
None | 4 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 24209 | 11.4% |
e | 22086 | 10.4% |
i | 15512 | 7.3% |
(space) | 15002 | 7.0% |
n | 13874 | 6.5% |
t | 13336 | 6.3% |
r | 12839 | 6.0% |
o | 8509 | 4.0% |
c | 7166 | 3.4% |
l | 6947 | 3.3% |
Other values (45) | 73383 |
None
Value | Count | Frequency (%) |
ô | 4 |
country_code
Categorical
Distinct | 140 |
---|---|
Distinct (%) | 0.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
US | 2616 |
---|---|
DE | 1125 |
GB | 812 |
FR | 746 |
CN | 734 |
Other values (135) | 10786 |
Length
Max length | 4 |
---|---|
Median length | 2 |
Mean length | 2.0869255 |
Min length | 2 |
Characters and Unicode
Total characters | 35100 |
---|---|
Distinct characters | 26 |
Distinct categories | 1 ? |
Distinct scripts | 1 ? |
Distinct blocks | 1 ? |
Unique
Unique | 20 ? |
---|---|
Unique (%) | 0.1% |
Sample
1st row | IT |
---|---|
2nd row | IT |
3rd row | NO |
4th row | NO |
5th row | SE |
Common Values
Value | Count | Frequency (%) |
US | 2616 | 15.6% |
DE | 1125 | 6.7% |
GB | 812 | 4.8% |
FR | 746 | 4.4% |
CN | 734 | 4.4% |
IT | 618 | 3.7% |
SE | 555 | 3.3% |
RU | 511 | 3.0% |
JP | 508 | 3.0% |
AU | 495 | 2.9% |
Other values (130) | 8099 |
Length (histogram not shown)
Words
Value | Count | Frequency (%) |
us | 2616 | 15.6% |
de | 1125 | 6.7% |
gb | 812 | 4.8% |
fr | 746 | 4.4% |
cn | 734 | 4.4% |
it | 618 | 3.7% |
se | 555 | 3.3% |
ru | 511 | 3.0% |
jp | 508 | 3.0% |
au | 495 | 2.9% |
Other values (130) | 8099 |
Most occurring characters
Value | Count | Frequency (%) |
U | 4564 | |
S | 3699 | 10.5% |
E | 2837 | 8.1% |
D | 2782 | 7.9% |
R | 2568 | 7.3% |
C | 2221 | 6.3% |
N | 1852 | 5.3% |
A | 1714 | 4.9% |
B | 1431 | 4.1% |
I | 1309 | 3.7% |
Other values (16) | 10123 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 35100 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
U | 4564 | |
S | 3699 | 10.5% |
E | 2837 | 8.1% |
D | 2782 | 7.9% |
R | 2568 | 7.3% |
C | 2221 | 6.3% |
N | 1852 | 5.3% |
A | 1714 | 4.9% |
B | 1431 | 4.1% |
I | 1309 | 3.7% |
Other values (16) | 10123 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 35100 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
U | 4564 | |
S | 3699 | 10.5% |
E | 2837 | 8.1% |
D | 2782 | 7.9% |
R | 2568 | 7.3% |
C | 2221 | 6.3% |
N | 1852 | 5.3% |
A | 1714 | 4.9% |
B | 1431 | 4.1% |
I | 1309 | 3.7% |
Other values (16) | 10123 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 35100 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
U | 4564 | |
S | 3699 | 10.5% |
E | 2837 | 8.1% |
D | 2782 | 7.9% |
R | 2568 | 7.3% |
C | 2221 | 6.3% |
N | 1852 | 5.3% |
A | 1714 | 4.9% |
B | 1431 | 4.1% |
I | 1309 | 3.7% |
Other values (16) | 10123 |
country_3_letter_code
Categorical
Distinct | 141 |
---|---|
Distinct (%) | 0.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.8 KiB |
USA | 2616 |
---|---|
GER | 923 |
GBR | 812 |
FRA | 746 |
CHN | 734 |
Other values (136) | 10988 |
Common Values
Value | Count | Frequency (%) |
USA | 2616 | 15.6% |
GER | 923 | 5.5% |
GBR | 812 | 4.8% |
FRA | 746 | 4.4% |
CHN | 734 | 4.4% |
ITA | 618 | 3.7% |
SWE | 555 | 3.3% |
RUS | 511 | 3.0% |
JPN | 508 | 3.0% |
AUS | 495 | 2.9% |
Other values (131) | 8301 |
Length (histogram not shown)
Words
Value | Count | Frequency (%) |
usa | 2616 | 15.6% |
ger | 923 | 5.5% |
gbr | 812 | 4.8% |
fra | 746 | 4.4% |
chn | 734 | 4.4% |
ita | 618 | 3.7% |
swe | 555 | 3.3% |
rus | 511 | 3.0% |
jpn | 508 | 3.0% |
aus | 495 | 2.9% |
Other values (131) | 8301 |
Most occurring characters
Value | Count | Frequency (%) |
A | 6037 | |
R | 6013 | |
U | 5917 | |
S | 4975 | |
N | 4022 | 8.0% |
E | 3053 | 6.1% |
G | 2846 | 5.6% |
C | 1884 | 3.7% |
O | 1719 | 3.4% |
I | 1704 | 3.4% |
Other values (16) | 12287 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 50457 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
A | 6037 | |
R | 6013 | |
U | 5917 | |
S | 4975 | |
N | 4022 | 8.0% |
E | 3053 | 6.1% |
G | 2846 | 5.6% |
C | 1884 | 3.7% |
O | 1719 | 3.4% |
I | 1704 | 3.4% |
Other values (16) | 12287 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 50457 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
A | 6037 | |
R | 6013 | |
U | 5917 | |
S | 4975 | |
N | 4022 | 8.0% |
E | 3053 | 6.1% |
G | 2846 | 5.6% |
C | 1884 | 3.7% |
O | 1719 | 3.4% |
I | 1704 | 3.4% |
Other values (16) | 12287 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 50457 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
A | 6037 | |
R | 6013 | |
U | 5917 | |
S | 4975 | |
N | 4022 | 8.0% |
E | 3053 | 6.1% |
G | 2846 | 5.6% |
C | 1884 | 3.7% |
O | 1719 | 3.4% |
I | 1704 | 3.4% |
Other values (16) | 12287 |
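The report lists 140 distinct values for country_code but 141 for country_3_letter_code, and DE occurs 1125 times while GER occurs 923 times. With no missing cells, at least one two-letter code must therefore be paired with more than one three-letter code. A sketch for listing such cases follows; the `XX`, `XXA` and `XXB` rows are hypothetical placeholders, not values from the dataset, and `one_to_many` is an illustrative helper name.

```python
from collections import defaultdict

def one_to_many(pairs):
    """Map each two-letter code to its three-letter codes, keeping only ambiguous ones."""
    seen = defaultdict(set)
    for two_letter, three_letter in pairs:
        seen[two_letter].add(three_letter)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}

pairs = [
    ("IT", "ITA"), ("NO", "NOR"), ("SE", "SWE"),  # pairs seen in the sample rows
    ("XX", "XXA"), ("XX", "XXB"),                  # hypothetical ambiguous code
]
ambiguous = one_to_many(pairs)
print(ambiguous)
```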
Correlations
| | discipline_title | slug_game | event_gender | medal_type | participant_type |
---|---|---|---|---|---|
discipline_title | 1.000 | 0.183 | 0.568 | 0.033 | 0.768 |
slug_game | 0.183 | 1.000 | 0.222 | 0.000 | 0.108 |
event_gender | 0.568 | 0.222 | 1.000 | 0.000 | 0.386 |
medal_type | 0.033 | 0.000 | 0.000 | 1.000 | 0.000 |
participant_type | 0.768 | 0.108 | 0.386 | 0.000 | 1.000 |
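The report does not state which association measure fills this matrix. Assuming a Cramér's V-style statistic, a common choice for pairs of categorical variables, a plain version can be sketched as below. It applies no bias correction, so its values need not match the table exactly, and `cramers_v` is an illustrative name rather than the profiling tool's function.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # Pearson chi-squared over every cell of the contingency table.
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))

# Perfectly associated and fully independent toy columns.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```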
First rows
| | discipline_title | slug_game | event_title | event_gender | medal_type | participant_type | athlete_full_name | country_name | country_code | country_3_letter_code |
---|---|---|---|---|---|---|---|---|---|---|
0 | Curling | beijing-2022 | Mixed Doubles | Mixed | GOLD | GameTeam | Stefania CONSTANTINI | Italy | IT | ITA |
1 | Curling | beijing-2022 | Mixed Doubles | Mixed | GOLD | GameTeam | Amos MOSANER | Italy | IT | ITA |
2 | Curling | beijing-2022 | Mixed Doubles | Mixed | SILVER | GameTeam | Kristin SKASLIEN | Norway | NO | NOR |
3 | Curling | beijing-2022 | Mixed Doubles | Mixed | SILVER | GameTeam | Magnus NEDREGOTTEN | Norway | NO | NOR |
4 | Curling | beijing-2022 | Mixed Doubles | Mixed | BRONZE | GameTeam | Almida DE VAL | Sweden | SE | SWE |
5 | Curling | beijing-2022 | Mixed Doubles | Mixed | BRONZE | GameTeam | Oskar ERIKSSON | Sweden | SE | SWE |
12 | Freestyle Skiing | beijing-2022 | Men's Moguls | Men | SILVER | Athlete | Mikael KINGSBURY | Canada | CA | CAN |
13 | Freestyle Skiing | beijing-2022 | Men's Moguls | Men | GOLD | Athlete | Walter WALLBERG | Sweden | SE | SWE |
14 | Freestyle Skiing | beijing-2022 | Men's Moguls | Men | BRONZE | Athlete | Ikuma HORISHIMA | Japan | JP | JPN |
15 | Freestyle Skiing | beijing-2022 | Men's Freeski Halfpipe | Men | GOLD | Athlete | Nico PORTEOUS | New Zealand | NZ | NZL |
Last rows
| | discipline_title | slug_game | event_title | event_gender | medal_type | participant_type | athlete_full_name | country_name | country_code | country_3_letter_code |
---|---|---|---|---|---|---|---|---|---|---|
21685 | Tennis | athens-1896 | doubles men | Men | SILVER | GameTeam | Dimitrios PETROKOKKINOS | Greece | GR | GRE |
21688 | Wrestling | athens-1896 | Unlimited Class, Greco-Roman Men | Men | GOLD | Athlete | Carl SCHUHMANN | Germany | DE | GER |
21689 | Wrestling | athens-1896 | Unlimited Class, Greco-Roman Men | Men | SILVER | Athlete | Georgios TSITAS | Greece | GR | GRE |
21690 | Wrestling | athens-1896 | Unlimited Class, Greco-Roman Men | Men | BRONZE | Athlete | Stefanos Khristopoulos | Greece | GR | GRE |
21691 | Weightlifting | athens-1896 | heavyweight - one hand lift men | Men | GOLD | Athlete | Launceston ELLIOT | Great Britain | GB | GBR |
21692 | Weightlifting | athens-1896 | heavyweight - one hand lift men | Men | SILVER | Athlete | Viggo JENSEN | Denmark | DK | DEN |
21693 | Weightlifting | athens-1896 | heavyweight - one hand lift men | Men | BRONZE | Athlete | Alexandros Nikolopoulos | Greece | GR | GRE |
21694 | Weightlifting | athens-1896 | heavyweight - two hand lift men | Men | GOLD | Athlete | Viggo JENSEN | Denmark | DK | DEN |
21695 | Weightlifting | athens-1896 | heavyweight - two hand lift men | Men | SILVER | Athlete | Launceston ELLIOT | Great Britain | GB | GBR |
21696 | Weightlifting | athens-1896 | heavyweight - two hand lift men | Men | BRONZE | Athlete | Sotirios VERSIS | Greece | GR | GRE |