OlympicMinia Report

Dataset statistics

Number of variables	10
Number of observations	16819
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.4 MiB
Average record size in memory	88.0 B
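The memory figures above agree with each other under one plausible reading, sketched below. The 8-byte pointer and 8-byte index sizes are assumptions (shallow accounting of object columns with an integer index); the report itself does not state how memory was measured.

```python
# Sanity check of the reported memory figures.
# Assumption: shallow accounting, i.e. each cell costs one 8-byte object
# pointer and each row costs one 8-byte integer index entry.
N_ROWS = 16819
N_COLS = 10
POINTER_BYTES = 8  # assumed, not stated in the report
INDEX_BYTES = 8    # assumed, not stated in the report


def record_size_bytes(n_cols: int) -> int:
    """Bytes per record: one pointer per column plus one index entry."""
    return n_cols * POINTER_BYTES + INDEX_BYTES


record = record_size_bytes(N_COLS)                  # average record size
total_mib = record * N_ROWS / 1024 ** 2             # whole dataset, in MiB
column_kib = record_size_bytes(1) * N_ROWS / 1024   # one column plus index, in KiB

print(record, round(total_mib, 1), round(column_kib, 1))
```

Under these assumptions the sketch gives 88 bytes per record, about 1.4 MiB in total, and about 262.8 KiB per variable, which matches the "Memory size" reported for each variable.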

Variable types

Categorical	10

Alerts

`discipline_title` has a high cardinality: 67 distinct values	High cardinality
`slug_game` has a high cardinality: 53 distinct values	High cardinality
`event_title` has a high cardinality: 1192 distinct values	High cardinality
`athlete_full_name` has a high cardinality: 12074 distinct values	High cardinality
`country_name` has a high cardinality: 141 distinct values	High cardinality
`country_code` has a high cardinality: 140 distinct values	High cardinality
`country_3_letter_code` has a high cardinality: 141 distinct values	High cardinality
`discipline_title` is highly overall correlated with `event_gender` and `participant_type`	High correlation
`event_gender` is highly overall correlated with `discipline_title`	High correlation
`participant_type` is highly overall correlated with `discipline_title`	High correlation
`athlete_full_name` is uniformly distributed	Uniform
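The alerts above can be approximately reproduced from figures printed elsewhere in this report. The sketch below is illustrative only: the two threshold values are assumptions chosen because they are consistent with the alerts shown, and the report does not state which thresholds were configured.

```python
# Reproduce the cardinality and correlation alerts from reported figures.
CARDINALITY_THRESHOLD = 50    # assumed threshold, not stated in the report
CORRELATION_THRESHOLD = 0.5   # assumed threshold, not stated in the report

# Distinct counts as listed per variable in this report.
distinct = {
    "discipline_title": 67, "slug_game": 53, "event_title": 1192,
    "event_gender": 4, "medal_type": 3, "participant_type": 2,
    "athlete_full_name": 12074, "country_name": 141,
    "country_code": 140, "country_3_letter_code": 141,
}

# Upper triangle of the correlation table in this report.
correlations = {
    ("discipline_title", "slug_game"): 0.183,
    ("discipline_title", "event_gender"): 0.568,
    ("discipline_title", "medal_type"): 0.033,
    ("discipline_title", "participant_type"): 0.768,
    ("slug_game", "event_gender"): 0.222,
    ("slug_game", "medal_type"): 0.000,
    ("slug_game", "participant_type"): 0.108,
    ("event_gender", "medal_type"): 0.000,
    ("event_gender", "participant_type"): 0.386,
    ("medal_type", "participant_type"): 0.000,
}

high_cardinality = sorted(
    name for name, n in distinct.items() if n > CARDINALITY_THRESHOLD
)
high_correlation = sorted(
    pair for pair, value in correlations.items() if value > CORRELATION_THRESHOLD
)

print(high_cardinality)
print(high_correlation)
```

With these assumed thresholds the sketch flags the same seven high-cardinality variables and the same two correlated pairs, both involving `discipline_title`.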

Reproduction

Analysis started	2023-07-24 03:59:27.149715
Analysis finished	2023-07-24 03:59:30.139575
Duration	2.99 seconds
Software version	pandas-profiling v3.6.6
Download configuration	config.json
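As a small cross-check, the reported duration follows directly from the two timestamps. The sketch below uses only the values printed above; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

# Timestamp layout assumed from the values printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-07-24 03:59:27.149715", FMT)
finished = datetime.strptime("2023-07-24 03:59:30.139575", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```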

discipline_title
Categorical

HIGH CARDINALITY HIGH CORRELATION

Distinct	67
Distinct (%)	0.4%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

Athletics	2583
Swimming	1386
Wrestling	1220
Boxing	941
Shooting	722
Other values (62)	9967

Length

Max length	25
Median length	20
Mean length	10.075688
Min length	4
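The mean length is consistent with dividing the variable's total character count (listed under "Characters and Unicode") by the number of observations. A minimal sketch of that arithmetic, using only figures printed in this report:

```python
# Mean length = total characters / number of observations.
N_OBSERVATIONS = 16819

# (total characters, reported mean length) for a few variables in this report.
reported = {
    "discipline_title": (169463, 10.075688),
    "slug_game": (202558, 12.043403),
    "event_gender": (61773, 3.6728105),
    "medal_type": (90096, 5.3567989),
}


def mean_length(total_characters: int, n_rows: int = N_OBSERVATIONS) -> float:
    """Average number of characters per value."""
    return total_characters / n_rows


for name, (total, reported_mean) in reported.items():
    print(name, round(mean_length(total), 6), reported_mean)
```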

Characters and Unicode

Total characters	169463
Distinct characters	45
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
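To illustrate what "character properties" means here, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation; the `CATEGORY_NAMES` mapping and the sample values (taken from rows shown in this report) are chosen for illustration.

```python
import unicodedata
from collections import Counter

# Readable names for a few Unicode general-category codes (illustrative subset).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["Curling", "Freestyle Skiing", "Canoe Sprint"]
counts = category_counts(sample)
print(counts)
```

Scripts and blocks are separate Unicode properties that the standard library does not expose directly, so they are left out of this sketch.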

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	Curling
2nd row	Curling
3rd row	Curling
4th row	Curling
5th row	Curling

Common Values

Value	Count	Frequency (%)
Athletics	2583	15.4%
Swimming	1386	8.2%
Wrestling	1220	7.3%
Boxing	941	5.6%
Shooting	722	4.3%
Canoe Sprint	650	3.9%
Gymnastics Artistic	642	3.8%
Rowing	618	3.7%
Weightlifting	606	3.6%
Sailing	577	3.4%
Other values (57)	6874	40.9%
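The frequency column is each count divided by the number of observations (16819), rounded to one decimal place. A minimal check against the table above:

```python
# Frequency (%) = 100 * count / number of observations, rounded to 1 decimal.
N_OBSERVATIONS = 16819

# Counts copied from the Common Values table for discipline_title.
counts = {
    "Athletics": 2583, "Swimming": 1386, "Wrestling": 1220, "Boxing": 941,
    "Shooting": 722, "Canoe Sprint": 650, "Gymnastics Artistic": 642,
    "Rowing": 618, "Weightlifting": 606, "Sailing": 577,
    "Other values (57)": 6874,
}


def pct(count: int, total: int = N_OBSERVATIONS) -> float:
    """Share of observations as a percentage with one decimal place."""
    return round(100 * count / total, 1)


for value, count in counts.items():
    print(f"{value}\t{count}\t{pct(count)}%")

# The listed counts account for every observation.
total_listed = sum(counts.values())
```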

Length

[Figure: histogram of lengths of the category]

Words

Value	Count	Frequency (%)
athletics	2583	11.4%
swimming	1473	6.5%
wrestling	1220	5.4%
skiing	1042	4.6%
skating	982	4.4%
boxing	941	4.2%
canoe	815	3.6%
shooting	722	3.2%
gymnastics	711	3.2%
artistic	685	3.0%
Other values (64)	11392	50.5%
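The table above lists lowercase tokens rather than full category values, which is why, for example, "canoe" (815) exceeds "Canoe Sprint" (650): a token can occur in several disciplines. The sketch below shows one way such counts could be derived, assuming values are lowercased and split on whitespace; the profiler's actual tokenization is not described in this report.

```python
from collections import Counter

# A few value counts copied from the Common Values table for discipline_title.
value_counts = {
    "Athletics": 2583,
    "Canoe Sprint": 650,
    "Gymnastics Artistic": 642,
}


def word_counts(value_counts):
    """Aggregate counts per lowercase, whitespace-separated token (assumed rule)."""
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            words[token] += count
    return words


words = word_counts(value_counts)
print(words.most_common())
```

With only these three values, "athletics" reaches its full reported count of 2583, while "canoe", "gymnastics" and "artistic" stay below their reported totals because other disciplines containing those words are not included in the sample.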

Most occurring characters

Value	Count	Frequency (%)
i	22620	13.3%
n	16174	9.5%
t	13839	8.2%
g	11324	6.7%
e	10555	6.2%
s	8696	5.1%
l	7638	4.5%
o	7539	4.4%
c	6002	3.5%
S	5899	3.5%
Other values (35)	59177	34.9%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	141952	83.8%
Uppercase Letter	21761	12.8%
Space Separator	5750	3.4%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
i	22620	15.9%
n	16174	11.4%
t	13839	9.7%
g	11324	8.0%
e	10555	7.4%
s	8696	6.1%
l	7638	5.4%
o	7539	5.3%
c	6002	4.2%
r	5711	4.0%
Other values (15)	31854	22.4%

Uppercase Letter

Value	Count	Frequency (%)
S	5899	27.1%
A	3884	17.8%
C	2385	11.0%
W	1826	8.4%
B	1641	7.5%
T	1427	6.6%
F	862	4.0%
R	807	3.7%
J	784	3.6%
G	733	3.4%
Other values (9)	1513	7.0%

Space Separator

Value	Count	Frequency (%)
(space)	5750	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	163713	96.6%
Common	5750	3.4%

Most frequent character per script

Latin

Value	Count	Frequency (%)
i	22620	13.8%
n	16174	9.9%
t	13839	8.5%
g	11324	6.9%
e	10555	6.4%
s	8696	5.3%
l	7638	4.7%
o	7539	4.6%
c	6002	3.7%
S	5899	3.6%
Other values (34)	53427	32.6%

Common

Value	Count	Frequency (%)
(space)	5750	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	169463	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
i	22620	13.3%
n	16174	9.5%
t	13839	8.2%
g	11324	6.7%
e	10555	6.2%
s	8696	5.1%
l	7638	4.5%
o	7539	4.4%
c	6002	3.5%
S	5899	3.5%
Other values (35)	59177	34.9%

slug_game
Categorical

Distinct	53
Distinct (%)	0.3%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

tokyo-2020	1006
rio-2016	912
beijing-2008	896
london-2012	895
athens-2004	878
Other values (48)	12232

Length

Max length	27
Median length	19
Mean length	12.043403
Min length	8

Characters and Unicode

Total characters	202558
Distinct characters	36
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	beijing-2022
2nd row	beijing-2022
3rd row	beijing-2022
4th row	beijing-2022
5th row	beijing-2022

Common Values

Value	Count	Frequency (%)
tokyo-2020	1006	6.0%
rio-2016	912	5.4%
beijing-2008	896	5.3%
london-2012	895	5.3%
athens-2004	878	5.2%
sydney-2000	876	5.2%
atlanta-1996	772	4.6%
barcelona-1992	654	3.9%
los-angeles-1984	604	3.6%
seoul-1988	548	3.3%
Other values (43)	8778	52.2%

Length

[Figure: histogram of lengths of the category]

Words

Value	Count	Frequency (%)
tokyo-2020	1006	6.0%
rio-2016	912	5.4%
beijing-2008	896	5.3%
london-2012	895	5.3%
athens-2004	878	5.2%
sydney-2000	876	5.2%
atlanta-1996	772	4.6%
barcelona-1992	654	3.9%
los-angeles-1984	604	3.6%
seoul-1988	548	3.3%
Other values (43)	8778	52.2%

Most occurring characters

Value	Count	Frequency (%)
-	19216	9.5%
0	14222	7.0%
o	13281	6.6%
2	12847	6.3%
1	12657	6.2%
n	12467	6.2%
9	11772	5.8%
e	10879	5.4%
a	9944	4.9%
l	9069	4.5%
Other values (26)	76204	37.6%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	116066	57.3%
Decimal Number	67276	33.2%
Dash Punctuation	19216	9.5%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
o	13281	11.4%
n	12467	10.7%
e	10879	9.4%
a	9944	8.6%
l	9069	7.8%
i	7755	6.7%
s	7508	6.5%
t	6831	5.9%
r	5581	4.8%
y	4102	3.5%
Other values (15)	28649	24.7%

Decimal Number

Value	Count	Frequency (%)
0	14222	21.1%
2	12847	19.1%
1	12657	18.8%
9	11772	17.5%
8	5361	8.0%
6	4499	6.7%
4	3450	5.1%
7	1031	1.5%
5	743	1.1%
3	694	1.0%

Dash Punctuation

Value	Count	Frequency (%)
-	19216	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	116066	57.3%
Common	86492	42.7%

Most frequent character per script

Latin

Value	Count	Frequency (%)
o	13281	11.4%
n	12467	10.7%
e	10879	9.4%
a	9944	8.6%
l	9069	7.8%
i	7755	6.7%
s	7508	6.5%
t	6831	5.9%
r	5581	4.8%
y	4102	3.5%
Other values (15)	28649	24.7%

Common

Value	Count	Frequency (%)
-	19216	22.2%
0	14222	16.4%
2	12847	14.9%
1	12657	14.6%
9	11772	13.6%
8	5361	6.2%
6	4499	5.2%
4	3450	4.0%
7	1031	1.2%
5	743	0.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	202558	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
-	19216	9.5%
0	14222	7.0%
o	13281	6.6%
2	12847	6.3%
1	12657	6.2%
n	12467	6.2%
9	11772	5.8%
e	10879	5.4%
a	9944	4.9%
l	9069	4.5%
Other values (26)	76204	37.6%

event_title
Categorical

Distinct	1192
Distinct (%)	7.1%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

Individual men	208
individual mixed	183
doubles men	162
1500m men	154
doubles women	138
Other values (1187)	15974

Length

Max length	52
Median length	43
Mean length	20.328795
Min length	3

Characters and Unicode

Total characters	341910
Distinct characters	75
Distinct categories	11
Distinct scripts	2
Distinct blocks	4

Unique

Unique	2
Unique (%)	< 0.1%

Sample

1st row	Mixed Doubles
2nd row	Mixed Doubles
3rd row	Mixed Doubles
4th row	Mixed Doubles
5th row	Mixed Doubles

Common Values

Value	Count	Frequency (%)
Individual men	208	1.2%
individual mixed	183	1.1%
doubles men	162	1.0%
1500m men	154	0.9%
doubles women	138	0.8%
5000m men	136	0.8%
Singles men	135	0.8%
10000m men	129	0.8%
Singles women	128	0.8%
Individual women	125	0.7%
Other values (1182)	15321	91.1%

Length

[Figure: histogram of lengths of the category]

Words

Value	Count	Frequency (%)
men	10209	18.4%
women	4409	7.9%
individual	1774	3.2%
freestyle	1259	2.3%
kilograms	974	1.8%
mixed	969	1.7%
double	801	1.4%
	751	1.4%
100m	681	1.2%
200m	653	1.2%
Other values (638)	32991	59.5%

Most occurring characters

Value	Count	Frequency (%)
(space)	39230	11.5%
e	35649	10.4%
m	25327	7.4%
n	24356	7.1%
i	17486	5.1%
o	15594	4.6%
l	14683	4.3%
a	13019	3.8%
t	12252	3.6%
0	12116	3.5%
Other values (65)	132198	38.7%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	249937	73.1%
Space Separator	39230	11.5%
Decimal Number	31558	9.2%
Uppercase Letter	11810	3.5%
Dash Punctuation	2789	0.8%
Other Punctuation	2361	0.7%
Close Punctuation	1450	0.4%
Open Punctuation	1450	0.4%
Math Symbol	1105	0.3%
Final Punctuation	215	0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
e	35649	14.3%
m	25327	10.1%
n	24356	9.7%
i	17486	7.0%
o	15594	6.2%
l	14683	5.9%
a	13019	5.2%
t	12252	4.9%
s	11663	4.7%
r	10565	4.2%
Other values (17)	69343	27.7%

Uppercase Letter

Value	Count	Frequency (%)
M	2190	18.5%
F	1144	9.7%
S	1081	9.2%
I	836	7.1%
W	792	6.7%
G	679	5.7%
R	667	5.6%
H	591	5.0%
K	557	4.7%
P	514	4.4%
Other values (14)	2759	23.4%

Decimal Number

Value	Count	Frequency (%)
0	12116	38.4%
1	4169	13.2%
5	3925	12.4%
2	2682	8.5%
6	2092	6.6%
7	1949	6.2%
4	1389	4.4%
8	1384	4.4%
3	1220	3.9%
9	632	2.0%

Other Punctuation

Value	Count	Frequency (%)
,	1042	44.1%
'	979	41.5%
.	307	13.0%
/	21	0.9%
:	12	0.5%

Math Symbol

Value	Count	Frequency (%)
≤	908	82.2%
+	104	9.4%
>	93	8.4%

Space Separator

Value	Count	Frequency (%)
(space)	39230	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	2789	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1450	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	1450	100.0%

Final Punctuation

Value	Count	Frequency (%)
’	215	100.0%

Other Number

Value	Count	Frequency (%)
½	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	261747	76.6%
Common	80163	23.4%

Most frequent character per script

Latin

Value	Count	Frequency (%)
e	35649	13.6%
m	25327	9.7%
n	24356	9.3%
i	17486	6.7%
o	15594	6.0%
l	14683	5.6%
a	13019	5.0%
t	12252	4.7%
s	11663	4.5%
r	10565	4.0%
Other values (41)	81153	31.0%

Common

Value	Count	Frequency (%)
(space)	39230	48.9%
0	12116	15.1%
1	4169	5.2%
5	3925	4.9%
-	2789	3.5%
2	2682	3.3%
6	2092	2.6%
7	1949	2.4%
)	1450	1.8%
(	1450	1.8%
Other values (14)	8311	10.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	340584	99.6%
Math Operators	908	0.3%
Punctuation	215	0.1%
None	203	0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	39230	11.5%
e	35649	10.5%
m	25327	7.4%
n	24356	7.2%
i	17486	5.1%
o	15594	4.6%
l	14683	4.3%
a	13019	3.8%
t	12252	3.6%
0	12116	3.6%
Other values (59)	130872	38.4%

Math Operators

Value	Count	Frequency (%)
≤	908	100.0%

Punctuation

Value	Count	Frequency (%)
’	215	100.0%

None

Value	Count	Frequency (%)
é	186	91.6%
à	6	3.0%
É	6	3.0%
½	5	2.5%

event_gender
Categorical

Distinct	4
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

Men	10842
Women	5003
Open	638
Mixed	336

Length

Max length	5
Median length	3
Mean length	3.6728105
Min length	3

Characters and Unicode

Total characters	61773
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	Mixed
2nd row	Mixed
3rd row	Mixed
4th row	Mixed
5th row	Mixed

Common Values

Value	Count	Frequency (%)
Men	10842	64.5%
Women	5003	29.7%
Open	638	3.8%
Mixed	336	2.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value	Count	Frequency (%)
men	10842	64.5%
women	5003	29.7%
open	638	3.8%
mixed	336	2.0%

Most occurring characters

Value	Count	Frequency (%)
e	16819	27.2%
n	16483	26.7%
M	11178	18.1%
W	5003	8.1%
o	5003	8.1%
m	5003	8.1%
O	638	1.0%
p	638	1.0%
i	336	0.5%
x	336	0.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	44954	72.8%
Uppercase Letter	16819	27.2%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
e	16819	37.4%
n	16483	36.7%
o	5003	11.1%
m	5003	11.1%
p	638	1.4%
i	336	0.7%
x	336	0.7%
d	336	0.7%

Uppercase Letter

Value	Count	Frequency (%)
M	11178	66.5%
W	5003	29.7%
O	638	3.8%

Most occurring scripts

Value	Count	Frequency (%)
Latin	61773	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
e	16819	27.2%
n	16483	26.7%
M	11178	18.1%
W	5003	8.1%
o	5003	8.1%
m	5003	8.1%
O	638	1.0%
p	638	1.0%
i	336	0.5%
x	336	0.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	61773	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
e	16819	27.2%
n	16483	26.7%
M	11178	18.1%
W	5003	8.1%
o	5003	8.1%
m	5003	8.1%
O	638	1.0%
p	638	1.0%
i	336	0.5%
x	336	0.5%

medal_type
Categorical

Distinct	3
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

BRONZE	5959
SILVER	5451
GOLD	5409

Length

Max length	6
Median length	6
Mean length	5.3567989
Min length	4

Characters and Unicode

Total characters	90096
Distinct characters	12
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	GOLD
2nd row	GOLD
3rd row	SILVER
4th row	SILVER
5th row	BRONZE

Common Values

Value	Count	Frequency (%)
BRONZE	5959	35.4%
SILVER	5451	32.4%
GOLD	5409	32.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value	Count	Frequency (%)
bronze	5959	35.4%
silver	5451	32.4%
gold	5409	32.2%

Most occurring characters

Value	Count	Frequency (%)
R	11410	12.7%
E	11410	12.7%
O	11368	12.6%
L	10860	12.1%
B	5959	6.6%
N	5959	6.6%
Z	5959	6.6%
S	5451	6.1%
I	5451	6.1%
V	5451	6.1%
Other values (2)	10818	12.0%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	90096	100.0%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
R	11410	12.7%
E	11410	12.7%
O	11368	12.6%
L	10860	12.1%
B	5959	6.6%
N	5959	6.6%
Z	5959	6.6%
S	5451	6.1%
I	5451	6.1%
V	5451	6.1%
Other values (2)	10818	12.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	90096	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
R	11410	12.7%
E	11410	12.7%
O	11368	12.6%
L	10860	12.1%
B	5959	6.6%
N	5959	6.6%
Z	5959	6.6%
S	5451	6.1%
I	5451	6.1%
V	5451	6.1%
Other values (2)	10818	12.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90096	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
R	11410	12.7%
E	11410	12.7%
O	11368	12.6%
L	10860	12.1%
B	5959	6.6%
N	5959	6.6%
Z	5959	6.6%
S	5451	6.1%
I	5451	6.1%
V	5451	6.1%
Other values (2)	10818	12.0%

participant_type
Categorical

Distinct	2
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

Athlete	14021
GameTeam	2798

Length

Max length	8
Median length	7
Mean length	7.1663595
Min length	7

Characters and Unicode

Total characters	120531
Distinct characters	9
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	GameTeam
2nd row	GameTeam
3rd row	GameTeam
4th row	GameTeam
5th row	GameTeam

Common Values

Value	Count	Frequency (%)
Athlete	14021	83.4%
GameTeam	2798	16.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value	Count	Frequency (%)
athlete	14021	83.4%
gameteam	2798	16.6%

Most occurring characters

Value	Count	Frequency (%)
e	33638	27.9%
t	28042	23.3%
A	14021	11.6%
h	14021	11.6%
l	14021	11.6%
a	5596	4.6%
m	5596	4.6%
G	2798	2.3%
T	2798	2.3%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	100914	83.7%
Uppercase Letter	19617	16.3%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
e	33638	33.3%
t	28042	27.8%
h	14021	13.9%
l	14021	13.9%
a	5596	5.5%
m	5596	5.5%

Uppercase Letter

Value	Count	Frequency (%)
A	14021	71.5%
G	2798	14.3%
T	2798	14.3%

Most occurring scripts

Value	Count	Frequency (%)
Latin	120531	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
e	33638	27.9%
t	28042	23.3%
A	14021	11.6%
h	14021	11.6%
l	14021	11.6%
a	5596	4.6%
m	5596	4.6%
G	2798	2.3%
T	2798	2.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	120531	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
e	33638	27.9%
t	28042	23.3%
A	14021	11.6%
h	14021	11.6%
l	14021	11.6%
a	5596	4.6%
m	5596	4.6%
G	2798	2.3%
T	2798	2.3%

athlete_full_name
Categorical

HIGH CARDINALITY UNIFORM

Distinct	12074
Distinct (%)	71.8%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

Michael PHELPS	16
Marit BJOERGEN	12
Ireen WÜST	10
Takashi ONO	10
Alexei NEMOV	10
Other values (12069)	16761

Length

Max length	38
Median length	34
Mean length	15.055651
Min length	3

Characters and Unicode

Total characters	253221
Distinct characters	103
Distinct categories	7
Distinct scripts	2
Distinct blocks	3

Unique

Unique	9028
Unique (%)	53.7%

Sample

1st row	Stefania CONSTANTINI
2nd row	Amos MOSANER
3rd row	Kristin SKASLIEN
4th row	Magnus NEDREGOTTEN
5th row	Almida DE VAL

Common Values

Value	Count	Frequency (%)
Michael PHELPS	16	0.1%
Marit BJOERGEN	12	0.1%
Ireen WÜST	10	0.1%
Takashi ONO	10	0.1%
Alexei NEMOV	10	0.1%
Björn DAEHLIE	9	0.1%
Paavo NURMI	9	0.1%
Sawao KATO	9	0.1%
Ole Einar BJØRNDALEN	9	0.1%
Ray EWRY	8	< 0.1%
Other values (12064)	16717	99.4%

Length

[Figure: histogram of lengths of the category]

Words

Value	Count	Frequency (%)
john	207	0.6%
van	122	0.3%
thomas	121	0.3%
robert	116	0.3%
michael	114	0.3%
david	107	0.3%
peter	104	0.3%
charles	103	0.3%
william	102	0.3%
kim	99	0.3%
Other values (14704)	36006	96.8%

Most occurring characters

Value	Count	Frequency (%)
(space)	20391	8.1%
a	13444	5.3%
A	13097	5.2%
E	12598	5.0%
e	11110	4.4%
i	9692	3.8%
n	9682	3.8%
N	9481	3.7%
R	9448	3.7%
r	8601	3.4%
Other values (93)	135677	53.6%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	132821	52.5%
Lowercase Letter	98619	38.9%
Space Separator	20391	8.1%
Dash Punctuation	936	0.4%
Other Punctuation	388	0.2%
Open Punctuation	33	< 0.1%
Close Punctuation	33	< 0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
a	13444	13.6%
e	11110	11.3%
i	9692	9.8%
n	9682	9.8%
r	8601	8.7%
o	6675	6.8%
l	6205	6.3%
t	4326	4.4%
s	4309	4.4%
h	3362	3.4%
Other values (40)	21213	21.5%

Uppercase Letter

Value	Count	Frequency (%)
A	13097	9.9%
E	12598	9.5%
N	9481	7.1%
R	9448	7.1%
S	8318	6.3%
I	8318	6.3%
O	8135	6.1%
L	7243	5.5%
T	5679	4.3%
M	5495	4.1%
Other values (36)	45009	33.9%

Other Punctuation

Value	Count	Frequency (%)
.	311	80.2%
'	47	12.1%
,	30	7.7%

Space Separator

Value	Count	Frequency (%)
(space)	20391	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	936	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	33	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	33	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	231440	91.4%
Common	21781	8.6%

Most frequent character per script

Latin

Value	Count	Frequency (%)
a	13444	5.8%
A	13097	5.7%
E	12598	5.4%
e	11110	4.8%
i	9692	4.2%
n	9682	4.2%
N	9481	4.1%
R	9448	4.1%
r	8601	3.7%
S	8318	3.6%
Other values (86)	125969	54.4%

Common

Value	Count	Frequency (%)
(space)	20391	93.6%
-	936	4.3%
.	311	1.4%
'	47	0.2%
(	33	0.2%
)	33	0.2%
,	30	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	252374	99.7%
None	824	0.3%
IPA Ext	23	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	20391	8.1%
a	13444	5.3%
A	13097	5.2%
E	12598	5.0%
e	11110	4.4%
i	9692	3.8%
n	9682	3.8%
N	9481	3.8%
R	9448	3.7%
r	8601	3.4%
Other values (49)	134830	53.4%

None

Value	Count	Frequency (%)
Ö	198	24.0%
ö	116	14.1%
Ä	102	12.4%
Ü	78	9.5%
é	63	7.6%
ü	53	6.4%
Ø	21	2.5%
ä	20	2.4%
á	19	2.3%
ç	18	2.2%
Other values (33)	136	16.5%

IPA Ext

Value	Count	Frequency (%)
ə	23	100.0%

country_name
Categorical

Distinct	141
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

United States of America	2616
Germany	923
Great Britain	812
France	746
People's Republic of China	734
Other values (136)	10988

Length

Max length	37
Median length	28
Mean length	12.656341
Min length	3

Characters and Unicode

Total characters	212867
Distinct characters	56
Distinct categories	6
Distinct scripts	2
Distinct blocks	2

Unique

Unique	20
Unique (%)	0.1%

Sample

1st row	Italy
2nd row	Italy
3rd row	Norway
4th row	Norway
5th row	Sweden

Common Values

Value	Count	Frequency (%)
United States of America	2616	15.6%
Germany	923	5.5%
Great Britain	812	4.8%
France	746	4.4%
People's Republic of China	734	4.4%
Italy	618	3.7%
Sweden	555	3.3%
Russian Federation	511	3.0%
Japan	508	3.0%
Australia	495	2.9%
Other values (131)	8301	49.4%

Length

[Figure: histogram of lengths of the category]

Words

Value	Count	Frequency (%)
of	4029	12.7%
united	2620	8.2%
america	2616	8.2%
states	2616	8.2%
republic	1996	6.3%
germany	1591	5.0%
britain	812	2.6%
great	812	2.6%
people's	791	2.5%
france	746	2.3%
Other values (158)	13192	41.5%

Most occurring characters

Value	Count	Frequency (%)
a	24209	11.4%
e	22086	10.4%
i	15512	7.3%
(space)	15002	7.0%
n	13874	6.5%
t	13336	6.3%
r	12839	6.0%
o	8509	4.0%
c	7166	3.4%
l	6947	3.3%
Other values (46)	73387	34.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	168152	79.0%
Uppercase Letter	27976	13.1%
Space Separator	15002	7.0%
Other Punctuation	805	0.4%
Close Punctuation	466	0.2%
Open Punctuation	466	0.2%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
a	24209	14.4%
e	22086	13.1%
i	15512	9.2%
n	13874	8.3%
t	13336	7.9%
r	12839	7.6%
o	8509	5.1%
c	7166	4.3%
l	6947	4.1%
s	6415	3.8%
Other values (17)	37259	22.2%

Uppercase Letter

Value	Count	Frequency (%)
S	3889	13.9%
A	3709	13.3%
G	3040	10.9%
R	2901	10.4%
U	2818	10.1%
C	1913	6.8%
F	1879	6.7%
B	1430	5.1%
P	1142	4.1%
N	1047	3.7%
Other values (14)	4208	15.0%

Other Punctuation

Value	Count	Frequency (%)
'	795	98.8%
,	10	1.2%

Space Separator

Value	Count	Frequency (%)
(space)	15002	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	466	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	466	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	196128	92.1%
Common	16739	7.9%

Most frequent character per script

Latin

Value	Count	Frequency (%)
a	24209	12.3%
e	22086	11.3%
i	15512	7.9%
n	13874	7.1%
t	13336	6.8%
r	12839	6.5%
o	8509	4.3%
c	7166	3.7%
l	6947	3.5%
s	6415	3.3%
Other values (41)	65235	33.3%

Common

Value	Count	Frequency (%)
(space)	15002	89.6%
'	795	4.7%
)	466	2.8%
(	466	2.8%
,	10	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	212863	> 99.9%
None	4	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
a	24209	11.4%
e	22086	10.4%
i	15512	7.3%
(space)	15002	7.0%
n	13874	6.5%
t	13336	6.3%
r	12839	6.0%
o	8509	4.0%
c	7166	3.4%
l	6947	3.3%
Other values (45)	73383	34.5%

None

Value	Count	Frequency (%)
ô	4	100.0%

country_code
Categorical

Distinct	140
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

US	2616
DE	1125
GB	812
FR	746
CN	734
Other values (135)	10786

Length

Max length	4
Median length	2
Mean length	2.0869255
Min length	2

Characters and Unicode

Total characters	35100
Distinct characters	26
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	20
Unique (%)	0.1%

Sample

1st row	IT
2nd row	IT
3rd row	NO
4th row	NO
5th row	SE

Common Values

Value	Count	Frequency (%)
US	2616	15.6%
DE	1125	6.7%
GB	812	4.8%
FR	746	4.4%
CN	734	4.4%
IT	618	3.7%
SE	555	3.3%
RU	511	3.0%
JP	508	3.0%
AU	495	2.9%
Other values (130)	8099	48.2%

Length

[Figure: histogram of lengths of the category]

Words

Value	Count	Frequency (%)
us	2616	15.6%
de	1125	6.7%
gb	812	4.8%
fr	746	4.4%
cn	734	4.4%
it	618	3.7%
se	555	3.3%
ru	511	3.0%
jp	508	3.0%
au	495	2.9%
Other values (130)	8099	48.2%

Most occurring characters

Value	Count	Frequency (%)
U	4564	13.0%
S	3699	10.5%
E	2837	8.1%
D	2782	7.9%
R	2568	7.3%
C	2221	6.3%
N	1852	5.3%
A	1714	4.9%
B	1431	4.1%
I	1309	3.7%
Other values (16)	10123	28.8%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	35100	100.0%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
U	4564	13.0%
S	3699	10.5%
E	2837	8.1%
D	2782	7.9%
R	2568	7.3%
C	2221	6.3%
N	1852	5.3%
A	1714	4.9%
B	1431	4.1%
I	1309	3.7%
Other values (16)	10123	28.8%

Most occurring scripts

Value	Count	Frequency (%)
Latin	35100	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
U	4564	13.0%
S	3699	10.5%
E	2837	8.1%
D	2782	7.9%
R	2568	7.3%
C	2221	6.3%
N	1852	5.3%
A	1714	4.9%
B	1431	4.1%
I	1309	3.7%
Other values (16)	10123	28.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	35100	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
U	4564	13.0%
S	3699	10.5%
E	2837	8.1%
D	2782	7.9%
R	2568	7.3%
C	2221	6.3%
N	1852	5.3%
A	1714	4.9%
B	1431	4.1%
I	1309	3.7%
Other values (16)	10123	28.8%

country_3_letter_code
Categorical

Distinct	141
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	262.8 KiB

USA	2616
GER	923
GBR	812
FRA	746
CHN	734
Other values (136)	10988

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Characters and Unicode

Total characters	50457
Distinct characters	26
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	20
Unique (%)	0.1%

Sample

1st row	ITA
2nd row	ITA
3rd row	NOR
4th row	NOR
5th row	SWE

Common Values

Value	Count	Frequency (%)
USA	2616	15.6%
GER	923	5.5%
GBR	812	4.8%
FRA	746	4.4%
CHN	734	4.4%
ITA	618	3.7%
SWE	555	3.3%
RUS	511	3.0%
JPN	508	3.0%
AUS	495	2.9%
Other values (131)	8301	49.4%

Length

[Figure: histogram of lengths of the category]

Words

Value	Count	Frequency (%)
usa	2616	15.6%
ger	923	5.5%
gbr	812	4.8%
fra	746	4.4%
chn	734	4.4%
ita	618	3.7%
swe	555	3.3%
rus	511	3.0%
jpn	508	3.0%
aus	495	2.9%
Other values (131)	8301	49.4%

Most occurring characters

Value	Count	Frequency (%)
A	6037	12.0%
R	6013	11.9%
U	5917	11.7%
S	4975	9.9%
N	4022	8.0%
E	3053	6.1%
G	2846	5.6%
C	1884	3.7%
O	1719	3.4%
I	1704	3.4%
Other values (16)	12287	24.4%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	50457	100.0%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
A	6037	12.0%
R	6013	11.9%
U	5917	11.7%
S	4975	9.9%
N	4022	8.0%
E	3053	6.1%
G	2846	5.6%
C	1884	3.7%
O	1719	3.4%
I	1704	3.4%
Other values (16)	12287	24.4%

Most occurring scripts

Value	Count	Frequency (%)
Latin	50457	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
A	6037	12.0%
R	6013	11.9%
U	5917	11.7%
S	4975	9.9%
N	4022	8.0%
E	3053	6.1%
G	2846	5.6%
C	1884	3.7%
O	1719	3.4%
I	1704	3.4%
Other values (16)	12287	24.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	50457	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
A	6037	12.0%
R	6013	11.9%
U	5917	11.7%
S	4975	9.9%
N	4022	8.0%
E	3053	6.1%
G	2846	5.6%
C	1884	3.7%
O	1719	3.4%
I	1704	3.4%
Other values (16)	12287	24.4%

Correlations (Auto)

	discipline_title	slug_game	event_gender	medal_type	participant_type
discipline_title	1.000	0.183	0.568	0.033	0.768
slug_game	0.183	1.000	0.222	0.000	0.108
event_gender	0.568	0.222	1.000	0.000	0.386
medal_type	0.033	0.000	0.000	1.000	0.000
participant_type	0.768	0.108	0.386	0.000	1.000
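All variables in this table are categorical. A common association measure for pairs of categorical variables is Cramér's V, and the profiling library's documentation describes it as the measure used for such pairs in "auto" mode. The sketch below is an uncorrected, standard-library version for illustration only; it is not the tool's implementation, and whether the tool applies a bias correction is not stated in this report. The toy rows mirror the first rows shown in this report.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of categories."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of xs
    col = Counter(ys)              # marginal counts of ys
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0                 # a constant variable carries no association
    chi2 = 0.0
    for r, r_n in row.items():
        for c, c_n in col.items():
            expected = r_n * c_n / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Toy data: six Curling rows (GameTeam) and four Freestyle Skiing rows (Athlete).
discipline = ["Curling"] * 6 + ["Freestyle Skiing"] * 4
participant = ["GameTeam"] * 6 + ["Athlete"] * 4
v_toy = cramers_v(discipline, participant)
print(round(v_toy, 3))
```

In the toy data the discipline fully determines the participant type, so the measure is at its maximum; the full dataset yields the weaker values shown in the table.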

Missing values

Count: a simple visualization of nullity by column. [Figure not reproduced]

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion. [Figure not reproduced]

First rows

	discipline_title	slug_game	event_title	event_gender	medal_type	participant_type	athlete_full_name	country_name	country_code	country_3_letter_code
0	Curling	beijing-2022	Mixed Doubles	Mixed	GOLD	GameTeam	Stefania CONSTANTINI	Italy	IT	ITA
1	Curling	beijing-2022	Mixed Doubles	Mixed	GOLD	GameTeam	Amos MOSANER	Italy	IT	ITA
2	Curling	beijing-2022	Mixed Doubles	Mixed	SILVER	GameTeam	Kristin SKASLIEN	Norway	NO	NOR
3	Curling	beijing-2022	Mixed Doubles	Mixed	SILVER	GameTeam	Magnus NEDREGOTTEN	Norway	NO	NOR
4	Curling	beijing-2022	Mixed Doubles	Mixed	BRONZE	GameTeam	Almida DE VAL	Sweden	SE	SWE
5	Curling	beijing-2022	Mixed Doubles	Mixed	BRONZE	GameTeam	Oskar ERIKSSON	Sweden	SE	SWE
12	Freestyle Skiing	beijing-2022	Men's Moguls	Men	SILVER	Athlete	Mikael KINGSBURY	Canada	CA	CAN
13	Freestyle Skiing	beijing-2022	Men's Moguls	Men	GOLD	Athlete	Walter WALLBERG	Sweden	SE	SWE
14	Freestyle Skiing	beijing-2022	Men's Moguls	Men	BRONZE	Athlete	Ikuma HORISHIMA	Japan	JP	JPN
15	Freestyle Skiing	beijing-2022	Men's Freeski Halfpipe	Men	GOLD	Athlete	Nico PORTEOUS	New Zealand	NZ	NZL

Last rows

	discipline_title	slug_game	event_title	event_gender	medal_type	participant_type	athlete_full_name	country_name	country_code	country_3_letter_code
21685	Tennis	athens-1896	doubles men	Men	SILVER	GameTeam	Dimitrios PETROKOKKINOS	Greece	GR	GRE
21688	Wrestling	athens-1896	Unlimited Class, Greco-Roman Men	Men	GOLD	Athlete	Carl SCHUHMANN	Germany	DE	GER
21689	Wrestling	athens-1896	Unlimited Class, Greco-Roman Men	Men	SILVER	Athlete	Georgios TSITAS	Greece	GR	GRE
21690	Wrestling	athens-1896	Unlimited Class, Greco-Roman Men	Men	BRONZE	Athlete	Stefanos Khristopoulos	Greece	GR	GRE
21691	Weightlifting	athens-1896	heavyweight - one hand lift men	Men	GOLD	Athlete	Launceston ELLIOT	Great Britain	GB	GBR
21692	Weightlifting	athens-1896	heavyweight - one hand lift men	Men	SILVER	Athlete	Viggo JENSEN	Denmark	DK	DEN
21693	Weightlifting	athens-1896	heavyweight - one hand lift men	Men	BRONZE	Athlete	Alexandros Nikolopoulos	Greece	GR	GRE
21694	Weightlifting	athens-1896	heavyweight - two hand lift men	Men	GOLD	Athlete	Viggo JENSEN	Denmark	DK	DEN
21695	Weightlifting	athens-1896	heavyweight - two hand lift men	Men	SILVER	Athlete	Launceston ELLIOT	Great Britain	GB	GBR
21696	Weightlifting	athens-1896	heavyweight - two hand lift men	Men	BRONZE	Athlete	Sotirios VERSIS	Greece	GR	GRE
