Alpine Ski on Instagram | Part I: Athletes’ posts
At the end of the 2019 FIS Alpine World Ski Championships, my partner Jacopo Ghiglieri and I decided to embark on what was supposed to be a quick side project. The aim was to analyse the differences in the way athletes from the women's and men's circuits speak about their accomplishments on social media.
Being passionate followers of alpine-ski events, over the years we have developed a qualitative feeling about these differences, especially when it comes to mentions of other athletes, whether they be competitors, teammates, or skiers from the other sex¹. The reactions from fans, in particular, seem to diverge considerably in the two cases, though they tend to be so numerous that it is impossible to process them all by hand.
We decided to collect a sample of Instagram posts², comments included, and run some basic text-mining techniques to see if interesting patterns showed up. In this post we focus on the athletes and their writing. We hope to address the followers’ reactions in a not-too-distant future.
The sample we are using consists of 82 posts referring to race outcomes from the Åre World Championship.
The FIS Alpine World Ski Championships is a biennial event held for the first time in 1931. The 45th edition was held in Åre (Sweden) in February 2019 and featured 471 athletes from 76 countries. It’s a prestigious event: winning a medal shoots athletes to stardom and grants them a place in sport history, regardless of previous or future performance. (A more accurate evaluation of an athlete’s value should be based on their results in the annual FIS Alpine Ski World Cup, with 30–40 races over one season.)
A high-profile event, a single location, a time span of only 13 days, 11 races covering all disciplines (Slalom, Giant Slalom, Super-G, Downhill, Alpine Combined, Team Event). We thought this was a neat setup to focus on.
In what follows, we spend a few words on data collection and a number of choices we had to make when setting up the study. We start by an overview of the athletes and their Instagram base, the different languages involved, and how we evaluate the results of the races they competed in. We then move on to their writing: what they say in their posts (whether it is via words, emojis, or others) and if there are tangible differences between women and men.
Data collection and general properties of the sample
In collecting and analysing the data we relied on a number of tools, mostly open source, which we acknowledge at the end of the article. Our first steps were manual, though: defining the athlete samples and identifying the relevant posts.
Whose posts were we going to collect? We settled on the athletes in the top 30 of the 2018–2019 World Cup overall ranking as of February 3, 2019 (the date of the last World Cup race before the start of Åre 2019), plus the medalists in individual races from Pyeongchang 2018 (Winter Olympics) and St. Moritz 2017 (the previous World Championship). This choice may seem odd: why not simply collect all the posts from the World Championship? One reason is that we wanted to focus on athletes with enough visibility to generate considerable social media engagement. The other is that we initially set out to keep following athletes throughout the World Cup season, a context where selecting the top-30 athletes would be considered standard. We eventually didn't pursue this plan and, in hindsight, our choice might have been different without this additional goal. However, we believe we have been sufficiently cautious in our conclusions that a minor shift in the reference sample would not affect them³.
The following figure shows the selected athletes, ranked by the number of followers as of February 2019.
Not all of these athletes took part in the Åre World Championships, and not all of those who did posted messages on Instagram commenting on their performance. We grey these cases out, adding a brief explanation. For all the athletes in blue, we have collected at least one relevant post.
We stress that we are only interested in posts where the athletes react to their performance in a given competition. Generic posts and those untied to a specific race were not taken into consideration. This was in order to frame the problem as precisely as possible: of the whole spectrum of possible interactions, we are uniquely interested in how athletes speak about their performance (and how fans react to it). We only consider the 10 individual races (Slalom, Giant Slalom, Super-G, Downhill, Alpine Combined) and leave out the (mixed) Team Event.
From now on, when we refer to results for the women and men samples, we mean the subgroup of athletes with at least one post (those in blue in Fig. 1). This consists of 49 athletes, 19 women and 30 men. The distribution of posts is more balanced: 37 for women and 45 for men, for a total of 82. This corresponds to an average of ~2 versus 1.5 posts per female and male athlete, respectively.
Though here we are only focussing on what the athletes post, Fig. 1 also gives us an idea of the size of the audience they speak to. The median number of Instagram followers is 67 150 for the female sample and 35 228 for the male one. The outlier is Lindsey Vonn, with 1.6 million followers as of February 2019. (Note that this barely affects the median value for the female sample, which stays above 60 000 whether or not Vonn is included.) At the other end we find much smaller communities, though none with fewer than 6 000 followers.
Dealing with different languages
All of the athletes in the sample come from either Europe or North America. Indeed, of the 76 participating countries, only a dozen are competitive in any given World Cup season (and fewer than 30 have claimed a World Cup win in the history of the sport). We identified 9 native languages and assigned one to each skier. This required making a choice for some of the Swiss athletes and the Italian athletes from South Tyrol, who are often bi- or trilingual. We went for the language we think each of them is most comfortable with, based on our subjective evaluation.
The mother tongue is one thing; the language one picks to communicate with their fan base is another. We detected 5 such languages overall. We assigned a language to each post, though, again, this required making a choice. For example, in some cases an athlete would write part of the post in their native language and the rest in English. In such cases, we consider the post to have been written in the native language.
We summarise the information on languages in Fig. 2. We don’t find it worthwhile to show the two samples separately in this case, so we are considering all athletes and all posts at once.
We can see that the vast majority of the collected posts (59 out of 82, or ~72%) are written in English, though only a few of the athletes have English as their native language. The grid on the right-hand side shows the detailed breakdown across these two language dimensions; each square shows the number and the proportion of posts for each language combination. Looking at it, we can see, for example, that 34.1% of the posts were written in English by athletes whose first language is German⁴.
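For readers who want to reproduce this kind of breakdown, a cross-tabulation is all that is needed. The sketch below uses pandas and assumes a hypothetical table with one row per post and columns `native_language` and `post_language` (placeholder names, not the actual dataset schema).

```python
import pandas as pd

# Hypothetical posts table: one row per post, with the athlete's native
# language and the language assigned to the post.
posts = pd.DataFrame({
    "native_language": ["German", "German", "Italian", "English", "French"],
    "post_language":   ["English", "German", "English", "English", "French"],
})

# Counts per (native language, post language) combination...
counts = pd.crosstab(posts["native_language"], posts["post_language"])

# ...and the same grid as a share of all posts, as in the Fig. 2 grid.
shares = pd.crosstab(posts["native_language"], posts["post_language"],
                     normalize="all")

print(counts)
print(shares.round(3))
```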
Before analysing the textual content of the posts, we performed an automatic translation into English for the ~28% of posts written in other languages. Given enough context and reasonably well-structured text, automatic translation is reasonably accurate. In the case of blatant failures, we intervened and manually edited the translation. Translation quality is a much bigger concern when it comes to the comments on the posts: much shorter content (less context), typo-prone and heavily informal writing, many more languages to deal with, and a volume that makes human supervision impossible. A major headache, but one we will discuss next time. For now, we stick to the posts.
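As an aside, the detection step that decides which posts need translating can be sketched with the langdetect package (one possible tool among many for this step, not necessarily the one used here; the translation itself is left to an external service).

```python
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # make langdetect's guesses deterministic

def needs_translation(caption: str) -> bool:
    """Return True if a post caption does not appear to be written in English."""
    try:
        return detect(caption) != "en"
    except Exception:
        # Very short or emoji-only captions can defeat detection;
        # flag them for manual inspection instead of guessing.
        return True

print(needs_translation("Incredibile giornata, grazie a tutti!"))  # True
print(needs_translation("What a day, thank you all!"))             # False
```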
An attempt to evaluate the outcome of a race
For each race an athlete competed in, we assign one of the following labels: underperformance, typical performance, overperformance. This is an attempt at an objective evaluation of the race outcome, based on the athlete's previous results in the same discipline. For those interested in how this was computed, we refer to the technical notes at the end. To give an idea of what this indicator captures, imagine an athlete who collected placements between the 15th and 20th position over the 2018–2019 World Cup season and who ended up 6th at the World Championship: this would be considered an overperformance, even if no medal was involved. Conversely, a bronze medal could be considered an underperformance for an athlete who typically ranks first or second (as was the case with Vlhová in Slalom).
Such an indicator can only partially capture the expectations and reactions that athletes may have towards a given outcome, especially during a World Championship. (For example, a 4th place may well correspond to an overperformance by an objective measure and still be a let-down emotionally.) To get an idea, we read each of the posts and tagged them with a global sentiment flag ('Excited', 'Neutral', 'Disappointed'). This is, of course, something that involves human judgement and can result in errors or oversimplifications. When comparing the 'sentiment' to the 'objective' classification, we register a good correspondence in around 60% of the cases (meaning that, say, an overperformance results in an 'Excited' post). Most of the mismatch involves typical-performance posts, in other words those for which the subjective reaction is least predictable.
Another important aspect to keep in mind is that the post corpus is biased towards overperformances, that is, events that are more likely to be memorialised in a post than, say, disappointments. Stated otherwise, we do not have a post for each of the races, and it is typically those that didn't end well that are missing.
This can be seen in the figure below, where the gray bars represent the number of race outcomes and the blue ones the number of posts. The gaps between the two are the posts that "weren't written", so to speak.
Keeping this in mind, we now move on to analysing the corpus of posts that were indeed written. Fig. 4 shows the athletes and the performance categories they belong to, for reference.
The different elements of a post
In what follows, we will have a look at hashtags, mentions, emojis, and words as separate components of the posts. For starters, we can appreciate some global differences in their use by the two groups of athletes. The next graph shows how many of these elements are typically present in any given post and the extent of the variability.
The first observation is that women tend to write longer texts: half of their posts have more than 33 words, compared to 17 for men. They also tend to use more emojis, though this may simply be a consequence of the longer texts rather than meaning anything per se. As for the use of hashtags and mentions, the two distributions seem compatible.
What also comes up is a large post-to-post variation. The colored boxes, encompassing half of the posts, as well as the gray whiskers, stretching towards the most extreme cases, are large. What's more, there is extensive overlap between the women's and the men's ranges. In other words, the differences from post to post (or athlete to athlete) are larger than, say, the average differences between the two groups. This is an important point of caution when interpreting any future finding.
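As a rough illustration of how these per-post counts can be extracted, here is a sketch based on regular expressions and the Emoji package (version 2 or later is assumed for `emoji_list` and `replace_emoji`); the actual extraction rules behind the figures may differ.

```python
import re
import emoji  # the Emoji package listed in the acknowledgements (assumed >= 2.0)

def post_stats(caption: str) -> dict:
    """Rough per-post counts of hashtags, mentions, emojis, and words."""
    hashtags = re.findall(r"#\w+", caption)
    mentions = re.findall(r"@[\w.]+", caption)
    emojis = [e["emoji"] for e in emoji.emoji_list(caption)]
    # Strip hashtags, mentions, and emojis before counting words.
    text = re.sub(r"[#@][\w.]+", " ", caption)
    text = emoji.replace_emoji(text, replace=" ")
    words = re.findall(r"[A-Za-zÀ-ÿ']+", text)
    return {"hashtags": len(hashtags), "mentions": len(mentions),
            "emojis": len(emojis), "words": len(words)}

print(post_stats("So happy with this run 🥇🙏 thanks @my_team #åre2019"))
# {'hashtags': 1, 'mentions': 1, 'emojis': 2, 'words': 6}
```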
Hashtags
Fig. 5 above shows that the median number of hashtags per post is 3 for both samples, meaning that half of the posts have more (or fewer) than 3 hashtags. A closer inspection of the most common hashtags in each sample, or even performance category, does not reveal anything insightful. In fact, hashtags are, more often than not, related to sponsors; their use may well be mandatory and not reflect any particular intention on the athlete's part. We show a global wordcloud just for fun and move on to mentions.
Mentions
We said before that there doesn't seem to be a difference in the number of mentions men and women add to their posts (typically 2, as shown in Fig. 5). Back then we considered all mentions, whereas now we would like to focus on mentions of other athletes only (thus excluding sponsors, national teams, photo agencies, and the like). Though the observation of a large post-to-post variability remains valid, some differences between the two samples start to emerge. Women tend to include more mentions of other athletes in their posts, with a median of 1 as opposed to 0 for men (meaning half of men's posts contain no mentions of other athletes). Interestingly, women's underperformance posts are those with the most mentions, while the opposite is true for the men's (average of 1.5 vs 0.4 mentions per post). Again, the variability is high and the underperformance samples are small (18 posts, of which only 6 by women), so we should abstain from reading too much into this and ideally revisit the question with a larger sample.
Using mentions as edges and athletes as nodes, we built networks. There are no cross-mentions between the two samples, so the female and male networks are disjoint. We show them separately in Fig. 7 and 8. The first observation is that the men's network is itself made of two disjoint sub-networks. Any follower of men's races will recognise the larger and smaller groups as made of technical and speed skiers, respectively: they rarely participate in races other than those they specialise in. On the other hand, the female circuit currently has more (and prominent) athletes who are competitive in both technical and speed events. It is not surprising that their mention-based graph is connected (remember we are only considering race-related posts).
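A minimal sketch of how such a mention graph can be put together with NetworkX; the athlete handles are placeholders, and the edge list is assumed to be already filtered down to in-sample athlete accounts.

```python
import networkx as nx

# Hypothetical (post author, mentioned athlete) pairs, already filtered so
# that both ends are athlete accounts in the sample.
mention_edges = [
    ("athlete_a", "athlete_b"),
    ("athlete_a", "athlete_c"),
    ("athlete_c", "athlete_b"),
]

G = nx.DiGraph()
G.add_edges_from(mention_edges)

# How fragmented is the graph? (The men's network splits into two parts.)
components = list(nx.weakly_connected_components(G))
print(f"{len(components)} weakly connected component(s)")

# A first, simple indication of how densely athletes mention each other.
print(f"density: {nx.density(G):.3f}")
```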
We computed a number of measures to evaluate the properties of the two networks, especially how "connected" they are. Measures of density, degree, and modularity all point to the women's graph being more strongly connected than the men's⁵. We then wondered which nodes are the most important, or "central", to each graph. There exist different measures for this. We stuck to betweenness centrality, which, broadly speaking, tells you how likely you are to encounter a given node when trying to go from one point of the network to another. In the women's network, one athlete has a considerably higher betweenness centrality than any other: Mikaela Shiffrin. This is not surprising, as she won gold medals in both technical and speed races and, in our opinion, is by far the most skilled communicator in either sample. In the men's network we have a tie at much smaller values (remember the graph is disjoint!): Marco Schwarz and Marcel Hirscher (both double medalists)⁶.
So far we haven't accounted for the direction of the arrows (edges) in the networks. Who received or gave the most mentions? We address this question via the hub and authority scores. In simple terms, a good hub is a node that points to many nodes (an athlete who mentions many other athletes), while a good authority is a node that is pointed to a lot (an athlete who receives many mentions). The two main hubs in the female network are Ilka Štuhec and Mikaela Shiffrin. The top authorities are Corinne Suter, Sofia Goggia, and Mikaela Shiffrin. As ski fans will remember, Corinne Suter obtained the first podium of her career in the Super-G race (bronze medal) and went on to win a silver medal in Downhill; Sofia Goggia won a silver medal in Super-G; Mikaela Shiffrin won the Slalom and Super-G races and ended up 3rd in Giant Slalom. As for the male network, the top hub is Manuel Feller (positions from 2nd to 5th are tied), while Marcel Hirscher, Henrik Kristoffersen, and Alexis Pinturault are the top authorities (each won at least one gold medal). A clear authority of the smaller sub-graph of speed athletes is Aksel Svindal; he won a silver medal in what he had announced to be the last race of his career.
Emojis
In contrast to mentions, Fig. 5 above already suggests that women tend to use more emojis than men (a median of 4 and 2 per post, respectively). Their posts are also considerably longer, though, so the emoji-to-word ratio may well be very similar between the two groups.
Before proceeding further, we performed some cleaning up: we normalised skin tones and heart colors to the default yellow, converted national flags to a red, triangular one, and removed camera emojis (they are typically just used to credit the pictures used in the post).
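A minimal sketch of this clean-up, using plain Unicode ranges rather than any particular emoji library (the exact replacements applied for the figures may differ slightly):

```python
import re

SKIN_TONES = re.compile("[\U0001F3FB-\U0001F3FF]")   # skin-tone modifiers
FLAGS = re.compile("[\U0001F1E6-\U0001F1FF]{2}")      # regional-indicator pairs (national flags)
HEARTS = re.compile("[\u2764\U0001F499\U0001F49A\U0001F49C\U0001F5A4\U0001F9E1]\uFE0F?")
CAMERAS = re.compile("[\U0001F4F7\U0001F4F8]")

def normalise_emojis(text: str) -> str:
    """Normalise emoji variants before counting them."""
    text = SKIN_TONES.sub("", text)        # drop skin-tone modifiers (back to yellow)
    text = FLAGS.sub("\U0001F6A9", text)   # any national flag -> red triangular flag
    text = HEARTS.sub("\U0001F49B", text)  # coloured hearts -> yellow heart (list not exhaustive)
    text = CAMERAS.sub("", text)           # cameras only credit the photographers
    return text

print(normalise_emojis("🙏🏼❤️🇸🇪📸"))  # '🙏💛🚩'
```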
First, we note that the average number of emojis is the largest in underperformance posts for women, while the opposite is true for men (average of 6.2 and 1.7 emojis per post, respectively). Interestingly, as we’ve just seen, the same is also true for mentions. Again, there is a lot of variability from post to post, but, in this case, also a stronger signal that a fundamental difference is indeed present.
What are the most frequent emojis? We show this in Fig. 9, where we rank them from most to least frequent in each performance category.
We find the most striking difference to be the use of the folded hands as well as the national flag: both are among the highest-ranked emojis in the women's sample and practically absent in the men's. When it comes to the three medal emojis, one should keep in mind that they can be used in multiple contexts: to signify having won a medal or to congratulate other medalists. This may explain why, for example, the gold-medal emoji ranks at the top in the (women's) underperformance category.
We will come back to the issue of emojis with a more quantitative approach at the end of the next section.
Text
Finally, we take a look at the textual content of the posts. Starting from their English translation, we perform a number of classical text pre-processing operations, such as removing uninteresting words (of, the, is, etc.), reducing nouns and verbs to their base form (races → race, winning → win, etc.), and grouping expressions together (e.g., "giant slalom" should, in this context, be treated as a single, inseparable expression). After these operations, the median values decrease to 14 words per post for women and 7 for men. Whichever way we look at it, women's posts are typically twice as long as the men's (see also Fig. 5). In both cases, typical-performance posts tend to be the longest (median of 19 and 9.5 words per post, respectively).
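For reference, a sketch of such a pipeline with spaCy (listed among the packages below); it assumes the small English model is installed, and the expression list here is just an example rather than the one behind the figures.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

# Multi-word expressions to protect as single tokens (example list only).
EXPRESSIONS = {"giant slalom": "giant_slalom", "super g": "super_g"}

def preprocess(text: str) -> list[str]:
    """Lowercase, protect expressions, drop stop words/punctuation, lemmatise."""
    text = text.lower()
    for expression, joined in EXPRESSIONS.items():
        text = text.replace(expression, joined)
    doc = nlp(text)
    return [tok.lemma_ for tok in doc
            if not tok.is_stop and not tok.is_punct and not tok.is_space]

print(preprocess("Winning the Giant Slalom was beyond all the expectations!"))
# e.g. ['win', 'giant_slalom', 'expectation']
```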
The panel below gives an idea of what the most frequent words are when looking at the different performance categories.
We then tried a more quantitative approach to better characterise the relationship between words and the different categories. Which words are most distinctively related to, say, the group of posts written by female athletes on races where they overperformed? For this analysis, we integrate emojis back into the text. We performed Correspondence Analysis, which allows us to display the data spatially: words/emojis that are close together are likely to belong to the same category. The results are shown in Fig. 11 below. The larger the word/emoji, the stronger the association with the given category. (Only the top 10 are shown.)
For example, the folded-hands emoji and the word "support" turn out to be very strongly linked to women's overperformance posts and situated far from words associated with other categories. This confirms what Figs. 9 and 10 already suggested. One may be surprised not to find the flag emoji in the graph. Though this element is indeed characteristic of women's posts, it is found in all performance categories; it is therefore not distinctively related to one specific gender-performance subgroup, which is what this specific graph highlights.
The sample is small and the amount of text in each category is not uniform (remember that the men write less and that, in general, the corpus is biased against underperformances), so we are careful not to overinterpret these results beyond registering some highlights and giving more substance to our previous, more qualitative observations. For example, it is possible that the over-representation of the word "risk" in the men's underperformance class is related to the extreme weather conditions that characterised the Downhill race, an anecdotal fact specific to the corpus under consideration that says nothing about male athletes' discourse on social networks. Similarly, it is possible that the writing of a single athlete represents a considerable fraction of the text, as is the case for Petra Vlhová in the female-underperformance section (46% of the words).
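A minimal sketch of this step with the Prince package (listed in the acknowledgements), starting from a hypothetical contingency table of word/emoji counts per gender-performance category; the counts below are invented for illustration.

```python
import pandas as pd
import prince

# Hypothetical contingency table: rows are words/emojis, columns are the
# gender-performance categories, values are invented occurrence counts.
counts = pd.DataFrame(
    {"women_over": [12, 9, 1, 0],
     "women_under": [2, 1, 0, 3],
     "men_over": [1, 0, 8, 2],
     "men_under": [0, 0, 2, 7]},
    index=["support", "🙏", "happy", "risk"],
)

ca = prince.CA(n_components=2, random_state=42).fit(counts)

# Words/emojis and categories projected onto the same plane: points that
# end up close together tend to be associated with each other.
print(ca.row_coordinates(counts).round(2))     # word/emoji coordinates
print(ca.column_coordinates(counts).round(2))  # category coordinates
```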
Conclusions
The 2021 World Championship is getting started in Cortina, Italy, as we finally get to publish part I (!) of this project. The timing is so bad it’s actually good–we hope memories of the previous event in Åre will resurface as the new one gets underway. If nothing else, we will have contributed to that at least.
This is what we retain of our exploration on how skiers from the two circuits communicate on Instagram, based on 49 athletes and their 82 posts from Åre 2019:
- Women speak to a larger audience;
- The vast majority of posts are in English, though very few of the athletes are native English speakers;
- Tentative: Negative race outcomes are less likely to be posted about, particularly for women;
- Women refer to other athletes in their posts to a larger extent than men. The resulting network is more connected than the men’s. The presence of prominent athletes racing both technical and speed events may contribute to this outcome;
- Women write more and longer posts;
- They use more emojis than men, though this may come with writing longer texts;
- The folded-hands (prayer) emoji as well as the national flags are particularly over-represented in their posts, compared to men's;
- As for the folded-hands emoji, it tends to be accompanied by frequent occurrences of words rooted in thank- and support-;
- The previous observation is backed up by a quantitative assessment of how strongly given words or emojis are associated with a given sub-sample of posts (the women-overperformance category, in this example).
As we stressed multiple times, the large post-to-post variability prevents us from pushing the conclusions any further. Though this variability may well be an intrinsic property of the athletes' communication habits, analysing a larger post corpus (especially of the 'underperformance' kind) would certainly help strengthen the points above. As a next step, however, we'd rather stick to the current dataset and analyse the ~37 000 comments on these 82 posts.
Technical notes and acknowledgements
The data collection and analysis have been performed in Python via JupyterLab.
A special thank you to the creator of the Instaloader tool and its contributors. The automatic collection of posts and other relevant information from Instagram wouldn’t have been possible without it.
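For readers who want to try something similar, a minimal sketch of fetching a public profile's captions with Instaloader (the handle and date filter are placeholders, and rate limits apply):

```python
import instaloader

L = instaloader.Instaloader(download_pictures=False, download_videos=False)
profile = instaloader.Profile.from_username(L.context, "some_athlete")  # placeholder handle

for post in profile.get_posts():
    # Keep only posts from the Åre 2019 window (February 2019).
    if post.date_utc.year == 2019 and post.date_utc.month == 2:
        print(post.date_utc, post.caption)
```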
We took advantage of the free plan provided by ElephantSQL to store the data and used a (free) private GitLab repository for the code.
Most of the plots were created with the Altair library. The exceptions are the mention networks, based on Graphviz, the hashtag wordcloud, and the Correspondence-Analysis plot (Matplotlib + AdjustText + major refinements via a famous proprietary design program).
Other notable packages we used: Pandas, NetworkX, Emoji, SpaCy, Prince.
The “objective” evaluation of each performance was carried out as follows:
- We collect the points gained by the athlete in the same discipline during the 2018–19 World Cup season, prior to the World Championship;
- We compute the mean and standard deviation for these points (Did-Not-Start, or “DNS” races are excluded);
- In the case of the Alpine Combined, for lack of races in the 2018–19 season (only one was held, for men), we also include events from the previous season and the 2018 Winter Olympics;
- We assign “typical performance” if the points corresponding to the Åre placement fall within 0.75 standard deviations from the mean;
- We assign either “over” or “under” performance if the points fall outside of that interval.
The choice of 0.75 is somewhat arbitrary. One standard deviation would be the default choice (though no more justified), but it results in too many “typical performance” posts and therefore strongly imbalanced categories. We ran the whole analysis using 1, 0.75, and 0.5 and the conclusions are not significantly impacted. Here we stick to 0.75 but, again, we should all be conscious that there is a level of arbitrariness in this choice (and more generally in the attempt to objectively judge performances, for that matter).
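Putting the steps above together, a minimal sketch of the labelling rule (using the Python statistics module; the sample standard deviation is one possible reading of the rule):

```python
import statistics

def label_performance(season_points: list[float], race_points: float,
                      k: float = 0.75) -> str:
    """Label an Åre result against the athlete's season in the same discipline.

    season_points: World Cup points scored in the discipline before the
    championship (DNS races excluded); race_points: points the Åre placement
    would be worth; k: width of the 'typical' band in standard deviations.
    """
    mean = statistics.mean(season_points)
    std = statistics.stdev(season_points) if len(season_points) > 1 else 0.0
    if abs(race_points - mean) <= k * std:
        return "typical performance"
    return "overperformance" if race_points > mean else "underperformance"

# An athlete usually scoring 26-45 points per race who places 6th (worth 40 points):
print(label_performance([26, 32, 40, 45, 29], 40))   # typical performance
# The same athlete winning the race (worth 100 points):
print(label_performance([26, 32, 40, 45, 29], 100))  # overperformance
```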
We gratefully acknowledge the skireference API, which allowed us to significantly speed up the process of collecting race results.
The analysis notebooks are available here. The original figures uploaded in this post can be seen here.
- Though sex and gender are not synonyms, for the sample under study we believe we can use them interchangeably.
- Instagram is widely used by alpine skiers to communicate to their fan base.
- All the medallists in Åre's individual races end up being in our sample, except for Štefan Hadalin. He won a silver medal in Alpine Combined but wasn't in the World Cup top 30 at the start of the championship.
- Note that the percentages do not add up to 100% exactly because one of the 82 posts contains no text and is therefore not represented in the plot.
- For those who are interested in the actual numbers and the specific indicators we used in this section, we refer to this notebook.
- We should note that Schwarz's centrality is boosted by his node being the only link to Hadalin's. Had we decided to exclude mentions of out-of-sample athletes (as Hadalin is), his centrality would have been much lower.