Abstract
Emojis, or ‘picture characters’, have become ubiquitous in modern digital communication, including social media sharing and smartphone texting.
Despite this ubiquity, many questions remain about their usage, especially with respect to global variations in language and country.
These questions are important, in part because they reveal how people communicate digitally on social platforms, but also because they provide a lens through which different regions and cultures can be studied.
In this paper, we conduct a principled, quantitative study to understand emoji usage in terms of linguistic and country correlates.
Our study covers 30 languages and 30 countries, and is conducted over tens of millions of tweets collected from the Twitter decahose over an entire month.
Drawing on both statistical measures and information theory, we find that emoji usage has strong dependencies at both the language and country level, and that some languages and countries are much more constrained than others in the diversity of their emoji usage.
However, we also discover that the ‘popularity’ of emojis, both globally and within the context of a given language, follows a robust and invariant trend that emerges fairly quickly (over just a day’s worth of data) and cannot be explained by either a power-law or a Heaps’ law-like distribution.
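As a rough illustration of how 'diversity of emoji usage' can be quantified with information theory, the sketch below computes the Shannon entropy of an emoji frequency distribution. The abstract does not state which measure the study uses, so the choice of Shannon entropy, the function name `emoji_entropy`, and the toy samples are all assumptions made here for illustration, not the authors' method or data.

```python
import math
from collections import Counter


def emoji_entropy(emojis):
    """Shannon entropy, in bits, of the emoji frequencies in `emojis`.

    Higher values mean usage is spread over more emojis; lower values
    mean usage is concentrated on a few. (Hypothetical helper, not
    taken from the paper.)
    """
    counts = Counter(emojis)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy samples invented for illustration; these are not data from the study.
diverse = ["😂", "❤️", "🔥", "😭"]          # four emojis, used equally often
constrained = ["😂", "😂", "😂", "❤️"]      # usage concentrated on one emoji

print(emoji_entropy(diverse))
print(emoji_entropy(constrained))
```

Under this measure, a language or country whose tweets concentrate on a few emojis scores lower than one whose usage is spread evenly, which is one way to read 'more constrained' in the sentence above.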
.
Keywords
Twitter, Emojis, Empirical study, Linguistic dependencies, National dependencies
.
https://www.sciencedirect.com/science/article/pii/S2468696421000318