Language communities of Twitter

I had been wanting to make this for a while before finding out on Saturday that Mike McCandless had extracted Chrome's open-source language detector into a standalone library, which suddenly made it much more practical.

There are a lot of near-identical colors for different languages because I was optimizing for maximum distinguishability of languages used near each other rather than for global uniqueness. The exception is English, which is in gray because it is so common almost everywhere that it threw off the process of choosing the other colors.

Nathan pointed out an error in the language labels. Fixed now.

Data from the Twitter streaming API, May 14-October 20, 2011.

Comments and faves

  1. sbouboux, Bruno Boutot, Blau Zahl, chwarnow, and 176 other people added this photo to their favorites.

  2. ♫ Lion ♫ (19 months ago | reply)

    I would have chosen another color for Turkish, so that this language would have been visible in Germany due to the high number of immigrants there... but your work is fantastic anyway :)

  3. ♫ Lion ♫ (19 months ago | reply)

    PS: it looks like there are some "Japanese oases" in North Korea.

  4. Eric Fischer (19 months ago | reply)

    I did as well as I could... I don't think the Turkish-speaking population of Germany is actually tweeting very much in that language. There are fewer areas of Turkish-German crossover than there are for German crossover with Dutch, Japanese, Spanish, French, Russian, Indonesian, or Filipino, so all of those color distinctions took priority.

    The locations of Japanese tweets are kind of strange in general. There are also lots of them off the coast rather than on land.

  5. bior (19 months ago | reply)

    Very interesting graphs! What especially stands out to me in the data is the overwhelming popularity of Twitter in places such as Japan, Korea, Thailand, compared to other nations. Access to technology is high in those countries, but I wonder if Twitter popularity closely mimics that level of access, or are some cultures less inclined to use Twitter?

    Along those lines, it's interesting that not only does Twitter not appear very popular in India, but that English is the most popular language to use. I wonder, is there a Twitter competitor that's popular in India?

    In some areas, such as central Spain, the Twitter usage follows the spines of the transportation networks. I wonder if this is representative of where people live, or are these spines visible because people tend to use Twitter when travelling? Especially since rail travel is predominant in that region, and rail travel is especially suited (more than cars or planes) to using technology in transit.

    And lastly, I wonder if there's enough meaningful data to show this same map, but with the country's primary language removed? For instance, the US with English removed, and Denmark with Danish removed, and so on.

  6. Eric Fischer (19 months ago | reply)

    Replaced with new image that corrects some errors in the language labels.

  7. kubasa (19 months ago | reply)

    Fascinating. I found it interesting and slightly disturbing that Africa is almost nonexistent on this map despite its large geographic coverage and many languages.

  8. bior (19 months ago | reply)

    It is true that Africa is not proportionately represented on this graph, but also what Twitter usage there is there is primarily in English, which is gray on this map, and isn't as bright as the other colors.

  9. David FCB (19 months ago | reply)

    Please focus on the real existence of the Catalan nation, situated in the north-east corner of the Iberian Peninsula, between Spain and France. Our language, Catalan, forbidden by the Spanish government for hundreds of years, has survived until today. Nowadays, the Spanish government does not allow Catalan to be an official language of the European Union. However, Catalan has around 12 million speakers.
    From Catalonia we ask the rest of the world for help in joining our cause for the official recognition of our language, our nation and our national sports teams. We go on working on our independence, lost on the 11th of September 1714. Please HELP US!
    www.gencat.cat/catalunya/eng/index.htm

  10. georgebaz (19 months ago | reply)

    You've noticed a concentration of tweets along major railways in Spain, and I can't help but comment on a similar pattern along the Trans-Siberian Railway in Russia. For all I know, it shows where people live here: mostly along the railway, which is the country's lifeline. It may be different in Spain of course, and the lines may show where people travel. Who knows!

  11. georgebaz (19 months ago | reply)

    Twelve million people speaking Catalan are very well represented on Eric's map, I do not see what your problem is.

    What surprises me even more is why nobody from Ukraine or Belarus has yelled that their languages are not represented. I may be wrong of course, but I assume there are more than 100 million people who speak Ukrainian and Belarusian, taken together. Does Google Chrome recognize them? From what I see on the map, it does not, because in the areas where I expected to see these languages I can see only Russian. Or do they avoid tweeting in their mother tongue? Could anyone tell?

  12. railclaimore (19 months ago | reply)

    What's also interesting to note is the deadzone just north of Tokyo that's the Fukushima exclusion zone.

  13. samuel_wade (19 months ago | reply)

    Love it. I could spend hours exploring this. A couple of notes:

    It might be more accurate to label "Chinese" and "Chinese (TW)" as "simplified Chinese" and "traditional Chinese" respectively. What it's detecting there is the script, not the language, which is why Hong Kong and Taiwan appear to be the same colour.

    (Neal Stephenson fans, note also the small blob of "Chinese (TW)" just off Xiamen.)

    A greater contrast between Chinese (TW) and Japanese would help highlight the contrast between Taiwan and the various Japanese islands stretching down towards it.

  14. togocody (19 months ago | reply)

    On the big original-size map, I'm puzzled by all the specks I see in the Atlantic Ocean - bad data? Ships? What's up with that? Also, what's all the Spanish activity in lower Morocco - I know it used to be Spanish Sahara, but why so much activity, more than almost anywhere else in Africa?

  15. Nekrashevych (19 months ago | reply)

    Are there no Ukrainian tweets at all, or are they just classified as Russian?

  16. bior (19 months ago | reply)

    FYI -- This map is featured on the Strange Maps blog.

  17. Eric Fischer (19 months ago | reply)

    Thanks for the link! I hadn't realized.

    I'll look into the Ukrainian and Belarusian tweets. I think there are some but they are few compared to other languages. And yes, I probably should have said Simplified and Traditional Chinese but stuck too closely to the terminology that the Chrome language detector uses.

  18. imgoph (19 months ago | reply)

    Eric: What projection did you use? Is this a Peters?

  19. jcwexford (19 months ago | reply)

    Wow, I love this map. I've stuck up a link to it on my Irish language blog: faoicheilt.blogspot.com/2011/11/tuit-tuit.html

    Is there a reason why you haven't included lesser-used languages on the map - e.g. Irish, Welsh, Basque? Is this to do with the software not detecting these languages? Or did you decide that the colours would get too confusing if you included too many languages? Or that they just wouldn't show up clearly enough to merit inclusion?

    Personally, I'd love to see a map of Britain and Ireland with English, Irish, Welsh and Scots Gaelic included to see if our traditional perception of where these languages are spoken ties up with where they're actually used in an online context.

  20. johnbirch (19 months ago | reply)

    No Celtic languages? Surely some Welsh speakers use Welsh?

  21. Eric Fischer (19 months ago | reply)

    Yes, it's Peters projection because equal-area seemed important for showing density. I know it looks lousy in some places—sorry about that.

    The software claims to detect Irish, Welsh, and Basque, but the numbers are very small compared to other languages. I don't know whether it is misidentifying them as something else or if they are just really very rarely used on Twitter. I did cut off the caption at 10,000 tweets because it seemed like anything lower than that was such a small fraction that they would be more confusing than useful to list. The dots are still drawn on the map, they just aren't labeled.
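
The equal-area property mentioned above can be checked numerically. The following is only a sketch, assuming the standard Gall-Peters formulas (standard parallels at 45°) on a unit sphere; the function names are illustrative and are not taken from the code that produced the map.

```python
import math

def gall_peters(lon_deg, lat_deg, radius=1.0):
    """Project longitude/latitude in degrees to Gall-Peters x, y."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam / math.sqrt(2)
    y = radius * math.sqrt(2) * math.sin(phi)
    return x, y

def map_band_area(lat1_deg, lat2_deg, dlon_deg, radius=1.0):
    """Area on the map of a latitude band spanning dlon_deg of longitude."""
    x1, y1 = gall_peters(0.0, lat1_deg, radius)
    x2, y2 = gall_peters(dlon_deg, lat2_deg, radius)
    return (x2 - x1) * (y2 - y1)

def sphere_band_area(lat1_deg, lat2_deg, dlon_deg, radius=1.0):
    """Area of the same band on the sphere itself."""
    dlam = math.radians(dlon_deg)
    dsin = math.sin(math.radians(lat2_deg)) - math.sin(math.radians(lat1_deg))
    return radius ** 2 * dlam * dsin

# A band near the equator and one at high latitude both keep their true area,
# even though the high-latitude band is drawn much flatter and wider.
for lat1, lat2 in ((0, 10), (60, 70)):
    print(lat1, lat2, map_band_area(lat1, lat2, 30), sphere_band_area(lat1, lat2, 30))
```

Because areas are preserved, the number of dots per unit of map area is comparable between latitudes, at the cost of the stretched shapes apologized for above.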

  22. andrewg.com (19 months ago | reply)

    That's not "lower Morocco", that's the Canary Islands.

    I note that some of the oceanic dots do seem to follow shipping lanes. There's a notable line between the Canaries and mainland Spain, for example, but the real standouts for me are the Irish Sea and North Sea ferry routes which are easily picked out.

  23. jaumetet (19 months ago | reply)

    Eric, thanks a lot for the map and the idea itself.
    Will you publish the data and the application in order to let a lot more people improve it?
    .. it is just an idea :-)

  24. gbernsdorff (19 months ago | reply)

    @bior @kubasa Black in Africa due to native languages (except for Afrikaans) not being represented. No Lingala, Swahili, Xhosa, Zulu etc.
    @togocody Spanish activity on Morocco's North Coast probably due to enclaves of Ceuta and Melilla
    Re: Afrikaans: more native speakers than English, therefore small dots unlikely to represent true numbers
    Re: concentration of turquoise dots in between Dutch/French/German. Not consistent with number of Dutchmen/Belgians. Luxembourgish German being counted as separate language?
    Re: turquoise area South-Western France. Approximately Dordogne, known for its large number of Dutch and Belgian second home owners. Preponderance over local French?? Maybe related to high twitter density in Flanders and Holland?
    Q1: How many twitter messages does one single dot represent? Same for all regions?
    Q2: Would it be possible to lay national boundaries accurately over this picture?

  25. marcbel (19 months ago | reply)

    Eric,

    The Catalan Government is referencing your work ;-)

    www20.gencat.cat/portal/site/Llengcat/menuitem.21576464db...

  26. petrikpeta (19 months ago | reply)

    Well done, Eric, it's very impressive!

    European Commission's DG Interpretation is sharing it with their fans on Facebook!
    :-)
    www.facebook.com/#!/pages/Interpreting-for-Europe/1731226....
    Dated 9.11.2011

    Many greetings from Brussels.

  27. Eric Fischer (19 months ago | reply)

    Thanks for the links!

    Jaumetet, unfortunately as far as I know I can't distribute the data without violating Twitter's terms of use, which do not allow redistribution of tweets or locations.

    Gbernsdorff, Central Africa appears empty because there are few tweets from there, not because those languages are intentionally excluded from the map. The map does not distinguish Luxembourgish German from German German.

    Each dot is one tweet, except where there are many tweets at the same location, in which case they diffuse outward but fade with distance. Marcbel has added the national boundaries by overlaying it in Google Earth.

  28. martpaulsen (19 months ago | reply)

    @Eric, thank you for this. It is very useful! I would be extremely grateful if you could indeed have a look at the Ukrainian (and the Belarusian) tweets; it would be very useful for my research. The Russian search engine company Yandex indicated that by March 2010 there were 20,000 active users of Twitter in Ukraine, a fourth of whom wrote in Ukrainian.

    By the way, the total population of Ukraine and Belarus is approximately 55 million. Russian is by far the dominant language in Belarus, while Ukrainian and Russian compete for dominance in Ukraine. We should expect a prevalence of Ukrainian tweets to the West of the country.

  29. Eric Fischer (19 months ago | reply)

    What I am seeing in Ukraine is 195927 tweets in Russian, 73023 in English, 13934 in Ukrainian.
    In Belarus, 84004 in Russian, 19607 in English, 1299 in Bulgarian, 609 in Belarusian.
    These are only geotagged tweets, so it may still be true that a fourth of total tweets in Ukraine are in Ukrainian.
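
To put those counts in proportion, the shares can be worked out directly. This sketch uses only the figures quoted in the comment above; languages not listed there are left out, so the shares are relative to the listed totals only.

```python
# Counts of geotagged tweets quoted in the comment above.
ukraine = {"Russian": 195927, "English": 73023, "Ukrainian": 13934}
belarus = {"Russian": 84004, "English": 19607, "Bulgarian": 1299, "Belarusian": 609}

def language_shares(counts):
    """Return each language's fraction of the listed tweets."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

ukraine_shares = language_shares(ukraine)
belarus_shares = language_shares(belarus)

for country, shares in (("Ukraine", ukraine_shares), ("Belarus", belarus_shares)):
    for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{country:8s} {lang:11s} {share:6.1%}")
```

Among the listed geotagged tweets, Ukrainian comes to just under 5% in Ukraine and Belarusian to under 1% in Belarus, which is the gap between these counts and the "a fourth" figure discussed above.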

  30. fer_abella (19 months ago | reply)

    Hi Eric,

    Your work is fantastic.
    It shows the REAL USE of languages in real time at the forefront of IT technologies like Twitter.
    It is of major importance as relevant data for nationalist and anti-nationalist parties, which are always debating data.
    If you were kind enough to use a different color for Gallego in the north of Spain, it would be distinguishable from Castellano, so we could have new data about the actual relevance of that language in the territory. It could elevate the quality of your work to the TOP and be the last word on the idea you had. If you just change the color code and reprocess the data with the same program, you could deliver the BEST final product. I think your work will be in the mainstream soon, in magazines, so just fix some minor details (the real work is already done) and you will be recognized, because your images are unique and real. Your enemy is people saying the data is wrong because of stupid problems like colors. Beat them up front and leave them without any argument to dismiss your excellent idea.
    Regards

  31. Iñaki Agirre (19 months ago | reply)

    About Basque, I think the library is reporting it wrong. Chrome sometimes offers translation for Basque pages, reporting them as Malaysian or Indonesian. There is a Basque Twitter feed here:
    eu.umap.eu/

  32. axanon (19 months ago | reply)

    HEY, where is Swiss German?
    By the way, Switzerland uses a full color spectrum of its own... if you go correct and more detailed ;-)

  33. thirstforwine (19 months ago | reply)

    Very interesting indeed. I was particularly intrigued by the volume of what appears to be Italian (but could be a combination of "blue" languages) throughout France. Tourists on the motorways, presumably? French use does seem lower, and very urbanised.

    Thanks for the work.

  34. martpaulsen (19 months ago | reply)

    Thank you for the figures, Eric. I'm afraid there might be some serious sources of error in the more detailed material. There is no doubt that Russian is the dominant language in both Ukraine and Belarus, but the relationship between Bulgarian and Belarusian tweets in Belarus does not make much sense.

    As you indicate, the use of the geotagging feature is a challenge since, as far as I have understood, this is an optional feature. In addition, I have understood from my friends in Belarus and Ukraine that their locations frequently come up as Russia, even when they are hundreds of kilometers away from the border.

    Still, these are probably things that can be corrected in the longer run. Keep up the good work!

  35. carloshgv (19 months ago | reply)

    And yet you're the lucky ones. It's amazing how you Catalans give so much value to your culture and language, and how you are not ashamed of using Catalan normally. Look at Galiza. I shed tears :'(

  36. outeiro (18 months ago | reply)

    The colors of the Galician and Spanish languages are nearly the same. For me, it's impossible to see the Galician language on this map with these colors. By the way, Galician and Portuguese are the same language.

  37. Eric Fischer (18 months ago | reply)

    Sorry, Swiss German isn't distinguished from German German because the Chrome language detector can't tell the difference. (And Galician is distinguished from Portuguese because it thinks it can tell the difference, maybe because there are enough orthographic differences even if they are mutually intelligible.)

    The main languages reported in France (behind large numbers for English and French) are Dutch, Portuguese, Spanish, Tagalog (seems unlikely to me), Italian, Arabic, and Turkish.

    It is definitely possible that Bulgarian and Belarusian are being misidentified. This is not an exact science, and I am actually surprised the language detector does as well as it does given the short, abbreviated texts it is being asked to identify. Thanks for the information about known problems identifying Basque.

    I will have to see if I can make the colors any better. They really are calculated to have maximum contrast between languages in locations where multiple languages are used in close proximity, but maybe the problem is that that would still optimize for contrast along national boundaries rather than for language minorities within a country.
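
One way to picture the trade-off described above is a greedy assignment that weights color contrast by how often two languages appear near each other. This is only an illustrative sketch and not the method actually used for the map: the overlap weights, the three-color palette, and the function names are all hypothetical.

```python
import math

def color_distance(a, b):
    """Euclidean distance between two RGB triples (a crude contrast measure)."""
    return math.dist(a, b)

def assign_colors(languages, overlap, palette):
    """Greedily pick, for each language, the palette color that maximizes
    overlap-weighted contrast with the languages already colored.
    Colors may be reused by languages that rarely appear together."""
    def total_overlap(lang):
        return sum(overlap.get(frozenset((lang, other)), 0)
                   for other in languages if other != lang)

    assigned = {}
    # Color the languages with the most crossover first.
    for lang in sorted(languages, key=total_overlap, reverse=True):
        def score(color):
            return sum(overlap.get(frozenset((lang, other)), 0)
                       * color_distance(color, other_color)
                       for other, other_color in assigned.items())
        assigned[lang] = max(palette, key=score)
    return assigned

# Hypothetical crossover weights: heavy German/Dutch crossover, little
# German/Turkish crossover, no Dutch/Turkish crossover at all.
overlap = {
    frozenset(("German", "Dutch")): 100,
    frozenset(("German", "Turkish")): 1,
}
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
colors = assign_colors(["German", "Dutch", "Turkish"], overlap, palette)
print(colors)
```

In this toy input, Turkish is free to share a color with Dutch because the two never overlap, which is the same kind of outcome as the near-identical colors in the real map: contrast is spent where crossover is largest.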

  38. JackD23 (18 months ago | reply)

    Hi, very interesting map and thanks for linking to it. It would be good to have the lesser-used languages as well, rather than just official ones, if possible; some of them have more speakers than official languages. Try indigenoustweets.com, which lists all the current tweeters in these languages. Good luck.

  39. Silverionmox (18 months ago | reply)

    As Gbernsdorff said, the spread of Dutch in French areas is particularly interesting. If this indicates real language use, the spread of Dutch in French language areas is hugely underestimated. You're sure to get lots of pageviews if you make a separate map of the language situation in Belgium...

  40. daviskessler (18 months ago | reply)

    Fantastic map and concept. An interfaced program to allow the user to turn off and on certain languages would be really cool.

  41. Wakablogger (18 months ago | reply)

    indigenoustweets.com/ has some languages likely not included in the Chrome language list.

  42. velocityftl (18 months ago | reply)

    Very interesting map! It would be nice if it could be interactive, i.e., have the ability to turn on or off any or all languages. For example, I would love to see only where French is spoken. Or maybe just see where Chinese has spread. Just an idea...

  43. Johan Van Loon (18 months ago | reply)

    I agree with Silverionmox, making a few separate highres country views for multiethnic countries like Belgium or Spain could really grab the national headlines in those countries...

  44. dyaseen (18 months ago | reply)

    Wonderful map, great work. Would love to know what the white dots are that are scattered throughout the US and UK. Half the time, they look blue to me. At one point I almost thought you had uncovered evidence of systematic Russian spying in the two countries.

  45. Yaponeshia (17 months ago | reply)

    Great map !!!
    What is the data source exactly? Only one day of Twitter use last October? Another period? Thanks for the info!

  46. Eric Fischer (17 months ago | reply)

    As the description says, it is from over the course of months, May 14-October 20, 2011.

  47. Miquelmiquel (16 months ago | reply)

    Eric,
    Any chance of homing in on multiethnic countries like Switzerland, Belgium, Finland, Canada or Spain? I'm particularly interested in a better rendering (=less blurred!) of the Catalan-speaking areas, as reproduced here:
    miquelstrubell.blogspot.com/2011/11/mapa-linguistic-de-le...

  48. amareas (16 months ago | reply)

    Make Polish or German a different color.

  49. travisjthompson (16 months ago | reply)

    We're showing a lot of your pics and talking about you.

    www.openequalfree.org/nerd-alert-mapping-the-twitter-lang...

    Thanks for the great work!

  50. thebedbugsgeek (7 months ago | reply)

    Loved the neon color selection, really made the locations pop.

    Mike.

(64 comments)