MonaTweeta II

    Preliminary result of a little competition between me and Ralph Hauwert (who had the initial idea) with the goal to write an image encoder/decoder that allows to send an image in a tweet. The image on the left is what I currently manage to send in 140 characters via twitter.

    This is the tweet for the image:
    圑嘌婂搒孵怤實恄幖戰怴搝愩娻屗奊唀唭嚟帧啜徠山峔巰喜圂嗊埯廇嗕患嚵幇墥彫壛嶂壋悟声喿墰廚埽崙嫖嘵奰恛嬂啷婕媸姴嚥娐嗪嫤圣峈嬻尤囮愰啴屽嶍屽嶰 寂喿嶐唥帑尸庠啞彐啯廂喪帄嗆怠嗙开唅恰唦慼啥憛幮悐喆悠喚忐嗳惐唔戠啹媊婼捐啸抃岖嗅怲幀嗈拀唹坭嵄彠喺悠單囏庰抂唋岰媮岬夣宐彋媀恦啼彐壔姩宔嬀

    I am using chinese characters here since in UTF-8 encoding they allow me to send 210 bytes of data in 140 chars. In theory I could use the whole character code range from 0x0000-0xffff, but there are several control chars among them which probably could not be sent properly. With some tweaking and testing it would be possible to use at least 1 or 2 more bits which would allow to sneak 17 or 35 more bytes into a tweet, but the whole encoding would be way more nasty and the tweets would contain chars that have no font representation.

    Besides this char hack there are a few other tricks at work in the encoding. I will reveal them over time. For now I just mention the difficulties involved here:

    A typical RGB color needs 24 bits which is 3 bytes. This means if you just stored raw colors you could send 70 colors. Unfortunately you couldn't send anything else. At least that would allow you to send a 7x10 pixel matrix.

    The worst way to store one full x/y coordinate would be 2 times 4 bytes, which is 26 coordinates in one tweet. That's 8 triangles. Obviously you have to do some concessions with the precision here. 2 bytes per number maybe? Gives you 52 points or 17 triangles. Unfortunately those come without color info.

    --- Additional info added on May 12th --
    Looks like my little project got a bit of attention lately, so I guess I should explain a few more of the details.

    The image file format currently looks like this:

    [0x00-0x17] 8 color lookup table, each RGB color is 24 bit

    [0x18] approximate image proportions, stored in 2 x 4 bits, the proportion is (v >> 4) / (v&4) - which means the actualy physical size of the image is not stored, which is not necessary since it gets rendered in vectors anyway. So the height will be derived from the available width.

    [0x19-0xD0] 61 points with color info each stored in 3 bytes:
    The first two bytes are the x and the y position whereby their final position is calculated byte / 0xff * displayWidth and byte / 0xff * displayHeight

    The color info is stored in the third byte and the way it is done is quite nifty I think: since my lookup table stores only 8 colors I just need 3 bits to store an index to a color. This would leave me with 5 unused bits. So I use these additional bits to give me a wider range of colors by creating blends between the colors in the table. So additionally to one color index I store another color index in the same byte. The remaining 2 bits I use as the blending factor. 2 bits allow for 4 different values. The ones I pick are 0 = 0.125, 1=0.25, 2=0.375, 4 = 0.5. I don't need any higher values since I can simply switch the order of the "upper" and "lower" color to get the same result as e.g. 0.75. I also do not need 0 or 1 since if I want a full color I just mix two times the same color. The 0.5 is a bit of a waste since it means I get the same mix in both directions, maybe it would be smarter to use 0.45 in this case. Overall this trick means that instead of just 8 colors I have a choice of about 256 shades of color.

    The actual creation of the image is an evolutionary algorithm. I start by quantizing the image's colors to get 8 representative colors. And I scatter the 61 points over the image area. At each point I read the pixel color of the blurred image and choose the closest shade I can create with my extended color table. With this data I greate a binary "gene" ( the encoded version of is the chinese twitter tweet). From the gene I create a voronoi diagram which is the image you see on the left.

    In order to get the best representation (meaning best positions of points and their choice of color) I compare the rendered image with the original by summing up the squared difference of the pixel colors and dividing it by the amount of pixels. The result is the fitness value. The ideal value would be 1 which meant that there is no difference at all between original and rendered image, but obviously that is improssible to reach for most images.

    After calculating the fitness value I clone the gene and make a few random mutations to it. Once again I calculate the fitness of the mutation and if it is higher than of its parent the mutation becomes the new parent. This process can run indefinitely but usually the rate of improvement decreases rapidly after a few minutes.

    My current goal is to figure out the optimum ways to get good results quickly.

    Comments and faves

    1. yezzer, watz, mark knol, Gwen Vanhee, and 244 other people added this photo to their favorites.

    2. yezzer (37 months ago | reply)

      Will you cover this at FlashBrighton? Hope so! :)

    3. Quasimondo (37 months ago | reply)

      Maybe I will, but in any case there won't be any slides for that.

    4. groovelock (37 months ago | reply)

      Good idea to use this for compression! I thought I'd try JPEG to see what it could do. I got it down to 608 bytes and 20x30 pixels. Below that the image was unrecognizable! Even at 1x1 px, JPEG couldn't get an image less than 525 bytes.

      This is impressive, but it also shows how ridiculous twitter is. If they increase the limit beyond 140 characters, it might be worthwhile. Until then, I don't see a lot of incentive to click on tinyurl.xy/z3kc when I have no idea what it is.

    5. flickraft (37 months ago | reply)

      so a b/w pic would have 3 times the detail?

    6. davebollinger (37 months ago | reply)

      i love a puzzle :), can i posit some guesses?

      it looks roughly voronoi-like, so all you'd need to send are the centroid xy's not each vertex. and if you coded them as deltas (like PCM audio) from the prior posiiton (with some modulo wrapping to create rough virtual rows and columns) then you could probably do it with 6 bits each (limiting span between centroids to 64 pixels, maybe even 5 bits/32 pixels?). (or somewhat equivalently, use absolute coords, but quantize the voronoi points to ~1/4 the original image resolution) then use a 4:2:0 yiq/yuv (or similar) colorspace. all that ought to get you in the range of about 3 bytes per colored centroid. then perhaps huffman or rle encode the entire message.

    7. Quasimondo (37 months ago | reply)

      @flickcraft: in my algorithm I would not be able to get 3 times more detail, I think I could only store a few more points. Maybe a 10% improvement.

      @davebollinger: yes it's voronoi, so I am storing the centroids. And I am using 3 bytes per coordinate + color info, but I don't use deltas or a different color space. But those ideas sound definitely like an alternative path to explore. And I don't do any additional compression yet.

    8. rmondonedo (37 months ago | reply)

      dude! i have no idea what the F you just said, but if you're going to get images on twitter.. well then you are so awesome!

    9. Quasimondo (37 months ago | reply)

      Sorry, there won't be any images on twitter - at least not with this method. This is more of a personal challenge without much practical use.

    10. @MSG (37 months ago | reply)

      u speak chinese? or u just convert it automatically?

    11. Quasimondo (37 months ago | reply)

      The chinese in this case is just a host for my parasitic image bytes - it does not really make sense (although, sending it through the translator translate.google.com/translate_t?hl=en&sl =zh-CN&t... gives me ideas for some other experiments). I could have used other parts of unicode ( unicode.org/charts/ ), but then there might be a lot of non-defined font symbols among them which would not look so nice.

    12. Gryftir (37 months ago | reply)

      Hmmm... It occurs to me that this might be useful for captchas somehow

    13. mikeg626 (37 months ago | reply)

      The translation (via Google)--with wild liberties--makes interesting word salad:

      The whip is war
      that easily comes
      framing a wild mountain.

      Hello, you in the closet,
      singing--posing carved peaks
      of sound understanding.

      Upon a kitchen altar
      visit a prostitute--
      an ugly woman saint--
      who decoys.

      Particularly
      lonesome mountain valley,
      your treasury: a dumb corpse and
      funeral car, idle choke open.

      Reclassification:
      exactly what you would call nervous.
      Well, do not suggest recalcitrance
      those who donated sad.

      The smell of a rugged frame
      strikes cement block once.

      Where you?
      Cape. Cylinder. Cry.

    14. lallaporter (37 months ago | reply)

      I'm entranced by the wild the poetry of the image. Thank you mikeg626

    15. Raymond Brigleb (37 months ago | reply)

      That poem really turned out lovely!

    16. Jeremy Banks (37 months ago | reply)

      I think that lots of the traffic you're getting doesn't really understand what you did. :P

    17. The Pageman (37 months ago | reply)

      @Quasimodo what would be the minimum dimensions for a thumbnail image before it would not be recognizable? Maybe your process lends itself to providing us preview/thumbnail images so we would know what we're clicking on?

    18. Quasimondo (37 months ago | reply)

      @The Pageman - I haven't tested it yet. One thing I fear is that my choice of the test image is a bit misleading as for the recognizability of an encoded image. If I'd take another previously unknown photo I'm not so sure if you get a good idea what is on it.

      But I will publish the encoder and the decoder very soon so everybody will be free to do their own tests.

    19. Rackdoll (37 months ago | reply)

      I have translated some of your chinese codes....

      "Visit a prostitute Jie" ..... ?

      Lol.
      Is this subliminal messaging ? :)

      good work Mario keep up the good work! Am enjoying it \ o /

    20. chrleon (37 months ago | reply)

      Incredibly smart! Looking forward to seeing the code :)

      *pat on the back*

    21. simurai (37 months ago | reply)

      I was wondering what that Chinese Tweet you sent a couple days ago all meant.. I never would've thought it could be an image. Nice job.

    22. bjdawes (37 months ago | reply)

      er, what?

    23. bjdawes (37 months ago | reply)

      seriously - very clever stuff.

    24. 7pc (37 months ago | reply)

      So, this www.flickr.com/photos/7pc/3525269607/ might be all of it then? pretty impressiv - even more impressive than the talk you gave (wich was a great talk #fmx).

    25. asjo (37 months ago | reply)

      See also X-Face.

    26. jolyon_russ (37 months ago | reply)

      Once again Mario, you've blown my mind!

    27. visualrinse (37 months ago | reply)

      OMG, the stuff you do is just amazing.

    28. _fluffy (37 months ago | reply)

      Two things I'd try:

      1. Reduce the color precision in the LUT (every bit of savings helps). You could probably get away with 3:3:2 RGB instead of 8:8:8. At the very least try 5:6:5.

      2. Instead of storing the raw coordinates for the Voronoi regions, sort them by Y value, then store the first one raw and each subsequent one as the delta from the last, using a variable-bit encoding. Or you could even just do a repeated transform (on the separate coordinates) and keep track of the number of wavelet passes it takes to decompose it into something that is efficient to store with bitwise RLE (i.e. 5 bits of 0, 100 bits of 1, etc.). If you sort the Voronoi regions by priority you could even just have it store regions in order of importance (perhaps based on total contrast to its adjacent regions) and then stop when it runs out of regions you can always be sure to make the most of your storage.

    29. John Seaborn Gray (37 months ago | reply)

      Nice, but you should've gone with a Rothko. 2 colors, easy to tweet. Hell, you could've tweeted 5 or 6 of 'em.

    30. _fluffy (37 months ago | reply)

      Also, you might consider just using raw UTF16 multiword characters and screw the fact that many of them won't render. Lots of UAs won't render the Chinese text either.

    31. Quasimondo (37 months ago | reply)

      Thanks for the load of great suggestions _fluffy!

      In my first version I actually had been using 16 RGB instead of 24. Don't remember why I abandoned it. Sounds worth putting it back into - that's 8 bytes saved.

      UTF-16 unfortunately does not help in this case since Twitter counts each of those chars as at least 2 chars.

      One approach with the coordinates I tried was to use a grid of equally distributed points, derived from the image's proportions and the amount of available points and then use just one byte per point as an offset from its grid position. This means that I've got 4 bits for each x and y. The offset formulas are
      offsetX = (((value >> 4) / 0xf) - 0.5) * maxOffset;
      offsetY = (((value & 0xf) / 0xf) - 0.5) * maxOffset;

    32. kylemechler (37 months ago | reply)

      What a fascinating challenge you are undertaking.

      The compression and interpretation by the image is perhaps the most interesting part that I've read about (description and comments). Wouldn't you gain _SO MUCH_ simply by expanding your character set, though? I mean, you'd be able to encode multiples of what you are now, wouldn't you? I understand the hurdle of some control characters, though those are limited. I somewhat understand the idea of not wanting characters that have no font representation as well, but simply for the technical challenge, haven't you already dismissed the ability to parse the message by anything other than the decompression process?

      I don't mean to sound like I know what I'm talking about. Don't take my comments as critical. I'm just curious and _FASCINATED_. Keep going! Bookmarked!

    33. S Crume (37 months ago | reply)

      Interesting experiment... I have a corollary for you the ponder:
      You'll always have a very low max theoretical byte limit BUT if you made these packets and sent a string of tweets to a # hash tag, you could essentially have unlimited data transfer capability

    34. Quasimondo (37 months ago | reply)

      @kylemechler My decision to limit this to Chinese characters was one part estetical since I wanted to avoid "ugly" chars, one part practical since this started as a weekend project and I wanted to get some results quickly.Having to figure out which characters I can use and which not sounded a bit tedious.

      My tests showed me that the maximum range of UTF-8 encoded characters that Twitter still counts as "1 char" is 0 - 0xffff. So if I could actually use the whole range (which I can't since there are definitely control codes among them) the theoretical maximum would be 280 chars, so that's not really multiples.

      @S_Crume of course one could send a huge image in small packets, but wouldn't that be against the spirit of Twitter? It would be like sending a novel in chunks of 140 chars.

    35. Darric (37 months ago | reply)

      24 bit colour for your lookup table seems a bit excessive.

      Also, a (admittedly only marginally) better alternative for your blending factor is [0=0.111, 1=0.222, 2=0.333, 3=0.444] mirrored to [0=0.555, 1=0.666, 2=0.777, 3=0.888], for an even distribution that isn't wasteful.

    36. Jim Rees (37 months ago | reply)

      Fluffy's right, you should reduce the color precision. Transform it to YUV and you can throw away much of the color info. Then find a way to reduce the spatial resolution of the U and V channels compared to the Y, like NTSC did 60 years ago. Although since you're using a color lookup table I'm not sure how much this will save.

    37. Darric (37 months ago | reply)

      Huge fan of this idea, incidentally. Going to have to give it a go myself!

    38. Quasimondo (37 months ago | reply)

      @Darric yes - that's the idea. I surely don't claim that mine the best possible way and would love to see alternative approaches.

    39. tjp1982 (37 months ago | reply)

      Ever thought of using fractal compression?

    40. guzi4real (37 months ago | reply)

      Very interesting.

    41. ThenAndAgain (37 months ago | reply)

      @Quasimondo: a group of friends and I have done some research on a similar topic. If you happen to find yourself attending LayerOne, you can sit in on our TwatFS talk. Otherwise, I'll update this with links to the tools we'll be releasing during the talk.

      Good luck!

    42. Rick Cogley (37 months ago | reply)

      This is quite amazing!

    43. Giantnerd (37 months ago | reply)

      we are hiring a php developer to lead our open source commerce store. Hit me up on twitter if interested

    44. massmediamobile (37 months ago | reply)

      impressive! i have a practical business application for this if you're interested. please contact me either way.

    45. Avi Muchnick from Aviary (37 months ago | reply)

      This rocks. Great job Mario!

    46. JEBloem (37 months ago | reply)

      Quite ambitious guy. Love it!

    47. Brian Campbell (37 months ago | reply)

      This is a pretty neat idea! I like the challenge of doing the best compression in 140 characters you can. Since a lot of people have offerred suggestions for how you can do it better, I've decided to offer a challenge on StackOverflow to see what people can come up with.

      Anyone who thinks they can do better can post their code on StackOverflow, along with an example, and I'll give a 500 rep bounty to the person I think did the best job. Other users can vote as well, so we can really see who has the best solution. And Quasimondo, feel free to post your solution there too if you want, though you do have a bit of a head start on everyone else.

    48. pxidistribution (37 months ago | reply)

      this too kool

    49. l'Ours (37 months ago | reply)

      ça me rappelle les deuxlignes d'HHHHHHHebdogiciel !

    50. rjzii (37 months ago | reply)

      Have you done any work with run length encoding of the image information? That might allow you to get a bit more information in to the allowed bytes.

      Also, are you using each character to represent a block of data, or are you allowing the data blocks to overlap between the characters? Granted doing that would require you to do a bit more work to ensure a valid character is returned, but it might allow for some more data to be sent.

    51. Quasimondo (37 months ago | reply)

      I have not used RLE yet because I think that it cannot play out its full potential in such a short span of data. But actually whilst' I'm typing this I do see a way that would allow me to introduce a span marker without having to sacrifice a dedicated bytecode for it, by simply using a different charcode outside my "data" range. So this would not add any overhead.
      Good point, I will try that.

      Currently 3 bytes of data are stored in two characters, which means 1 byte and a nibble per char.

    ← prev 1 2
    (92 comments)
    keyboard shortcuts: previous photo next photo L view in light box F favorite < scroll film strip left > scroll film strip right ? show all shortcuts