Have you read Gulliver’s Travels? If you have, then you must have read about endianness. The Lilliputians were split into two factions over which end of an egg to break – the big end or the little end. Isn’t it interesting that such a petty, seemingly trivial issue caused both parties to become embroiled in clashes? Similarly, some flame wars in programming are over issues as ‘important’ as endianness.
OK, history lesson over; you now know where the term endianness comes from. Let’s get to the issue of endianness in computer science. Computers can read streams of bits and bytes just as we can read written text. However, interpretations vary; for example, I can’t read Greek or Spanish even though I can see the writing marks.
Let’s take the popular quote:
“There are 10 kinds of people, those that understand this and those that don’t.”
Someone who doesn’t know binary wouldn’t realize that the 10 in the quote means 2 in decimal. As such, 10 can be the decimal ten or the decimal two; in fact, it might also mean input/output or the moon Io (in other contexts). It’s difficult to interpret 10 unless you know what the author meant.
The same interpretation problem applies to computers. In the early days of computing, a byte depended on the particular hardware architecture, and there was no standard definition. The problem of endianness arises when lots of bytes are exchanged between several computers. Local data is fine, since a computer understands its own locally-stored data; but when you have to talk to another computer, how do you interpret the data you get?
Big-endian systems store the most significant byte of a multi-byte value in the very first byte location, while little-endian systems store the most significant byte in the very last byte location. This is somewhat similar to the writing/reading styles of languages: some are left-to-right while others are right-to-left; some are even top-to-bottom.
Here is how the 16-bit number 00000001 00000010 will be interpreted on both systems:
- To a little-endian system, the very first byte is the least significant, so the number is equivalent to (1 + 2*256) = 513.
- A big-endian system, however, sees the first byte as the most significant, so the number is equivalent to (1*256 + 2) = 258.
Amazing, right? Some people feel that one endianness is better (just like the Lilliputians :) ) and have their reasons too. Both systems have their strengths and flaws.
Big-endian systems represent numbers the way we write them, which makes reading hex dumps and debugging easier (in English, we all recognize thirteen as 13 and not 31). Also, you can check whether a number is positive or negative by looking at the leading byte.
Little-endian systems let you check the lowest-order byte directly (e.g. if you want to know whether a number is odd), and they make it easier to write multi-precision math routines in assembly language, since arithmetic starts from the lowest address.
Another issue is the NUXI problem. Say you want to store the four bytes labeled UNIX on a machine that stores numbers as 2-byte integers; the word UNIX is split into two chunks, UN and IX (with U the most significant byte of its pair). A big-endian system stores the pair UN as UN (the most significant byte comes first, right?) and likewise IX as IX. A little-endian system stores UN as NU (the most significant byte comes last, remember?). Similarly, IX is stored as XI, so its internal representation ends up as NUXI. Each computer understands its own internal representation perfectly; however, imagine a big-endian system storing UNIX and a little-endian computer reading those same bytes back – when it tries to retrieve the data, it’ll get NUXI. I wonder if computers get perplexed… :P
Fixes for the endianness problem include agreeing on a standard format across computers and using headers that describe the format of the information (yes, this wastes space, but you’ve got no choice ;( ).
So what endian are you?
Good to know:
- Intel x86 processors (used in PCs) are little-endian, while the Motorola processors used in classic Macs are big-endian.
- Adobe Photoshop files and JPEG files are big-endian while bitmaps are little-endian.
- The network order (order of transmission over networks) is big-endian.