What every programmer should know about types I

What is a type?

Type: (noun) a category of people or things having common characteristics.

A type represents the range of values that a piece of data can take. Let’s take an example from the mathematical concept of sets. The set of integers can be seen as a type: only whole numbers such as 1, 2, 3 are integers, while decimals (e.g. 1.1) and irrational numbers (e.g. π) aren’t members of the integer set. Extrapolating to common programming types, we can create more examples:

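• Integers can only be whole numbers (…, -1, 0, 1, 2, 3, …)
• Boolean types can only be true or false
• Floating point / real numbers can represent decimals, but only approximately: irrational numbers such as π, and even many simple decimals such as 0.1, cannot be stored exactly
• Strings are sequences of characters (e.g. "hello")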
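To make the "type as a set of values" idea concrete, here is a minimal sketch in TypeScript (chosen because it layers static types on top of the JavaScript discussed later). The names `Bool`, `isIntegerValue` and `samples` are illustrative assumptions, not part of any library. TypeScript has no separate integer type, so integer membership is checked at runtime here.

```typescript
// Hypothetical illustration: a type read as the set of values it admits.

// `boolean` is exactly the set {true, false}; this literal union spells it out.
type Bool = true | false;
const flags: Bool[] = [true, false];
// const bad: Bool = "yes"; // compile-time error: "yes" is not in the set

// TypeScript has no separate integer type: every `number` is an IEEE 754 double.
// A value belongs to the integer subset when it has no fractional part
// (NaN and Infinity fail this test because n % 1 is NaN for them).
function isIntegerValue(n: number): boolean {
  return n % 1 === 0;
}

const samples: number[] = [1, 2, 3, 1.1, Math.PI];
const integers: number[] = samples.filter(isIntegerValue); // [1, 2, 3]

// Floating point only approximates many decimals, not just irrational numbers.
const sum: number = 0.1 + 0.2; // 0.30000000000000004, not 0.3
```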

Buckets

Think about a bucket – it can hold various things (water, sand or even fruits). What you can do with a bucket depends on two things:

• What the bucket contains
• The rules determining what the bucket can hold.

Let’s look at two bucket usage philosophies.

Strict bucket land

The standing rule in strict bucket-land is to have specialized buckets for distinct content types. If you try to use the same bucket for apples and oranges, you’ll get a good talking to.

One advantage is that you immediately know what you are getting once you read a bucket’s label. However, if you need to fetch a single apple and a single orange, you’ll need two buckets. I know this sounds ridiculous, but rules are rules.

Loose bucket land

Everything is relaxed in loose bucket-land. Everything (apples, oranges, some sand, ice cream, water etc.) goes into the same bucket. Unlike the strict folks, no one is going to fret as long as you don’t disrupt their daily lives.

If you need an orange, you dip into the bucket, pull ‘something’ out and then check if you really got an orange. The value is tied to what you actually pull out of the bucket and not the bucket’s label.

Some loose bucketers try to imitate the strict bucketers by using explicitly labeled buckets. However, because there is no hard rule preventing mixes, a trickster can still drop an apple into your orange bucket!

Static vs Dynamic Types

The metaphors in the scenario above map as follows:

• The bucket represents variables (they are containers after all)
• The bucket contents (e.g. apples, oranges, sand etc.) are the values
• The rules are the programming language rules

The big schism between dynamically typed and statically typed languages revolves around how they view variables and values.

In statically typed languages, a variable has a type, and this restricts the values you can put into it and the operations you can perform on it. To draw on the bucket analogy – you typically won’t (and can’t) pour sand into the fruit bucket.

In dynamically typed languages, variables have no type – they are like the loose buckets described earlier. Because the containers are ‘loose’, the valid actions on a variable depend on its contents. To give an analogy – you wouldn’t want to eat out of a sand bucket.

1. Static type systems – variables have a type, so a container declared to hold ints may only hold ints (and likewise for floats, doubles etc.).
2. Dynamic type systems – variables can hold anything; however, the values (the contents of the variables) have a type.

That is why you can’t assign a string to an int variable in C# or Java: the variable is of type int and only accepts ints. In JavaScript, however, you can put anything in a variable, and the typeof operator checks the type of the value in the variable, not the variable itself.
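Here is a minimal TypeScript sketch of both views side by side. TypeScript checks the `number` annotation at compile time and then erases it, while a variable declared `unknown` behaves like a loose JavaScript bucket whose contents must be inspected with `typeof` before use. The helper names `valueType` and `shout` are hypothetical, chosen only for illustration.

```typescript
// Hypothetical illustration of static vs dynamic typing in one file.

// Static view: the variable itself has a type, checked at compile time.
let count: number = 42;
// count = "forty-two"; // compile-time error: string is not assignable to number
count = count + 1;

// Loose view: `unknown` lets one variable hold any value, like a plain JS variable.
let bucket: unknown = 42;

// typeof reports the type of the value currently stored, not of the variable.
function valueType(value: unknown): string {
  return typeof value;
}

const first = valueType(bucket); // "number"
bucket = "an orange";
const second = valueType(bucket); // "string"
bucket = { fruit: "apple" };
const third = valueType(bucket); // "object"

// Valid operations depend on the contents, so check before acting:
// the "pull something out, then check it" step from loose bucket-land.
function shout(value: unknown): string {
  if (typeof value === "string") {
    return value.toUpperCase(); // narrowed to string, so toUpperCase is allowed
  }
  return "cannot shout a " + typeof value;
}
```

The same `bucket` variable reports three different types over its lifetime, because `typeof` looks only at the value inside it.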

Why does this matter?

Adherents of static typing often argue that if it ‘compiles’, then it should work. Adherents of dynamically typed languages quickly poke holes in this and tout thorough testing as the way to ensure code works as expected.

In a sufficiently strict type system, it would theoretically be impossible to write code with whole classes of runtime bugs! Why? Bugs are typically invalid states, and the type checks make such states impossible to represent in the program. Surprised? Check out Haskell, F#, Idris (yes, it is a programming language) or Agda.
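One common way to make invalid states unrepresentable can be sketched in TypeScript with a discriminated union. This is an illustrative assumption, not a technique taken from the languages named above, and the request states and `render` function are hypothetical. The loose model compiles even with contradictory fields, while the strict model only admits meaningful combinations; the `never` check in the default branch turns a forgotten case into a compile-time error.

```typescript
// Hypothetical example: the state of a network request, modeled two ways.

// Loose model: the compiler accepts contradictory combinations,
// e.g. loading is true while data and error are both set.
interface LooseRequest {
  loading: boolean;
  data?: string;
  error?: string;
}
const contradictory: LooseRequest = { loading: true, data: "ok", error: "boom" };

// Strict model: each state carries only the fields that make sense for it,
// so the contradictory combination above cannot be written.
type RequestState =
  | { kind: "loading" }
  | { kind: "success"; data: string }
  | { kind: "failure"; error: string };

// const bad: RequestState = { kind: "success" }; // compile-time error: data is missing

function render(state: RequestState): string {
  switch (state.kind) {
    case "loading":
      return "Loading...";
    case "success":
      return "Data: " + state.data;
    case "failure":
      return "Error: " + state.error;
    default: {
      // Exhaustiveness check: if a new kind is added but not handled above,
      // `state` is no longer `never` here and this line stops compiling.
      const unhandled: never = state;
      return unhandled;
    }
  }
}
```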

The strictness vs testing spectrum

How much testing is required given a language’s type system?

I would view this as a spectrum: the extremely loose languages (JavaScript) sit at the left end, while the strictest languages (Idris) sit at the right end. Languages like Java and C# fall around the middle of this spectrum – they are not strict enough, as the many runtime bugs they still allow (such as null reference exceptions) show.

As you move from left to right, the amount of testing you need to validate the correctness of your program should decrease. Why? The type system performs those checks on your behalf.
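As a small illustration of the type system taking over a test, here is a hypothetical TypeScript sketch; the `NonEmptyArray`, `head` and `looseHead` names are assumptions for illustration. The typed `head` rejects an empty array at compile time, so the empty case never needs a runtime test, while the loose version has to return `undefined` and leaves that case for tests to cover.

```typescript
// Hypothetical example: moving one test into the type system.

// A non-empty array type: a tuple with at least one element.
type NonEmptyArray<T> = [T, ...T[]];

// Typed version: head cannot be called with [], so there is no empty case
// to guard against or to write a test for.
function head<T>(items: NonEmptyArray<T>): T {
  return items[0];
}

// Loose version: the empty case exists at runtime, so callers must handle
// undefined and a test should cover it.
function looseHead<T>(items: T[]): T | undefined {
  return items.length > 0 ? items[0] : undefined;
}

const firstFruit: string = head(["apple", "orange"]);
// head([]); // compile-time error: [] is not assignable to NonEmptyArray
```

The trade-off is that callers must show non-emptiness up front, for example by passing a literal with at least one element.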

Conclusion

I hope this post clarifies some thoughts about type systems and testing for you.

And you thought you knew all programming languages…

Have you ever heard of esoteric programming languages? An esoteric programming language (sometimes shortened to esolang) is a computer programming language designed to test the boundaries of programming language design, to experiment with weird ideas, or simply as a joke, rather than for practical reasons. There is usually no intention of the language being adopted for real-world programming. Such languages are often popular among hackers and hobbyists.

Usability is rarely a high priority for such languages; often quite the opposite. The usual aim is to remove or replace conventional language features while still maintaining a language that is Turing-complete, or even one whose computational class is unknown. INTERCAL belongs to this family.

INTERCAL was created in 1972, thus probably making it the first ever esoteric programming language. Donald R. Woods and James M. Lyon invented it, with the goal of creating a language with no similarities whatsoever to any existing programming languages.

According to the original manual by the authors, “The full name of the compiler is ‘Compiler Language With No Pronounceable Acronym,’ which is, for obvious reasons, abbreviated ‘INTERCAL’.”

Common operations in other languages have cryptic and redundant syntax in INTERCAL. The INTERCAL Reference Manual contains many paradoxical, nonsensical or otherwise humorous instructions, like:
“Caution! Under no circumstances confuse the mesh with the interleave operator, except under confusing circumstances!”

INTERCAL has many other features designed to make it even more aesthetically unpleasing to the programmer: it uses statements such as “IGNORE” and “FORGET”, as well as modifiers such as “PLEASE”. This last keyword provides two reasons for the program’s rejection by the compiler: if “PLEASE” does not appear often enough, the program is considered insufficiently polite, and the error message says so; if too often, the program could be rejected as excessively polite.

What do you think?

Language Rankings

There seem to be rankings for everything: universities, SEO, hospitals and every imaginable group. I stumbled on this ranking and it caught my eye immediately. Why? It was the first time I had seen one like it; moreover, I’m a programmer and I need to stay informed about current trends in the industry to avoid losing my competitive edge.

Well, it’s about the top programming languages and is published by TIOBE Software, a company that specializes in assessing and tracking software quality. Their TIOBE Programming Community Index has been used to track the rise and fall of programming languages over the years. Languages are ranked by how often they turn up in search engine results; this indicates how widely they are used, though not how much developers and engineers love them.

Well, according to the TIOBE index for January 2011, Java is still numero uno, followed closely by C. The first ten languages in the ranking are listed below.

1. Java
2. C
3. C++
4. PHP
5. Python
6. C#
7. Visual Basic
8. Objective-C
9. Perl
10. Ruby

You can review the entire list here: TIOBE Programming Community Index for January 2011

Personally, I’m surprised that JavaScript didn’t make the top ten; after all, JavaScript is at the core of most web applications and is widely used. Maybe it didn’t meet some of their criteria. I believe that JavaScript should be at least number 8 on the list. ;D

Since the index is based on current trends, you can use it to check whether your programming expertise is outdated or still relevant. Java, C, C++, PHP and Python are in heavy use and will have plenty of support in the years to come. For other languages, I can’t say… :|

Ciao.