What every programmer should know about types I


What is a type?

Type: (noun) a category of people or things having common characteristics.

A type describes the set of values that a piece of data can take. Let’s take an example from the mathematical concept of sets. The set of integers can be seen as a type – only whole numbers such as -1, 0, 1, 2, 3 are integers; decimals (e.g. 1.1) or irrational numbers (e.g. π) aren’t members of the integer set. Extrapolating to common programming types, we can create more examples:

  • Integers can only be whole numbers: …, -1, 0, 1, 2, 3, …
  • Boolean types can only be true or false
  • Floating-point numbers approximate real numbers (with some loss of accuracy: irrational numbers, and even many decimals such as 0.1, cannot be stored exactly)
  • Strings are sequences of characters.
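
To make the list above concrete, here is a small sketch in Python (picked only for illustration) that asks a few values which type they belong to. The variable names are made up for this example.

```python
# Ask each value which type it belongs to.
whole = 3
flag = True
ratio = 0.1
name = "apple"

print(type(whole).__name__)   # int
print(type(flag).__name__)    # bool
print(type(ratio).__name__)   # float
print(type(name).__name__)    # str

# 1.1 is not a member of the integer set.
print(isinstance(1.1, int))   # False

# Floats are approximations: 0.1 and 0.2 have no exact binary form,
# so their sum is not exactly 0.3.
approx_sum = 0.1 + 0.2
print(approx_sum == 0.3)      # False
```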

Buckets

Think about a bucket – it can hold various things (water, sand or even fruits). What you can do with a bucket depends on two things:

  • What the bucket contains
  • The rules determining what the bucket can hold.

Let’s look at two bucket usage philosophies.

Strict bucket land

The standing rule in strict bucket-land is to have specialized buckets for distinct content types. If you try to use the same bucket for apples and oranges, you’ll get a good talking to.

One advantage is that you immediately know what you are getting once you read a bucket’s label. The downside is that if you need to fetch a single apple and a single orange, you’ll need two buckets. I know this sounds ridiculous, but rules are rules.

Loose bucket land

Everything is relaxed in loose bucket-land. Everything (apples, oranges, some sand, ice cream, water etc.) goes into the same bucket. Unlike the strict folks, no one is going to fret as long as you don’t disrupt their daily lives.

If you need an orange, you dip into the bucket, pull ‘something’ out and then check if you really got an orange. The value is tied to what you actually pull out of the bucket and not the bucket’s label.

Some loose bucketers try to imitate the strict bucketers by using explicitly-labeled buckets. However, because there is no hard rule preventing mixes, a trickster can drop an apple into your orange bucket!

Static vs Dynamic Types

The metaphor above maps onto programming as follows:

  • The bucket represents variables (they are containers after all)
  • The bucket contents (e.g. apples, oranges, sand etc.) are the values
  • The rules are the programming language rules

The big schism between dynamically typed and statically typed languages revolves around how they view variables and values.

In static languages, a variable has a type, and this restricts the values you can put into it and the operations you can do with it. To draw on the bucket analogy – you can’t pour sand into the fruit bucket.

Dynamic language variables have no type – they are like the loose buckets described earlier. Because the containers are ‘loose’, valid actions for a variable depend on its content. To give an analogy – you wouldn’t want to eat out of a sand bucket.

  1. Static type systems – variables have a type. So a given container may hold only one kind of value: ints only, floats only, doubles only, etc.
  2. Dynamic type systems – variables can hold anything. However the values (contents of the variables) have a type.

That is why you can’t assign a string to an int variable in C#/Java: the variable container is of type int and only allows ints. In JavaScript, however, you can put anything in a variable. The typeof operator checks the type of the value in the variable and not the variable itself.
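
The same value-versus-variable distinction can be sketched in Python, another dynamically typed language (the names `bucket` and `oranges` are invented for this example). It also shows the ‘labeled loose bucket’ from earlier: a type hint is a label that plain Python does not enforce when the program runs, although an external static checker such as mypy would be expected to complain about it.

```python
bucket = 42                      # the value 42 is an int
first_type = type(bucket).__name__

bucket = "orange"                # same variable, now holding a str
second_type = type(bucket).__name__

print(first_type, second_type)   # int str

# A type hint is only a label on the bucket; plain Python does not
# enforce it at runtime, so the trickster's apple gets in.
oranges: int = "apple"
print(type(oranges).__name__)    # str
```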

Why does this matter?

The adherents of static typing often argue that if it ‘compiles’ then it should work. Adherents of dynamically typed languages are quick to poke holes in this and tout good testing as the way to show that code works as expected.

In a very strictly-typed language, it is theoretically possible to rule out whole classes of runtime bugs! Why? Because bugs are typically invalid states, and the type checks can make it impossible to represent such states programmatically. Surprised? Check out Haskell, F#, Idris (yes, it is a programming language) or Agda.
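
One way to picture ‘invalid states that cannot be represented’ is the rough Python sketch below. The connection example and every name in it (`LooseConnection`, `ConnectionState`) are hypothetical, and Python only rejects the bad state at runtime; the languages listed above aim to reject it before the program ever runs.

```python
from dataclasses import dataclass
from enum import Enum


# Loose modelling: two independent booleans allow four combinations,
# one of which (connected AND still connecting) is meaningless.
@dataclass
class LooseConnection:
    is_connected: bool
    is_connecting: bool


nonsense = LooseConnection(is_connected=True, is_connecting=True)  # accepted


# Stricter modelling: one value drawn from exactly three legal states.
class ConnectionState(Enum):
    DISCONNECTED = "disconnected"
    CONNECTING = "connecting"
    CONNECTED = "connected"


legal_states = [state.name for state in ConnectionState]
print(legal_states)  # ['DISCONNECTED', 'CONNECTING', 'CONNECTED']

# A state outside the three legal ones cannot be constructed.
try:
    ConnectionState("connected-and-connecting")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```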

The strictness vs testing spectrum

How much testing is required given a language’s type system?

I would view this as a spectrum – to the left would be the extremely loose languages (JavaScript) whereas the right end would contain the strictest languages (Idris). Languages like Java and C# would fall around the middle of this spectrum – they are not strict enough, as is evident from the many runtime bugs that still slip past their compilers.

As you move from left to right, the amount of testing you need to validate the correctness of your program should reduce. Why? The type system performs more of the checks on your behalf.
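
As a rough illustration of the looser end of the spectrum, the hypothetical `total_price` function below (written in Python, a dynamically typed language) has to guard its own input, and a test has to confirm that the guard works. In a stricter language the compiler would be expected to reject the bad call, which would make this particular test unnecessary.

```python
def total_price(quantity, unit_price):
    """Return quantity * unit_price, rejecting non-int quantities."""
    # Without this guard, "3" * 2 would silently produce the string "33".
    if not isinstance(quantity, int):
        raise TypeError("quantity must be an int")
    return quantity * unit_price


def test_rejects_string_quantity():
    """A check that a stricter type system would do on our behalf."""
    try:
        total_price("3", 2)
    except TypeError:
        return True
    return False


print(total_price(3, 2))               # 6
print(test_rejects_string_quantity())  # True
```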

Conclusion

I hope this post clarifies some thoughts about type systems and testing for you. Here are some other posts that are related:

  1. Programming Language Type Systems I
  2. Programming Language Type Systems II

Programming Language Type Systems II


Strong/weak typing describes the ease of mixing variables of different types in expressions. Strong typing does not imply static typing just as weak typing does not mean dynamic typing.

To put it simply, if a programming language tries to interpret an expression that mixes values of different types (e.g. adding an int to a string), then it is most probably weakly-typed; if it throws an error instead, then it is strongly-typed.

Strong Typing

Strongly-typed languages do not try to deduce programmer intent when they come across expressions that mix unrelated types; they throw errors rather than implicitly coercing one type into another.

#python
two = "2"  # a string

four = two + 2  # KA-BOOM!!!

# Python 3 reports: TypeError: can only concatenate str (not "int") to str

This should not be confused with dynamic typing, where the type of value a variable holds can change; the issue here is trying to use a type (string) where another type (int) is expected. To make the above example pass, the string needs to be converted to an int.

#python
two = "2"  # a string

four = int(two) + 2

# four is now 4

Weak Typing

Weakly-typed languages allow you to mix types in expressions: the language will not throw an error; rather, it’ll try to deduce your intentions. This can lead to really weird bugs that surface a thousand miles away from their origin.

//JavaScript
var four = "4";

var five = four + 1;

console.log(five); // prints "41"

JavaScript implicitly assumed that you were concatenating strings (the expression is ambiguous: it involves adding a String to a Number) and ‘helped’ to make this happen. However, later code will probably blow up because “five” is the string "41" and not the number 5.

What you should know

Strongly-typed languages check types (at compile time or at runtime, depending on the language) and only allow operations that are legal for the types involved, whereas weakly-typed languages try to interpret (understand/guess) the programmer’s intentions.

Strong typing does not imply static typing; it means that a language will not automatically coerce a value into another type to make an expression valid. Weak typing, however, means that a value can be implicitly converted to another type at runtime based on its context in an expression.
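
A short Python sketch of how the two axes are independent (the variable names are illustrative only): rebinding a variable to a value of a different type is allowed, which is the dynamic part, but mixing a string and an int in one expression is refused, which is the strong part.

```python
label = "2"          # the variable currently holds a str
label = 2            # dynamic typing: rebinding to an int is allowed

# Strong typing: mixing a str and an int is refused rather than guessed at.
try:
    result = "2" + 2
except TypeError:
    result = None
print(result)        # None

# The programmer has to state the intent explicitly.
as_number = int("2") + 2
as_text = "2" + str(2)
print(as_number, as_text)  # 4 22
```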

In general, the ease of detecting programming errors decreases as you go from one to four in the list below. Due to the implicit conversions in dynamic weakly-typed languages, bugs might be caused by seemingly innocent code.

1. Static & strongly-typed: Haskell

2. Static & weakly-typed: C

3. Dynamic & strongly-typed: Ruby, Python

4. Dynamic & weakly-typed: PHP, JavaScript

And Finally…

In conclusion, it is good to know that the strong/weak delineation is blurred and not clear-cut. To avoid ambiguity and confusion, it’s better to describe languages with respect to type safety, and that should insha Allaah be the next post in this series. Hang on…

Did you enjoy this post? Read the first post.