2014-03-21

Call by what? Understanding Python variables

You might run across something similar to this in a Python program:

def consume(consumer, food, beverage):
    consumer.eat(food)
    consumer.drink(beverage)
    return consumer

consumer_x = consume(consumer_x, banana, milk)
weigh(consumer_x)

This code is a bit more convoluted than it has to be. To understand why, we need to understand how Python variables work.

Labeled Shoeboxes

A variable in an old language like C is like a box with a label. (For some reason I thought of shoeboxes when I studied C.) It has a certain size, e.g. big enough to fit an integer or a double-precision floating point number. You can write something like this in C:

int a = 12345;
int b;
b = a;
a = a + b;
b = a;

This roughly means the following:
1. Make an integer-sized box, label it "a", and place the value 12345 in it.
2. Make another integer-sized box and label it "b". Don't put anything in it yet. (In C, an uninitialized local variable holds whatever happened to be in that memory.)
3. Copy the value of a into the b box. Now both a and b contain 12345.
4. Calculate 12345 + 12345 = 24690 and put that in the a box.
5. Copy the value of a into b again. Both boxes contain 24690 now.

Call by what?

If you do something like "consume(consumer, food, beverage)" in a language with C-like variables, you need to make up your mind about the semantics here. What happens to a variable such as "a" if you pass it to a function like consume?

Will the value in the "a box" be copied into some other box inside consume? We call this call by value or pass by value.

The other option would be that the code inside consume is able to access the "a box" itself. We call that call by reference.

So, what about Python? Is it call by value or call by reference? Neither, I'd say. Some call it call by object or call by sharing. An experienced C programmer would probably say that we're passing the pointer stored in "a" by value, but let me explain this with an image similar to the shoebox.
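
A minimal Python 3 sketch of what call by sharing means in practice. The function names mutate and rebind are made up for illustration. Changing the object a parameter refers to is visible to the caller, while assigning a new object to the parameter name is not.

```python
def mutate(items):
    # items is a second name for the caller's list; append changes that list
    items.append('x')


def rebind(items):
    # Assignment only moves the local name to a new list;
    # the caller's name stays attached to the old one
    items = ['y']


shopping = []
mutate(shopping)
print(shopping)   # ['x']: the caller sees the mutation
rebind(shopping)
print(shopping)   # ['x']: the rebinding inside rebind() is invisible to the caller
```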

Balloons, tags and strings

For some reason, I see Python variables like balloons floating in the sky. They can be any size, and while you can't really control where they are, there are strings attached. You hold on to the other end of the string, and you've put a tag on it.

Let us repeat the example from above in a Pythonic way:

a = 12345
b = a
a = a + b
b = a

1. Let's make a balloon and put the integer 12345 in it. Attach a string to it and put the tag "a" on the end you hold on to. (In fact, this balloon will never contain any other value than 12345, but we'll get back to that.)
2. Let's attach a new string to the 12345 balloon, and tag that "b". Now there are two strings attached to 12345.
3. Let's make a new balloon and fill it with 12345 + 12345 = 24690. Move the string tagged "a" from the 12345 balloon to the 24690 balloon. The balloons now have a string each.
4. Move the string tagged "b" to the balloon with the "a" string attached, i.e. the 24690 balloon. The 12345 balloon no longer has any strings attached, so it flies away.
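
The four steps above can be replayed in Python 3 with the is operator, which tells you whether two names are attached to the same object (balloon):

```python
a = 12345
b = a
print(a is b)   # True: two strings on the 12345 balloon
a = a + b
print(a, b)     # 24690 12345
print(a is b)   # False: "a" has moved to a new balloon
b = a
print(a is b)   # True: both strings are now on the 24690 balloon
```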

A more technical way to put it is that 12345 gets garbage collected (or, to be precise, in the standard CPython implementation it's freed by reference counting once its last reference goes away).
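
You can watch a balloon fly away with a weak reference, i.e. a string that doesn't hold the balloon down. This sketch assumes CPython, where an object is reclaimed as soon as its last ordinary reference disappears. Integers can't be weakly referenced, so it uses a hypothetical Balloon class instead.

```python
import weakref


class Balloon:
    """Hypothetical stand-in object; ints don't support weak references."""


balloon = Balloon()
tether = weakref.ref(balloon)   # a weak string: it doesn't keep the balloon down
print(tether() is balloon)      # True: the name 'balloon' still holds it
del balloon                     # remove the last ordinary string
print(tether())                 # None in CPython: the balloon has flown away
```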

Mutable and immutable types

All Python objects (oops, I meant balloons) have a type. Every type is either mutable or immutable. Integers are immutable, so once you've created an integer balloon (or object) it will never change its value. So, while "a = a + 1" means "change the value in the a-box" in C, it means "move the a-tagged string to a balloon with another value" in Python.

That's true for both mutable and immutable types: Assignment in Python always means "move the string to the balloon on the right-hand side". The silly thing about the example we began with is that it's the same balloon the string was already attached to. Let's go back to that:

def consume(consumer, food, beverage):
    consumer.eat(food)
    consumer.drink(beverage)
    return consumer

consumer_x = consume(consumer_x, banana, milk)

Before we call consume(...), consumer_x is a tagged string attached to some balloon. When we call consume(), we attach a string with a consumer tag, inside the consume() function, to the same balloon. Then we call the .eat() and .drink() methods on our balloon. Then we pass our balloon back to the caller. Finally, we reuse our variable name and do consumer_x = consume(...). This means that we detach the string from the balloon it was connected to, and instead we ... yes, that's right ... we re-attach it to the same balloon again. That's silly, isn't it?

The crucial thing is that we don't reassign consumer in consume. Since we can't see any "consumer = ..." in the body of consume, we know for sure that we return the same object we got as input. Not much point in that. It's as if I visited your home, grabbed one of the flower pots from your window sill and presented it to you as a gift.

For this function to make any sense at all, consumer can hardly be of an immutable type such as an integer. It's probably an instance of a class, and like lists, sets and dicts, that's typically a mutable type.
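
A runnable Python 3 version of the opening example, assuming a minimal hypothetical Consumer class (the original never shows one). It confirms that consume() hands back the very object it received, so a plain call without reassignment does the same job:

```python
class Consumer:
    """Hypothetical stand-in for whatever class consumer_x really is."""

    def __init__(self):
        self.stomach = []   # mutable state that eat() and drink() change

    def eat(self, food):
        self.stomach.append(food)

    def drink(self, beverage):
        self.stomach.append(beverage)


def consume(consumer, food, beverage):
    # consumer is a second string to the caller's balloon; it's never reassigned
    consumer.eat(food)
    consumer.drink(beverage)
    return consumer


consumer_x = Consumer()
returned = consume(consumer_x, 'banana', 'milk')
print(returned is consumer_x)   # True: we got our own flower pot back
print(consumer_x.stomach)       # ['banana', 'milk']

# consume() mutates the object in place, so a plain call is enough:
consume(consumer_x, 'apple', 'juice')
print(consumer_x.stomach)       # ['banana', 'milk', 'apple', 'juice']
```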

The difference between immutable and mutable is that mutable objects can change (or mutate) after their creation. This is pretty important in Python. For instance, dict keys must be hashable, and the built-in mutable types such as lists, sets and dicts aren't, so you can't use them as keys. With an object such as a list, the value can change even though it's the same balloon, a.k.a. object.

>>> a = 1
>>> print a, id(a)
1 30716560
>>> a += 2
>>> print a, id(a)
3 30716536
>>> # See, new value and new id, i.e. another object.
...
>>> l = []
>>> print l, id(l)
[] 37012744
>>> l.append('x')
>>> print l, id(l)
['x'] 37012744
>>> # New value, but still same object!
...
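
Two consequences of this, sketched in Python 3: when two names are attached to the same mutable balloon, a change made through one name shows up through the other, and mutable built-ins like lists are rejected as dict keys.

```python
# Two names, one mutable balloon: a change through one shows up through the other
l = []
m = l
m.append('x')
print(l, l is m)      # ['x'] True

# With an immutable int, += just moves one string to a new balloon
a = b = 1
b += 2
print(a, b)           # 1 3

# Dict keys must be hashable; the built-in mutable containers aren't
prices = {('banana', 'milk'): 12}    # a tuple of strings works as a key
try:
    prices[['banana', 'milk']] = 12  # a list doesn't
    list_rejected = False
except TypeError:
    list_rejected = True
print(list_rejected)  # True
```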

Let's make it a little more complicated...

If we look at the list above, its balloon can obviously grow. A Python list is like an array or vector in other languages, so if it's a list of 100 floating point numbers, it's a big balloon. It won't contain 100 floats, though. It will contain 100 strings, each leading to a floating point balloon.
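
A small Python 3 sketch of the strings inside the list balloon: a list holds references to objects, not copies of them. Repeating a list with * copies the strings, not the balloons, which is easy to see with a mutable element:

```python
x = 0.5
floats = [x] * 3
print(all(item is x for item in floats))   # True: three strings, one float balloon

shared_rows = [[]] * 3                     # three strings to the same empty list
shared_rows[0].append('x')
print(shared_rows)                         # [['x'], ['x'], ['x']]

separate_rows = [[] for _ in range(3)]     # three separate list balloons
separate_rows[0].append('x')
print(separate_rows)                       # [['x'], [], []]
```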

So, the strings we attach to balloons can either end in another balloon, or they can end in a tag in a location we call a scope (or namespace). Roughly speaking, the balloons float in a part of the computer memory called the heap, while the tags belonging to a running function live in its frame on the call stack. As long as you stick to Python, you don't really have to care about that.
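
One way to peek at the tags: in Python 3, locals() (and globals() at module level) returns the current scope as a dict mapping names to the objects they're attached to. The function name scope_demo is just for illustration.

```python
def scope_demo():
    food = 'banana'
    drink = food       # a second tag on the same string object
    return locals()    # a dict of this function's tags and their balloons


tags = scope_demo()
print(sorted(tags))                   # ['drink', 'food']
print(tags['food'] is tags['drink'])  # True
```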

Which is the variable?

In a C-like language, it's pretty obvious what a variable is. It's a labeled shoebox of a particular size/type. It's called a variable, since its content can vary (within the constraints of the type). If you declare it const, it's not a variable, but a constant.

But what about Python? Which is actually the variable? The tag? The balloon? It's not the string, is it? If it's the tag, then Python variables don't have types, and that's a silly thing to claim. If it's the balloon, then Python doesn't have integer variables, just constants, and that would be an equally silly claim.

Perhaps it's the whole arrangement which is the variable. Perhaps the term variable doesn't make so much sense in Python? Maybe it's better to just talk about objects and names?

Want to know more? Take a look at Fredrik Lundh's explanation at http://effbot.org/zone/python-objects.htm

2014-03-20

print 0100?

Only start numbers with 0 if the second character is a decimal point!



A few days ago, I saw some Python 2 code looking like this:

expval = date(2011, 03, 02)

Ok, that's harmless, but only because the numbers were less than 8... Let me show you:

>>> print 02
2
>>> print 03
3
>>> print 07
7
>>> print 08
  File "<stdin>", line 1
    print 08
           ^
SyntaxError: invalid token
>>> print 0100
64
>>> print 100
100

If it wasn't clear to you from the beginning, you probably get it now: Numbers beginning with a 0 are treated as octal numbers in Python 2 (i.e. base 8 instead of base 10, the decimal numbers we normally use). This was probably not Guido's brightest move when he designed Python. He simply copied a common practice from other languages such as C. The prefix 0x has always meant hexadecimal, and later 0b appeared for binary (even though he rejected it when I first asked) and 0o as a saner syntax for octal.

>>> print 0xff
255
>>> print 0b01010
10
>>> print 0b10010
18
>>> print 0o10010
4104
>>> print 0o100
64

In Python 3, 0o is the only way to write octal literals. Writing a non-zero number with a leading 0, such as 0100, is a syntax error:

>>> print(0o100)
64
>>> print(0100)
  File "<stdin>", line 1
    print(0100)
             ^
SyntaxError: invalid token
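
So in Python 3 the date from the beginning of this post is simply written without leading zeros, and when you really do want octal, you use the 0o prefix, or int() with an explicit base when parsing strings. A minimal sketch:

```python
from datetime import date

expval = date(2011, 3, 2)   # no leading zeros needed (or allowed) in Python 3
print(expval)               # 2011-03-02

print(0o100)                # 64: the explicit octal prefix
print(oct(64))              # 0o100
print(int('0100', 8))       # 64: parse a string as octal
print(int('0100'))          # 100: int() with the default base 10 accepts leading zeros

try:
    compile('0100', '<literal>', 'eval')
except SyntaxError:
    print('0100 is not a valid literal in Python 3')
```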

2014-03-15

Picky Python

I first discovered Python back in version 1.4, in 1996. It's been my favourite programming language since then.

One of the things I like best about Python is that it's designed to make it easy to do things right, rather than hard to do things wrong.

Still, there's a lot of Python code that could be improved considerably. There are several reasons for that:
  • One way or another, the problem is often that the programmers don't know Python very well.
  • Some seem stuck in the idioms of another language, such as C. A variant is people who try to compensate for the lack of static typing, rather than make use of dynamic typing in an efficient way.
  • Many reinvent the wheel instead of using the vast flora of libraries available in the standard library and in the Cheese Shop (PyPI) etc.
  • Since Python is easy to learn, it's often used by people who aren't hard-core programmers. While they will do much better with Python than with e.g. C++, they'll still make beginner's mistakes.
  • Regardless of language, there is a lot of software which was written with less care and attention than it deserved. Perhaps because the programmer didn't have to maintain it?
  • I've seen a lot of poorly written code in corporate settings, and that's really an organizational problem rather than a personal one: Companies can hire the right people, train them well, establish a culture of software quality and provide constructive goals and priorities. Or not! As Deming said: Don't place blame on the workforce.

I thought I'd write a bit about problematic Python constructs I've come across, explain why they cause trouble, and what to do instead. That's the main idea behind this blog. Hopefully, it will help someone now and then...