Never confuse education with intelligence, you can have a PhD and still be an idiot.
- Richard Feynman -

# ITIC:Digital representation - Binary

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

# Digital representation

## Binary numbers

### bits and Bytes

A computer stores data in "digital" format, meaning that it stores information as numerical representation, using what is called "binary" numbers.

A Binary number is a number which can be represented using only two symbols (normally 0 and 1). One such binary digit (a 1 or a 0) is called a "bit" (which is actually short for BInary-digiT).

Using only 0 and 1, we can represent two values in one such bit. The decimal numbers we can represent using only one bit are also 010 and 110.

If we use two bits, we can represent four different decimal numbers:

• 010 `002`
• 110 `012`
• 210 `102`
• 310 `112`

What if we use three bits? How many combinations can we get? Each combination will represent one decimal number. It turns out that we get eight combinations from three bits, and 16 combinations from four bits. The pattern comes from powers of 2:

• (20 = 1) here for completion - we don't use 0 bits to store information ;-)
• 21 = 2 using one bit
• 22 = 4 using two bits
• 23 = 8 using three bits
• 24 = 16 using four bits
• etc etc

Of course, the value doubles every time the exponent increases by one (because that's what happens when you multiply a number by 2 ;-) ).

The powers of 2 are the number of decimal numbers we can represent. If we start at 0 and only represent positive numbers, then the decimal numbers we can represent will be 0 through the exponent minus 1:

• 21 = 2 - the numbers 0 - 1 since (2-1=1)
• 22 = 4 - the numbers 0 - 3 since (4-1=3)
• 23 = 8 - the numbers 0 - 7 since (8-1=7)
• 24 = 16 - the numbers 0 - 15 since (16-1=15)

We take the power minus one, since one combination will be the zero, the next one etc.

A common term used for data in binary format is "byte". A byte normally means eight bits. How many combinations can you get using eight bits? 256! Since 28 is 256. So, for positive numbers, we can represent 0 - 255 using a byte.

### Negative numbers

A common way to represent negative numbers is to use a "sign bit", for instance the leftmost bit (when we write the binary number down). This means that we have one bit less to represent the value (since we are wasting one bit for the sign). Typically, the sign bit is implemented such that a one means "negative" and a zero means "positive".

The following could be one way of representing the number minus one using eight bits: `10000001`, in binary. But then the problem becomes deciding how to represent the value zero. Should we have two representations for zero? Minus zero: `10000000` and positive zero: `00000000`?

Back in the day, a technique called one's complement was used for representing negative numbers. A leftmost 1 still signifies "negative" and a leftmost 0 "positive". To represent a negative number, one takes its positive representation and then flips all bits (all 0s become 1 and all 1s become 0). This way too renders two zeros, negative zero (in eight bits): `1111 1111` and positive zero `0000 0000`. Counting with one's complement numbers is also a little tricky when it comes to subtraction. See one's complement for details.

A common way to "solve" these "problems" is to use an algorithm (yeah, we love to use that word, it gives us 5 imaginary academic points every time we write or say "algorithm") called "two's complement". In this way of representing numbers, there is no representation of "minus zero" because in this way of representing numbers in binary, we only need "one" single representation of zero. In fact, trying to represent minus zero using this technique, renders the same pattern as for positive zero.

The "trick" (or "algorithm" if you must) used in two's complement, is to take these steps:

• For a positive number, represent it with a leading zero for the sign bit, and use the remaining bits to represent the number in binary format
• For a negative number, represent it with a leading one for the sign bit, then represent the number as the binary number with all bits flipped (all ones become zeros and vice versa), and then add one to the resulting number.
• Or: represent the number as normal, flip all bits (including the sign bit), add one to the result.

Some examples:

• the number +310 in eight bit binary:

`000000112`

To evaluate that binary number using the two's complement, we only have to look at the sign bit (leftmost) and see that it is a positive number, so the seven "value bits" are then to be evaluated as normal:

• the rightmost bit is a one, so it represents the value 1 (1*20 = 1)
• the next (going left) bit is also one, so it represents the value 2 (1*21 = 2)
• 1 + 2 = 3
• the value `000000112` is then 310

To represent the value -310 in eight bit binary (using two's complement), we'd do the following:

• calculate the binary number for 3 in eight bits, which is `0000 00112`
• flip all the bits, which gives us `1111 1100`
• add one, which gives us:
``` 1111 1100
+0000 0001
---------
1111 1101
```

Conclusion: -310 is `1111 11012` using two's complement.

To go the other way, from `1111 11012` to decimal, we'd look at the sign bit and see that it is a negative number.

It's a negative number, so we have to calculate its "two's complement" representation.

```a) Flip all the bits
1111 1101
becomes:
0000 0010
```

```b) add one:
0000 0010
+0000 0001
----------
0000 0011 (3 in decimal)
```

The number we now have in binary, should be interpreted "normally", and then changed to its inverse number (itself times -1):

Change the sign of 3 to -3.

```Answer: 11111101 in binary is -3 in decimal.
```

Another example:

What decimal number is the binary number, in eight bits, 0000 1000 (using two's complement)?

The number starts with 0, which means positive, so we don't need to take the number's two's complement.

The answer is: The number is 8 in decimal.

One more example: What decimal number is the binary number, in eight bits, 1000 1000 (using two's complement)?

The number starts with a 1, so it's negative and we have to take its two's complement:

```a) Flip all bits:
0111 0111

111  (carry)
0111 0111
+0000 0001
----------
0111 1000

64 + 32 + 16 + 8 = 120
Change the sign, so the answer is: 1000 1000 represented using two's complement is -120 in decimal (using eight bits).
```

Finally, lets see how the two's complement work with the value negative zero. We'll take zero and flip the bits and add one:

```  1111 1111
+0000 0001
-----------
1 0000 0000

In eight bits, that's 0000 0000
```

(the leftmost 1 in the result is truncated off, since we are only looking at eight bits)

This means we end up with the same bit pattern as we started with, meaning we have only one representation of zero.

How does addition and subtraction work in binary? The same as in decimal (or any base!). Let's look at some examples using four bits for brevity:

```   1  <--- carry since 1+1 = 10 in binary
0001
+0001
-----
0010

In decimal:
1
+1
--
2

Let's subtract:
1 <-- borrow 1 from the digit to the left
0010
-0001
-----
0001

In decimal:
2
-1
--
1

One more example:
***1     (borrow)
1000     (8)
-0001     (1)
-----
0111     (7)
```

Working from right to left, 0 - 1, we have to borrow. But the next bit is also 0, so we have to borrow. And again, until we borrow from the leftmost 1 in 1000.

We borrow 10 which is 2 in decimal. When we borrow 10 from the leftmost 1 in 1000, we get 10 to the right of it and so on.

Substraction in decimal works the very same:

```Decimal 1000 - 1:

***1  (borrow 10s)
1000
-   1
-----
0999
```

Only in decimal, we borrow 10 which is 10 in base 10. In binary, we also borrow 10 (which is 2 in decimal).

In decimal, when we borrow 10, and subtract 1, we get 9.

In binary, when we borrow 10, and subtract 1, we get 1 (2 - 1 is 1)

Put differently, in binary:

``` 1 - 0 = 1
1 - 1 = 0
0 - 1 = 1 (borrow 10 from left)
10 - 1 = 1
```

### What numbers can we represent?

Now that we know how the two's complement work (and that it was "invented" to allow us to get only one representation of zero), and that signed numbers use one bit for the sign (1 means negative, 0 means positive), we also understand that we can represent fewer values when we have signed values.

If we take a type for a signed byte of eight bits, like the Java primitive `byte` type, we will thus only be able to represent the following values: -12810 through +12710 (256 values including the 0). This is because we only have seven bits for the value and one bit for the sign. Again, 2^8 is 256, so that's the number of combinations of ones and zeros, and thus the number of representations for decimal numbers. The span of the0 negative numbers are -128 through -1, then we have zero, and then the positive numbers go from 1 through 127. This is a pattern, an eight bit type (eight bits of memory) can only produce 127 different positive numbers and zero since 2^7 is 128. All zeros would represent -128.

So, in two's complement, the values are the from minus 2^(number of bits -1) to 2^(number of bits -1) -1:

• 8 bits: -27 -- (+27) - 1 (-128 -- +127)
• 16 bits: -215 -- (+215) -1 (-32,768 -- 32,767)
• 32 bits: -231 -- (+231) -1
• etc

Here's how you would arrive at the eight bit two's complement for -127:

```
In decimal, 127 is 64 + 32 + 16 + 8 + 4 + 2 + 1, so the bits to set are for the binary number is:
111 1111
||| ||| `----2^0, which is 1
||| |||
||| || `-----2^1, which is 2
||| ||
||| | `------2^2, which is 4
||| |
|||  `-------2^3, which is 8
|||
|| `---------2^4, which is 16
||
| `----------2^5, which is 32
|
`-----------2^6, which is 64

Padded with one zero to get eight bits we get:
0111 1111

Flip the bits:
1000 0000

1000 0000
+0000 0001
----------
1000 0001

Answer: the two's complement representation for -127 is 1000 0001
```

Again, to go the other way around, if we have the eight bit two's representation `1000 0001` and want to know what decimal number that represents, we'd do:

```We have:
1000 0001

1. Flip the bits:
0111 1110

0111 1110
+0000 0001
----------
0111 1111

3.Calculate the value of the result (as if it were a positive number):

111 1111
||| ||| `----2^0, which is 1
||| |||
||| || `-----2^1, which is 2
||| ||
||| | `------2^2, which is 4
||| |
|||  `-------2^3, which is 8
|||
|| `---------2^4, which is 16
||
| `----------2^5, which is 32
|
`-----------2^6, which is 64

so, 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 (decimal)

4. Take the inverse of that value:
127 * -1 = -127

Answer: The two's complement representation 1000 0001 represents the decimal number -127.
```

We can check the maximum positive value possible to represent in two's complement representation in eight bits, 0111 1111, and calculate that it is 127 (see above).

To calculate the minimum value, we know that it starts with a 1 (sign bit for negative). Since 27 is 128, this would be `1000 0000` using unsigned binary. Let's try to represent -128 in two's complement to see if it is possible.

```1. Take the absolute number of -128 and represent it in binary:
1000 0000

2. Flip the bits:
0111 1111

1111 111  (carry)
0111 1111
+0000 0001
----------
1000 0000

Answer: The two's complement representation for -128 is 1000 0000

Comment: It is a rather weird number, since its two's complement is the very same number.
```

Would it be possible to represent an even smaller number than -128 in eight bits using two's complement representation? No, since we have only eight bits, the maximum number of representations is 256 (28). We saw that we can represent 127 positive numbers and zero. That's 128 representations (bit combinations). 256 - 128 = 128. We saw that we could represent 128 negative numbers. So there can't be a bit pattern for -129.

For positive values, the number is padded with zeros from left to the first one. For negative values, the number is padded with ones from left to the first zero.

Example: eight bit 310 is:

```0000 0011
```

Same number using 16 bits:

```0000 0000 0000 0011
```

Eight bit -310 is:

```1111 1101
```

Same number in 16 bits:

```1111 1111 1111 1101
```

## Text

Text is stored in a computer as any data is stored - using bits representing binary numbers. But, you may rightfully wonder, can a number represent a piece of text? There are standard character tables (the most common of which is the ASCII table), which map numbers to characters.

So when a computer stores a string of characters, it stores the numbers of the characters from some table as binary numbers (ones and zeros as above).

Let's look at an example of storing ABC using the ASCII table. Here's the relevant part of the ASCII table for the letters A, B and C:

• `A` 65
• `B` 66
• `C` 67

In binary, A is coded as 65 in binary, which is 100 0001. B is 66 which is 100 0010, and C is 67 which is 100 0011. So when the computer stores `ABC` in e.g. a file, using the ASCII table, the computer will store the following bits using 8 bit bytes: `0100 0001 0100 0010 0100 0011`. Note that we group the bits in chunks of four for increased readability here (the computer stores them consecutively). This is so common, that one actually uses special notation for a chunk of four bits, using base 16 (a.k.a. hexadecimal). Since in base 16, unsigned numbers go from 0 to 15, we need notation to express those values. In hexadecimal, values go from 0 to F:

• 0
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• A (1010)
• B (1110)
• C (1210)
• D (1310)
• E (1410)
• F (1510)

Since we need four bits to represent decimal values 0 - 15, grouping bits in fours and translating to hexadecimal is very convenient (and common).

Here's another example of encoding text, the text `XYZ` will be encoded like this:

• X binary: 0101 1000 decimal: 88 hex: 58
• Y binary: 0101 1001 decimal: 89 hex: 59
• Z binary: 0101 1010 decimal: 90 hex: 5A

So the data stored will be:

```0101 1000 0101 1001 0101 1010
which can be written for us humans as hex:
58 59 5a
the mapping is:
0101 1000 0101 1001 0101 1010
5    8    5    9    5    A
```

What about text with lines, tabs, spaces and sections?

Those characters are also present in the ASCII table. A line of text is simply text ending with a newline character. The newline character has decimal value 10 in the ASCII table. Two newline characters in a row will make the text look like it is divided in sections. A tab is simply a number in the ASCII table, 9. Space has number 32 (decimal) in the ASCII table.

Note that we can only represent structured text using the ASCII table. There is no formatting such as fonts, font faces (bold, italics, underline etc). Data which contains only structured text from the ASCII table is called "plain text". In order to represent style and fonts, we need a different encoding (like a file format for Libre Office Write, or Microsoft Word). Styled documents are NOT called text documents or plain text. They are called something else, like word processor documents etc.

## Overflow and underflow

If the largest number we can store in eight bits, using two's complement, is 127, what happens if we try to add one to that number?

``` 1111 111  <-- carry
0111 1111
+0000 0001
----------
1000 0000
```

In binary, 1+1 = 10. So we get one in carry. Eventually, the sign bit gets changed.

This means that if we have an eight bit signed type like `byte` in Java, adding 1 to 127 using this type, will produce -128. This might seem surprising, but if we think of it, how would you represent +128 if you had as many bits you needed?

```1000 0000 = 2^7
2^7 = 128
```

But using two's complement, the leftmost bit isn't worth 27, since it is the sign bit. So, if we are using eight bits and two's complement, we'd interpret the same bit pattern differently:

```1. Flip the bits:
0111 1111

1111 111  (carry)
0111 1111
+ 0000 0001
-----------
1000 0000 (128)

So the number is -128 in eight bit binary (using two's complement).
```

So, the same bits represent 128 and -128. If we have 16 bits, 128 would be:

```0000 0000 1000 0000
```

If we only have eight bits, we can't represent 128 (if we represent it using two's complement), so the same bit pattern will represent -128:

```          1000 0000
```

From this we hopefully learn that how to interpret a given bit pattern depends on what method is used for representation. `1000 0000` represents -128 if we are using two's complement using eight bits. If we are using the binary number system mathematically for unsigned numbers, `1000 0000` instead represents +128.

A related question is, what happens if we subtract 1 from -128 using eight bits (also using two's complement)?

``` **** ***  (borrow)
1000 0000 (-128)
-0000 0001    (1)
----------
0111 1111  (127)
```

Now the sign changed to positive, and the value is 127.

Here's a small Java program showing what happens:

```public class Bin {
public static void main(String[] args) {

byte b = 125;
System.out.println(b + " + 1 = " + ++b);
System.out.println(b + " + 1 = " + ++b);
System.out.println(b + " + 1 = " + ++b);
System.out.println(b + " + 1 = " + ++b);
System.out.println(b + " - 1 = " + --b);
System.out.println(b + " - 1 = " + --b);
System.out.println(b + " - 1 = " + --b);

System.out.println();
System.out.println("32 bits for -128: " + Integer.toBinaryString(-128));
System.out.println("32 bits for +128: " + padding + Integer.toBinaryString(128));

System.out.println("Using 8 bits:"); // int is actually 32 bits, but we cheat and take the last eight chars
System.out.println("-128: " + Integer.toBinaryString(-128).substring(24));
System.out.println("+128: " + Integer.toBinaryString(128)); // the 24 leading zeros are not shown
}
/*
\$ javac Bin.java && java Bin
125 + 1 =  126
126 + 1 =  127
127 + 1 = -128
-128 + 1 = -127
-127 - 1 = -128
-128 - 1 =  127
127 - 1 =  126

32 bits for -128: 11111111111111111111111110000000
32 bits for +128: 00000000000000000000000010000000
Using 8 bits:
-128: 10000000
+128: 10000000
*/
```

• TODO