# What is a Floating Point Number?

## A basic data type, but what is it?

A simple question. This is a number that can have a fractional part, that is, digits after the decimal point. But why don't we use this for every type of number? Read on to find out!

Difficulty: Beginner | Easy | Normal | Challenging

# Prerequisites:

• None, although negative numbers are often stored using Two's Complement, which is not covered here (guide HERE)

# Terminology

Data types: A representation of the type of data that can be processed, for example Integer or String

Exponent: The part of a Floating Point number that says where the point goes, that is, the power of two the mantissa is multiplied by

Floating Point: A number without a fixed number of digits before and after the decimal point

Integer: A number that has no fractional part, that is no digits after the decimal point

Mantissa: The part of a Floating Point number that holds its significant digits, which the exponent then scales up or down

Precision: How many significant digits a number can hold, and so how closely it can match the value you meant to store

Real numbers: Another name for Floating Point numbers; some programming languages and textbooks call the floating point data type Real

# Floating point numbers:

Why they are required:

Compared to `Floating Point` numbers, `Integers` are exact: within their range there are never any rounding errors. However, `Integer` division throws away the fractional part, so 1 / 2 = 0, which may not be suitable for everything you want to code.
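Here is a small sketch in Python (chosen just for this article; any language would do). Python writes integer division as `//`, while `/` always gives a `Floating Point` result. In languages such as C or Java, `1 / 2` with two integer operands also gives 0.

```python
# Integer division: the fractional part is thrown away
print(1 // 2)  # 0
print(7 // 2)  # 3

# Floating point division: the fractional part is kept
print(1 / 2)   # 0.5
print(7 / 2)   # 3.5
```

One difference to watch for: with negative numbers Python's `//` rounds down (so `-7 // 2` is `-4`), whereas C and Java round toward zero.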

## A simple definition:

A `Floating Point` number can hold a fractional part. This means that values such as 0.0, 3.14, 6.5 and -125.5 can all be stored as `Floating Point` numbers.

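Since the point can "float", `Floating Point` numbers cover a very wide range of values, but their `precision` varies: the gap between one number that can be stored and the next grows as the numbers get bigger.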
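One way to see this is Python's `math.ulp` (Python 3.9 or later), which returns the gap between a float and the next larger one. This sketch assumes Python's `float` is IEEE 754 double precision, which is the case on almost every platform.

```python
import math

# math.ulp(x) is the gap between x and the next larger representable float
for x in [1.0, 1000.0, 1e16]:
    print(x, math.ulp(x))

# At 1e16 the gap is already 2.0, so adding 1 is lost to rounding
print(1e16 + 1 == 1e16)  # True
```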

## Storing Integer Numbers

`Integer` numbers can be stored by giving each bit position a place value (1, 2, 4, 8 and so on) and switching bits on or off. One possible way of doing this is shown in the image below:

With n bits we can only store 2ⁿ different values, from 0 up to 2ⁿ − 1, but this is a simple way to store `Integer` numbers.
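A minimal Python sketch of the idea for unsigned numbers. The helper name `to_bits` is made up for this example; `format` and `int(..., 2)` are standard Python built-ins.

```python
def to_bits(value, n):
    # Hypothetical helper: write an unsigned integer as n bits, most significant first
    if not 0 <= value <= 2**n - 1:
        raise OverflowError(f"{value} does not fit in {n} bits")
    return format(value, f"0{n}b")

print(to_bits(13, 8))      # 00001101, that is 8 + 4 + 1
print(int("00001101", 2))  # 13: reading the bits back
print(2**8, 2**8 - 1)      # 256 different values, the largest being 255
```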

## Storing Floating Point

`Floating Point` numbers can't be stored in exactly the same way as `Integer` numbers. The issue is the decimal point: what value should the smallest position we store represent? Should it be:

• 0.1
• 0.01
• 0.001
• 0.0001

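We would have to fix this choice in advance for every number we store, and then we would be stuck with it: a position that suits very small numbers cannot also hold large ones, and vice versa. Fixing the position of the point in advance like this is known as fixed point.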
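To make the problem concrete, here is a rough Python sketch of a fixed point scheme. Everything in it is a hypothetical choice for illustration: `SCALE` fixes 2 digits after the point, `MAX_RAW` limits storage to 16 unsigned bits, and `to_fixed`/`from_fixed` are made-up names. Real fixed point formats usually scale by a power of two rather than 100.

```python
SCALE = 100          # hypothetical choice: always 2 digits after the point
MAX_RAW = 2**16 - 1  # hypothetical choice: stored in 16 unsigned bits

def to_fixed(value):
    # Store the value as a whole number of hundredths
    raw = round(value * SCALE)
    if not 0 <= raw <= MAX_RAW:
        raise OverflowError(f"{value} does not fit")
    return raw

def from_fixed(raw):
    # Turn the stored hundredths back into a number
    return raw / SCALE

print(from_fixed(to_fixed(3.14)))   # 3.14
print(from_fixed(to_fixed(0.001)))  # 0.0: too small, the digits are lost
try:
    to_fixed(1000)                  # 100000 hundredths needs more than 16 bits
except OverflowError as err:
    print("overflow:", err)
```

Once the position of the point is chosen, 0.001 is too small and 1000 is too big, even though both are ordinary numbers.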

So clearly this isn't the way that we store `Floating Point` numbers. Instead, we split a `Floating Point` number into a `sign`, an `exponent` and a `mantissa`, as in the following diagram showing 23 bits for the `mantissa` and 8 bits for the `exponent` (with 1 bit for the `sign`, 32 bits in total):

The above image shows an exponent (in Denary) of 1 with a mantissa of 1, which this simple scheme reads as 1.1.

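In a real implementation the details differ. Textbook schemes often store the mantissa and exponent in `Two's complement`, while the IEEE 754 standard used by most hardware keeps a separate sign bit and stores the exponent with an offset (bias) of 127 in single precision. This basic example just shows how the problem might be solved.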
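To see the IEEE 754 layout on a real number, Python's standard `struct` module can pack a value as a 32-bit single precision float and read the raw bits back. The helper names `float32_parts` and `float32_value` are made up for this sketch, and the rebuild formula only covers normal numbers (not zero, infinity or NaN).

```python
import struct

def float32_parts(x):
    # Pack x as an IEEE 754 single precision number and read the 32 bits back as an integer
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with an offset (bias) of 127
    mantissa = bits & 0x7FFFFF      # 23 bits; the leading 1 is implied, not stored
    return sign, exponent, mantissa

def float32_value(sign, exponent, mantissa):
    # Rebuild a normal (non-zero, non-special) single precision number from its parts
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

parts = float32_parts(-6.5)
print(parts)                  # (1, 129, 5242880)
print(float32_value(*parts))  # -6.5
```

Here -6.5 is -1.625 × 2², so the sign bit is 1, the stored exponent is 2 + 127 = 129, and the mantissa bits hold the .625 part.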

# Precision

## Single precision

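Single precision `Floating Point` numbers are 32-bit: 1 bit for the `sign`, 8 bits for the `exponent` and 23 bits for the `mantissa`.

For comparison, the largest signed 32-bit `Integer` is 2³¹ − 1 = 2,147,483,647 (the power is 31 because one bit is the sign bit, and the − 1 is there because zero uses one of the non-negative patterns).

Because a single precision number spends 8 of its bits on the exponent, it can reach much further: its largest value is about 3.4 × 10³⁸. The price is `precision`: the 23-bit mantissa (plus an implied leading 1) only keeps about 7 significant decimal digits.

The most negative single precision number is the negative of the largest, that is about −3.4 × 10³⁸.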
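A rough Python check of both sides of the trade-off, using the standard `struct` module to round values to single precision (`to_float32` is a hypothetical helper name). It shows a far larger range than a 32-bit integer, but also gaps between the values that can be stored.

```python
import struct

def to_float32(x):
    # Round a Python float (double precision) to the nearest single precision value
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Bit pattern 0x7F7FFFFF is the largest finite single precision number
FLOAT32_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
INT32_MAX = 2**31 - 1

print(FLOAT32_MAX)               # about 3.4e38, far beyond INT32_MAX
print(to_float32(0.1))           # 0.10000000149011612: only about 7 digits are right
print(to_float32(16_777_217.0))  # 16777216.0: 2**24 + 1 cannot be stored exactly
```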

## Double precision

Double precision `Floating Point` numbers are 64-bit: 1 bit for the `sign`, 11 bits for the `exponent` and 52 bits for the `mantissa`.

For comparison, the largest signed 64-bit `Integer` is 2⁶³ − 1 = 9,223,372,036,854,775,807 (again, one bit is the sign bit and zero uses one of the non-negative patterns).

A double precision number can reach about 1.8 × 10³⁰⁸ and keeps about 15 to 16 significant decimal digits.

The most negative double precision number is the negative of the largest, that is about −1.8 × 10³⁰⁸.
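Python's own `float` is double precision on practically every platform, so `sys.float_info` from the standard library reports these limits directly. This sketch assumes such a platform.

```python
import sys

INT64_MAX = 2**63 - 1

print(sys.float_info.max)       # largest finite double, about 1.8e308
print(sys.float_info.mant_dig)  # 53: 52 stored mantissa bits plus the implied leading 1
print(sys.float_info.dig)       # 15: decimal digits that always survive a round trip

# A double cannot hold every 64-bit integer: INT64_MAX rounds to 2**63
print(float(INT64_MAX) == 2**63)  # True

# The classic rounding surprise
print(0.1 + 0.2)                # 0.30000000000000004
```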

# Issues

## Overflow

`Floating Point` overflow occurs when an attempt is made to store a number whose size is larger than the chosen format can store. Under the IEEE 754 standard, with its default rounding, the result becomes infinity rather than a wrong finite value.
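A short Python sketch, assuming the usual IEEE 754 double precision `float`. Plain arithmetic follows the standard and gives infinity, while some library functions such as `math.exp` raise `OverflowError` instead.

```python
import math
import sys

biggest = sys.float_info.max
result = biggest * 2
print(result)              # inf
print(math.isinf(result))  # True

try:
    math.exp(1000)         # e**1000 is roughly 10**434, beyond the double range
except OverflowError as err:
    print("OverflowError:", err)
```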

# Memory usage of Floating Point Numbers

The memory usage of `Floating Point` numbers depends on the `precision` chosen for the implementation: a single precision number takes 32 bits (4 bytes) and a double precision number takes 64 bits (8 bytes).
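A quick way to check this in Python is the standard `struct` and `array` modules. The `"<f"` and `"<d"` formats use the standard IEEE sizes; the `array` type codes `"f"` and `"d"` use the platform's C `float` and `double`, which are 4 and 8 bytes on mainstream platforms (an assumption worth checking on unusual hardware).

```python
import struct
from array import array

print(struct.calcsize("<f"))  # 4 bytes for a single precision number
print(struct.calcsize("<d"))  # 8 bytes for a double precision number

# A million readings stored both ways
readings = [0.5] * 1_000_000
singles = array("f", readings)
doubles = array("d", readings)
print(len(singles) * singles.itemsize)  # 4000000 bytes on typical platforms
print(len(doubles) * doubles.itemsize)  # 8000000 bytes on typical platforms
```

Choosing single precision halves the memory, at the cost of range and `precision`.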

# Conclusion:

`Floating Point` numbers are used whenever a program needs fractional values or a very wide range of values. They split a number into a `sign`, an `exponent` and a `mantissa`, and the number of bits given to each part decides the range and `precision` you get.

• Two's complement is not covered in this article, but Wikipedia has a nice article on it (HERE)

Any questions? You can get in touch with me here
