# Floating Point Number Confusion

First of all, a small question: how many times will the for-loop below execute?
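The loop itself is not reproduced in this text. A sketch that is consistent with the output listed further down, assuming a `double` counter that starts at 10 and is decremented by 0.1 until it equals 0, might look like this. The class name `FloatLoopDemo`, the method `countIterations`, and the iteration cap are my additions for illustration; the cap exists only so that the example stops.

```java
// Hypothetical reconstruction: names and the safety cap are illustrative.
final class FloatLoopDemo {

    // Runs the loop and returns how many iterations executed before either
    // i became exactly 0 or the safety cap was reached.
    static int countIterations(int cap) {
        int iterations = 0;
        for (double i = 10; i != 0; i -= 0.1) {
            if (iterations == cap) {
                break; // without this cap the loop would never stop
            }
            System.out.println(i);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        int cap = 150;
        int ran = countIterations(cap);
        System.out.println("stopped by the cap: " + (ran == cap));
    }
}
```

If you remove the cap check, the condition `i != 0` is the only thing that can end the loop.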

Most of you may say it will iterate 100 or 99 times. In fact, the for-loop runs indefinitely without stopping. To understand why, we need to look at how floating-point numbers behave inside a computer. The **IEEE Standard for Floating-Point Arithmetic (IEEE 754)** is a technical standard for floating-point arithmetic established in **1985** by the Institute of Electrical and Electronics Engineers (IEEE).

**Output of the above code segment:**

10.0

9.9

9.8

9.700000000000001

9.600000000000001

9.500000000000002

9.400000000000002

9.300000000000002

9.200000000000003

9.100000000000003

9.000000000000004

8.900000000000004

8.800000000000004

8.700000000000005

8.600000000000005

8.500000000000005

8.400000000000006

8.300000000000006

8.200000000000006

8.100000000000007

8.000000000000007

7.9000000000000075

7.800000000000008

7.700000000000008

7.6000000000000085

7.500000000000009

7.400000000000009

7.30000000000001

7.20000000000001

7.10000000000001

7.000000000000011

6.900000000000011

6.800000000000011

6.700000000000012

6.600000000000012

6.500000000000012

6.400000000000013

6.300000000000013

6.2000000000000135

6.100000000000014

6.000000000000014

5.900000000000015

5.800000000000015

5.700000000000015

5.600000000000016

5.500000000000016

5.400000000000016

5.300000000000017

5.200000000000017

5.100000000000017

5.000000000000018

4.900000000000018

4.8000000000000185

4.700000000000019

4.600000000000019

4.5000000000000195

4.40000000000002

4.30000000000002

4.200000000000021

4.100000000000021

4.000000000000021

3.9000000000000212

3.800000000000021

3.700000000000021

3.600000000000021

3.500000000000021

3.400000000000021

3.3000000000000207

3.2000000000000206

3.1000000000000205

3.0000000000000204

2.9000000000000203

2.8000000000000203

2.70000000000002

2.60000000000002

2.50000000000002

2.40000000000002

2.30000000000002

2.2000000000000197

2.1000000000000196

2.0000000000000195

1.9000000000000195

1.8000000000000194

1.7000000000000193

1.6000000000000192

1.500000000000019

1.400000000000019

1.300000000000019

1.2000000000000188

1.1000000000000187

1.0000000000000187

0.9000000000000187

0.8000000000000187

0.7000000000000187

0.6000000000000187

0.5000000000000188

0.4000000000000188

0.3000000000000188

0.2000000000000188

0.1000000000000188

1.8790524691780774E-14

-0.09999999999998122

-0.19999999999998122

-0.2999999999999812

-0.39999999999998126

-0.49999999999998124

-0.5999999999999812

-0.6999999999999812

-0.7999999999999812

-0.8999999999999811

-0.9999999999999811

-1.0999999999999812

-1.1999999999999813

-1.2999999999999814

-1.3999999999999815

...

// and so it continues indefinitely

As you know, a computer converts every number to binary (bits) before performing any arithmetic operation, and then carries out the addition or subtraction on that binary data. According to the standard above, floating-point numbers are categorized by precision as **single**, **double**, and **long double**, as shown below. Most importantly, every floating-point number is divided into three parts: **Sign**, **Exponent**, and **Mantissa**. Each part has a fixed number of bits available to it at the machine level, as shown below. For single precision that is 1 sign bit, 8 exponent bits, and 23 mantissa bits; for double precision it is 1, 11, and 52.
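To make the three parts concrete, here is a minimal sketch that splits a single-precision value into its sign, exponent, and mantissa fields. It relies on the standard `Float.floatToIntBits` method; the class `FloatBits` and its helpers are names I made up for this example.

```java
// Illustrative helper: splits a float into its IEEE 754 single-precision fields.
final class FloatBits {

    // Left-pads a binary string with zeros to the given width.
    static String pad(String s, int width) {
        return String.format("%" + width + "s", s).replace(' ', '0');
    }

    // Returns "sign exponent mantissa" as three groups of bits.
    static String describe(float value) {
        int bits = Float.floatToIntBits(value);
        int sign = bits >>> 31;               // 1 bit
        int exponent = (bits >>> 23) & 0xFF;  // 8 bits, stored with a bias of 127
        int mantissa = bits & 0x7FFFFF;       // 23 bits, leading 1 not stored
        return sign + " "
                + pad(Integer.toBinaryString(exponent), 8) + " "
                + pad(Integer.toBinaryString(mantissa), 23);
    }

    public static void main(String[] args) {
        // -6.5 is -1.101 x 2^2 in binary: sign 1, exponent 2 + 127 = 129,
        // mantissa 101 followed by zeros.
        System.out.println(describe(-6.5f));
        System.out.println(describe(1.0f));
    }
}
```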

# Decimal to Binary Conversion Process

We will treat 9.1 as a single-precision number, but the process is the same for the other precisions. As shown in figure 1, first convert the integer part **9** to binary, which gives **1001**. Then convert the fractional part **0.1**, which gives the recurring binary fraction **0.00011001100110011…**. Next, write the binary form of **9.1** in **scientific notation**: **1.00100011001100110011… × 2³**. The power **3** is referred to as the **Exponent Base**; add it to the bias **127** to get the **Exponent Bits**, which are 130 or **10000010** (the bias lets the 8-bit field hold exponents from -126 to 127 for normal numbers), as shown in figure 2. Since this is a positive number, the **Sign Bit** is **0**. Finally, drop the leading 1 of the scientific notation and store the bits after the binary point in the **Mantissa Bits** (only 23 bits; ignore the rest for now). Before any rounding, the floating-point representation of **9.1** is therefore **01000001000100011001100110011001**, as shown in figure 3.
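One common way to find the binary digits of a fraction such as 0.1 by hand is repeated doubling: double the fraction, write down the digit that lands in the ones place, drop it, and repeat. The sketch below follows that idea using `BigDecimal` so that the doubling itself is exact. The class and method names are hypothetical and not part of the original walkthrough.

```java
import java.math.BigDecimal;

// Illustrative sketch of decimal-to-binary conversion by repeated doubling.
final class BinaryFraction {

    // Returns the first `count` binary digits of a decimal fraction in [0, 1).
    static String fractionBits(String decimalFraction, int count) {
        BigDecimal two = BigDecimal.valueOf(2);
        BigDecimal x = new BigDecimal(decimalFraction);
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < count; i++) {
            x = x.multiply(two);
            if (x.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');              // the next binary digit is 1
                x = x.subtract(BigDecimal.ONE);
            } else {
                bits.append('0');              // the next binary digit is 0
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String whole = Integer.toBinaryString(9);   // 1001
        String fraction = fractionBits("0.1", 20);  // 0001 1001 1001 ... recurring
        System.out.println(whole + "." + fraction);
    }
}
```

The pattern `0011` keeps repeating however many digits you ask for, which is why 0.1 can never be stored exactly in a fixed number of bits.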

Now comes the tricky part. The **Mantissa** section is the recurring pattern **00100011001100110011001…**, but only **23 bits** of it can be stored in memory. When this recurring binary fraction is cut down to 23 bits, the computer looks at whether the 24th bit is 1 or 0. If it is 1, it adds 1 at the 23rd bit position; if it is 0, it leaves the 23 bits as they are. (This is a simplified picture of the default round-to-nearest behavior.) Since the binary representation of 9.1 has a 1 at the 24th bit position, 1 is added at the 23rd bit position. So our final IEEE 754 floating-point representation is **01000001000100011001100110011010**. In figure 3, I have highlighted the difference in the representation caused by this approximation in the Mantissa section.
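You can check the effect of this rounding yourself. The sketch below compares the bit pattern with the mantissa simply cut off after 23 bits against the pattern that Java actually stores for `9.1f`. The class `MantissaRounding` and its methods are illustrative names, not something from a library.

```java
// Illustrative comparison of the truncated and the stored bit pattern of 9.1f.
final class MantissaRounding {

    // Sign, exponent, and the first 23 mantissa bits of 9.1 with no rounding applied.
    static int truncatedBits() {
        return 0b0_10000010_00100011001100110011001;
    }

    // The pattern the JVM stores for the literal 9.1f.
    static int storedBits() {
        return Float.floatToIntBits(9.1f);
    }

    // Formats an int as 32 binary digits with leading zeros.
    static String toBits32(int bits) {
        return String.format("%32s", Integer.toBinaryString(bits)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toBits32(truncatedBits()));
        System.out.println(toBits32(storedBits()));
        // The two patterns differ by one unit in the last mantissa bit.
        System.out.println(storedBits() - truncatedBits());
    }
}
```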

# Binary to Decimal Conversion Process

So the binary representation of 9.1 according to IEEE 754 is **01000001000100011001100110011010**. To convert this binary format back to decimal, start with the **Sign Bit**: it is **0**, so the number is **positive**. The **Exponent Bits** have the decimal value **130**, so **subtract 127** from that to get the **Exponent Base**, as shown in figure 4. The converted **Exponent** part is therefore **2³**. Then convert the Mantissa Bits to decimal, as shown in figure 4, and add 1 to the result, because the bit to the left of the binary point was dropped from the scientific notation. The **Mantissa decimal value** is **1.137500048**. Finally, multiply the converted Exponent and Mantissa parts: the **final result** of **converting the binary back to decimal** is approximately **9.10000038**, as shown in figure 5.
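The decoding steps above can be sketched in a few lines. This is a simplified decoder that I wrote for illustration under the assumption that the input is a normal number (it ignores zero, subnormals, infinity, and NaN); `DecodeFloat` is a made-up name.

```java
import java.math.BigDecimal;

// Illustrative decoder for normal single-precision values only.
final class DecodeFloat {

    static double decode(int bits) {
        int sign = (bits >>> 31) == 0 ? 1 : -1;
        int exponent = ((bits >>> 23) & 0xFF) - 127;          // remove the bias of 127
        double mantissa = 1.0 + (bits & 0x7FFFFF) / (double) (1 << 23); // restore the dropped leading 1
        return sign * mantissa * Math.pow(2, exponent);
    }

    public static void main(String[] args) {
        int bits = 0b01000001000100011001100110011010;
        System.out.println(decode(bits));
        // BigDecimal shows every digit of the value that is really stored,
        // which is slightly above 9.1.
        System.out.println(new BigDecimal(Float.intBitsToFloat(bits)));
    }
}
```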

So you can see that when a value is converted from decimal to binary and back inside a computer, it picks up a small extra amount, because IEEE 754 has to approximate the last bit of the Mantissa section. As a result, when we repeatedly subtract from a floating-point number, it does not hit zero exactly but skips past it to the negative side. The extra amounts produced by rounding the last mantissa bit are called **Floating-Point Rounding Error**. Now you can understand why the for-loop mentioned at the beginning of this blog continues to execute indefinitely.

# BigDecimal in Java

To avoid this floating-point issue, Java provides a class called BigDecimal. A **BigDecimal** is an **arbitrary-precision signed immutable decimal number**. It consists of two parts: an **Unscaled** value and a **Scale** value. For example, the BigDecimal number 10.342 has an unscaled value of 10342 and a scale of 3.

**Unscaled Value**: An arbitrary-precision integer

**Scale**: A 32-bit integer representing the number of digits to the right of the decimal point
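As a quick check of the 10.342 example, the two parts can be read back with the standard `unscaledValue()` and `scale()` methods. The wrapper class `ScaleDemo` is only an illustrative name.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustrative look at the two parts of a BigDecimal.
final class ScaleDemo {

    static BigInteger unscaledOf(String text) {
        return new BigDecimal(text).unscaledValue();
    }

    static int scaleOf(String text) {
        return new BigDecimal(text).scale();
    }

    public static void main(String[] args) {
        System.out.println(unscaledOf("10.342")); // 10342
        System.out.println(scaleOf("10.342"));    // 3

        // The value is unscaled x 10^(-scale), so the number can be rebuilt from its parts.
        BigDecimal rebuilt = new BigDecimal(unscaledOf("10.342"), scaleOf("10.342"));
        System.out.println(rebuilt);              // 10.342
    }
}
```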

When we are dealing with high-precision arithmetic, or when we need control over scaling and rounding behavior, BigDecimal is very helpful. One such example is **calculations involving financial transactions**. There are several ways to create BigDecimal objects: from Strings, character arrays, integers, and longs, as shown below.
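Here is a small sketch of those four ways of creating a BigDecimal. The constructors are part of `java.math.BigDecimal`; the class name `CreateBigDecimals` and the sample values are mine.

```java
import java.math.BigDecimal;

// Illustrative examples of constructing BigDecimal objects.
final class CreateBigDecimals {

    static BigDecimal[] examples() {
        return new BigDecimal[] {
            new BigDecimal("0.1"),                       // from a String
            new BigDecimal(new char[] {'3', '.', '5'}),  // from a character array
            new BigDecimal(42),                          // from an int
            new BigDecimal(1234567890123L)               // from a long
        };
    }

    public static void main(String[] args) {
        for (BigDecimal value : examples()) {
            System.out.println(value);
        }
    }
}
```

For fractional values the String constructor is usually the safest choice, because passing a `double` such as `0.1` would hand BigDecimal the binary approximation of that value rather than exactly 0.1.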

We can also perform arithmetic operations such as **addition**, **subtraction**, **multiplication**, and **division** using BigDecimal, as shown below. To compare BigDecimal objects, we can use the **compareTo** method. Since BigDecimal is immutable, arithmetic operations do not modify the existing objects; they return new ones. To learn about more operations you can perform on BigDecimal objects, visit https://www.baeldung.com/java-bigdecimal-biginteger.
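A minimal sketch of these operations follows. The sample values and the class name `BigDecimalArithmetic` are only for illustration. Note that I pass a scale and a rounding mode to `divide`, because the one-argument version throws an `ArithmeticException` when the exact quotient has a non-terminating decimal expansion.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative arithmetic and comparison on BigDecimal values.
final class BigDecimalArithmetic {

    static final BigDecimal A = new BigDecimal("10.5");
    static final BigDecimal B = new BigDecimal("3");

    static BigDecimal sum()        { return A.add(B); }       // 13.5
    static BigDecimal difference() { return A.subtract(B); }  // 7.5
    static BigDecimal product()    { return A.multiply(B); }  // 31.5

    // Result is rounded to 4 decimal places, giving 3.5000 here.
    static BigDecimal quotient()   { return A.divide(B, 4, RoundingMode.HALF_UP); }

    public static void main(String[] args) {
        System.out.println(sum());
        System.out.println(difference());
        System.out.println(product());
        System.out.println(quotient());

        // compareTo returns a negative number, zero, or a positive number.
        System.out.println(A.compareTo(B) > 0); // true, because 10.5 > 3

        // A and B are unchanged: every operation returned a new object.
        System.out.println(A + " " + B);
    }
}
```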

Now, let’s rewrite the for-loop from the beginning of the article using BigDecimal and see whether we get the output we expect. I have used the **compareTo** method to compare the BigDecimal objects.
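The rewritten listing is not reproduced in this text, so here is a sketch of what such a loop could look like, assuming the same start value of 10 and step of 0.1 as before. The class `BigDecimalLoop` and the method `run` are hypothetical names.

```java
import java.math.BigDecimal;

// Illustrative BigDecimal version of the loop from the beginning of the article.
final class BigDecimalLoop {

    // Prints every value of the counter and returns how many iterations ran.
    static int run() {
        BigDecimal step = new BigDecimal("0.1");
        int iterations = 0;
        for (BigDecimal i = new BigDecimal("10");
                i.compareTo(BigDecimal.ZERO) != 0;
                i = i.subtract(step)) {
            System.out.println(i);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + run()); // 100
    }
}
```

Because 0.1 is stored exactly as a decimal, the counter lands on zero after exactly 100 subtractions and the loop ends. I compare with `compareTo` rather than `equals` because `equals` also compares the scale, so `0.0` and `0` would not be considered equal.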

**Now the output of the above code segment matches what we expect:**

10

9.9

9.8

9.7

9.6

9.5

9.4

9.3

9.2

9.1

9.0

8.9

8.8

8.7

8.6

8.5

8.4

8.3

8.2

8.1

8.0

7.9

7.8

7.7

7.6

7.5

7.4

7.3

7.2

7.1

7.0

6.9

6.8

6.7

6.6

6.5

6.4

6.3

6.2

6.1

6.0

5.9

5.8

5.7

5.6

5.5

5.4

5.3

5.2

5.1

5.0

4.9

4.8

4.7

4.6

4.5

4.4

4.3

4.2

4.1

4.0

3.9

3.8

3.7

3.6

3.5

3.4

3.3

3.2

3.1

3.0

2.9

2.8

2.7

2.6

2.5

2.4

2.3

2.2

2.1

2.0

1.9

1.8

1.7

1.6

1.5

1.4

1.3

1.2

1.1

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

# References

For further clarification, check these resources: