Floating Point Number Confusion

Ranmal Dewage
May 7, 2021

Cover image source: https://www.rd.com/article/what-is-mental-math-tricks/

First of all, let me ask a small question: how many times will the for-loop below execute?
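The original snippet is not reproduced here. Judging by the output below, it counted a double down from 10.0 in steps of 0.1 and stopped only when the counter was exactly 0, so the sketch below assumes exactly that loop. The iteration cap and the helper name iterationsUntilZero are hypothetical additions so that the demo terminates.

```java
public class CountdownLoop {

    // The countdown from the question: start at 10.0, subtract 0.1, stop when i == 0.
    // The `cap` check is NOT part of the original question; it only makes this demo stop.
    // Returns the number of iterations before i became exactly 0.0, or -1 if the cap
    // was reached first (meaning the uncapped loop would never have stopped).
    static int iterationsUntilZero(int cap) {
        int count = 0;
        for (double i = 10.0; i != 0; i -= 0.1) {
            if (count == cap) {
                return -1;
            }
            System.out.println(i); // 10.0, 9.9, 9.8, 9.700000000000001, ...
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Intuition says 100 iterations; in practice i skips over 0.0, so the cap is hit
        System.out.println(iterationsUntilZero(150)); // last line printed: -1
    }
}
```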

Most of you may say it will iterate 100 or 99 times. But in fact, the for-loop runs indefinitely without stopping. To understand why, we need to look at how floating-point numbers behave inside computers. Most hardware and programming languages, including Java, follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754), a technical standard established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE).

Output of the above code segment:
10.0
9.9
9.8
9.700000000000001
9.600000000000001
9.500000000000002
9.400000000000002
9.300000000000002
9.200000000000003
9.100000000000003
9.000000000000004
8.900000000000004
8.800000000000004
8.700000000000005
8.600000000000005
8.500000000000005
8.400000000000006
8.300000000000006
8.200000000000006
8.100000000000007
8.000000000000007
7.9000000000000075
7.800000000000008
7.700000000000008
7.6000000000000085
7.500000000000009
7.400000000000009
7.30000000000001
7.20000000000001
7.10000000000001
7.000000000000011
6.900000000000011
6.800000000000011
6.700000000000012
6.600000000000012
6.500000000000012
6.400000000000013
6.300000000000013
6.2000000000000135
6.100000000000014
6.000000000000014
5.900000000000015
5.800000000000015
5.700000000000015
5.600000000000016
5.500000000000016
5.400000000000016
5.300000000000017
5.200000000000017
5.100000000000017
5.000000000000018
4.900000000000018
4.8000000000000185
4.700000000000019
4.600000000000019
4.5000000000000195
4.40000000000002
4.30000000000002
4.200000000000021
4.100000000000021
4.000000000000021
3.9000000000000212
3.800000000000021
3.700000000000021
3.600000000000021
3.500000000000021
3.400000000000021
3.3000000000000207
3.2000000000000206
3.1000000000000205
3.0000000000000204
2.9000000000000203
2.8000000000000203
2.70000000000002
2.60000000000002
2.50000000000002
2.40000000000002
2.30000000000002
2.2000000000000197
2.1000000000000196
2.0000000000000195
1.9000000000000195
1.8000000000000194
1.7000000000000193
1.6000000000000192
1.500000000000019
1.400000000000019
1.300000000000019
1.2000000000000188
1.1000000000000187
1.0000000000000187
0.9000000000000187
0.8000000000000187
0.7000000000000187
0.6000000000000187
0.5000000000000188
0.4000000000000188
0.3000000000000188
0.2000000000000188
0.1000000000000188
1.8790524691780774E-14
-0.09999999999998122
-0.19999999999998122
-0.2999999999999812
-0.39999999999998126
-0.49999999999998124
-0.5999999999999812
-0.6999999999999812
-0.7999999999999812
-0.8999999999999811
-0.9999999999999811
-1.0999999999999812
-1.1999999999999813
-1.2999999999999814
-1.3999999999999815
...
// and so on, indefinitely

As you may know, computers convert numbers to binary (bits) and carry out arithmetic operations such as addition and subtraction on those binary representations. The IEEE 754 standard categorizes floating-point numbers by precision as single, double, and long double (extended) precision, as shown below. Most importantly, every floating-point number is divided into three parts: Sign, Exponent, and Mantissa. Each part has a fixed number of bits in the machine-level representation; single precision, for instance, uses 1 sign bit, 8 exponent bits, and 23 mantissa bits, while double precision uses 1, 11, and 52 bits.

IEEE 754 Floating-Point Standard
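To see the three parts for a real value, a small sketch can pull them out of Java's own bit patterns with Float.floatToIntBits and Double.doubleToLongBits. The class and method names are illustrative; the masks follow the 1/8/23-bit layout of single precision and the 1/11/52-bit layout of double precision.

```java
public class FloatFields {

    // Single precision: 1 sign bit, 8 exponent bits (bias 127), 23 mantissa bits
    static int[] fields(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = bits >>> 31;              // bit 31
        int exponent = (bits >>> 23) & 0xFF; // bits 30..23, still biased
        int mantissa = bits & 0x7FFFFF;      // bits 22..0; the leading 1 is implicit
        return new int[] {sign, exponent, mantissa};
    }

    // Double precision: 1 sign bit, 11 exponent bits (bias 1023), 52 mantissa bits
    static long[] fields(double d) {
        long bits = Double.doubleToLongBits(d);
        long sign = bits >>> 63;
        long exponent = (bits >>> 52) & 0x7FFL;
        long mantissa = bits & 0x000FFFFFFFFFFFFFL;
        return new long[] {sign, exponent, mantissa};
    }

    public static void main(String[] args) {
        int[] single = fields(9.1f);
        System.out.println("sign=" + single[0]
                + " exponent=" + single[1]
                + " mantissa=" + Integer.toBinaryString(single[2]));
        long[] dbl = fields(-2.5);
        System.out.println("sign=" + dbl[0] + " exponent=" + dbl[1]);
    }
}
```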

Decimal to Binary Conversion Process

Figure 1: Conversion of 9.1 to Binary
Figure 2: Scientific Notation, Sign value, Exponent value, and Mantissa value

We will treat 9.1 as a single-precision number, but the process is the same for the other precisions. As shown in figure 1, to convert 9.1 to binary, first convert the integer part 9, which gives 1001. Then convert the fractional part 0.1, which gives the recurring binary fraction 0.000110011001100110011…. Together, 9.1 is 1001.000110011001100110011… in binary, which in scientific notation is 1.00100011001100110011… × 2³. The power 3 is the exponent; adding the bias of 127 to it gives 130, which is stored in the Exponent bits as 10000010, as shown in figure 2. (The bias lets the 8-bit field represent exponents from -126 to 127 for normal numbers.) Since 9.1 is positive, the Sign bit is 0. Finally, drop the leading 1 of the scientific notation, take the digits after the binary point, and store them in the Mantissa bits, which hold only 23 bits. Keeping just the first 23 bits gives the representation 01000001000100011001100110011001, as shown in figure 3.
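The conversion steps in figure 1 can be sketched in Java: Integer.toBinaryString handles the integer part, and repeated doubling produces the fractional bits (after each doubling, the integer part is the next bit). The helper names are hypothetical, and the fraction is passed as a numerator/denominator pair (1/10 for 0.1) so the input stays exact; a double 0.1 would already be rounded.

```java
public class DecimalToBinary {

    // Binary digits of numerator/denominator (1/10 for 0.1) by repeated doubling:
    // after each doubling, the integer part (0 or 1) is the next bit.
    static String fractionBits(int numerator, int denominator, int count) {
        StringBuilder bits = new StringBuilder();
        for (int k = 0; k < count; k++) {
            numerator *= 2;
            if (numerator >= denominator) {
                bits.append('1');
                numerator -= denominator;
            } else {
                bits.append('0');
            }
        }
        return bits.toString();
    }

    // First `count` mantissa bits of whole + numerator/denominator, assuming whole >= 1:
    // normalizing to 1.xxx × 2^e drops the leading 1 of the integer part.
    static String mantissaBits(int whole, int numerator, int denominator, int count) {
        String integerBits = Integer.toBinaryString(whole);
        String all = integerBits.substring(1) + fractionBits(numerator, denominator, count);
        return all.substring(0, count);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(9));                 // 1001
        System.out.println(fractionBits(1, 10, 24));                   // 000110011001100110011001
        int exponent = Integer.toBinaryString(9).length() - 1;         // 3, i.e. × 2^3
        System.out.println("stored exponent: " + (exponent + 127));    // 130
        System.out.println("mantissa: " + mantissaBits(9, 1, 10, 23)); // 00100011001100110011001
    }
}
```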

Figure 3: Approximation in the Mantissa Bits

Now comes the tricky part. The full mantissa is the recurring sequence 00100011001100110011001…, but only 23 bits can be stored in memory, so the value has to be rounded. Under the default rounding mode (round to nearest), the hardware looks at the 24th bit: if it is 0, the extra bits are simply dropped; if it is 1, 1 is added at the 23rd-bit position. (Strictly, an exact tie, where the 24th bit is 1 and every later bit is 0, is rounded to the neighbor whose last bit is 0.) In the binary representation of 9.1, the 24th bit is 1 and later bits are non-zero, so 1 is added at the 23rd-bit position and the last three mantissa bits 001 become 010. So our final IEEE 754 representation of 9.1 is 01000001000100011001100110011010. In figure 3, I have highlighted the difference in the representation caused by this approximation in the Mantissa section.
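Java can confirm the rounded result, because Float.floatToIntBits returns the bits actually stored for a float. The sketch below (the bits helper is hypothetical) pads that pattern to 32 digits and prints it next to the truncated version from figure 3.

```java
public class RoundMantissa {

    // Java's stored bit pattern for a float, padded with leading zeros to 32 digits
    static String bits(float f) {
        String raw = Integer.toBinaryString(Float.floatToIntBits(f));
        return String.format("%32s", raw).replace(' ', '0');
    }

    public static void main(String[] args) {
        // Sign | exponent | first 23 mantissa bits, without rounding
        String truncated = "0" + "10000010" + "00100011001100110011001";
        String stored = bits(9.1f);

        System.out.println(truncated); // 01000001000100011001100110011001
        System.out.println(stored);    // 01000001000100011001100110011010 (rounded up)
    }
}
```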

Binary to Decimal Conversion Process

Figure 4: Converting Binary back to Decimal values

So the IEEE 754 single-precision representation of 9.1 is 01000001000100011001100110011010. To convert it back to decimal: the Sign bit is 0, so the number is positive. The Exponent bits 10000010 have the decimal value 130; subtracting the bias of 127 gives 3, so the exponent part is 2³, as shown in figure 4. Next, convert the Mantissa bits to decimal, as shown in figure 4, and add 1 to restore the leading 1 that was dropped from the scientific notation. This gives a mantissa value of about 1.137500048 (more precisely 1.1375000476837158…). Multiplying the two parts gives 1.1375000476837158… × 8 = 9.1000003814697265625, so the value we get back is about 9.10000038 rather than 9.1, as shown in figure 5.
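The decoding formula can be written out as a short sketch. The decode helper is hypothetical and handles normal numbers only (not zero, subnormals, infinities, or NaN); new BigDecimal(9.1f) is used afterwards because it prints the exact stored value without any further rounding.

```java
import java.math.BigDecimal;

public class DecodeFloat {

    // Rebuilds a normal single-precision value from its bit fields:
    // (-1)^sign × (1 + mantissa / 2^23) × 2^(exponent - 127)
    static double decode(int bits) {
        int sign = bits >>> 31;
        int exponent = ((bits >>> 23) & 0xFF) - 127;            // remove the bias (130 - 127 = 3 for 9.1f)
        int mantissa = bits & 0x7FFFFF;
        double significand = 1 + mantissa / (double) (1 << 23); // restore the implicit leading 1
        return (sign == 0 ? 1 : -1) * significand * Math.pow(2, exponent);
    }

    public static void main(String[] args) {
        int bits = Float.floatToIntBits(9.1f);
        System.out.println(decode(bits));         // close to, but not exactly, 9.1
        System.out.println(new BigDecimal(9.1f)); // 9.1000003814697265625
    }
}
```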

Figure 5: The additional value introduced by the round trip from decimal to binary and back

So when numbers are converted from decimal to binary and back inside a computer, we can get slightly different values, because IEEE 754 rounds the mantissa to a fixed number of bits. The same rounding applies to the step 0.1 itself and to the result of every subtraction. (Java's double uses double precision with a 52-bit mantissa rather than the single precision used in the 9.1 example, but the mechanism is the same.) When we subtract 0.1 repeatedly, these tiny errors accumulate, so the counter never lands exactly on zero: in the output above it jumps from 1.8790524691780774E-14 straight to -0.09999999999998122 and keeps going into the negative side. This is called floating-point rounding error, and it is why the for-loop at the beginning of the article never stops: the counter never becomes exactly zero.
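The same accumulation shows up without any loop condition involved. A minimal sketch, with a hypothetical helper sumOfTenths, adds 0.1 ten times; mathematically the answer is 1.0, but each addition is rounded to the nearest double.

```java
public class AccumulatedError {

    // Adds 0.1 to zero `times` times; each addition is rounded to the nearest double
    static double sumOfTenths(int times) {
        double sum = 0.0;
        for (int k = 0; k < times; k++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        double sum = sumOfTenths(10);
        System.out.println(sum);        // 0.9999999999999999
        System.out.println(sum == 1.0); // false
    }
}
```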

BigDecimal in Java

To avoid this floating-point issue, Java provides the java.math.BigDecimal class. A BigDecimal is an immutable, arbitrary-precision, signed decimal number made up of two parts: an unscaled value and a scale. For example, the BigDecimal 10.342 has an unscaled value of 10342 and a scale of 3.

  • Unscaled value: an arbitrary-precision integer
  • Scale: a 32-bit integer; when it is zero or positive, it is the number of digits to the right of the decimal point (the value is unscaled value × 10^-scale)
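Both parts can be read back with unscaledValue() and scale(), and the BigDecimal(BigInteger, int) constructor rebuilds a number from them. A small sketch using the 10.342 example:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigDecimalParts {
    public static void main(String[] args) {
        BigDecimal number = new BigDecimal("10.342");

        BigInteger unscaled = number.unscaledValue(); // 10342
        int scale = number.scale();                   // 3
        System.out.println(unscaled + " x 10^-" + scale); // 10342 x 10^-3

        // The same two parts rebuild the original number exactly
        BigDecimal rebuilt = new BigDecimal(unscaled, scale);
        System.out.println(rebuilt);                // 10.342
        System.out.println(rebuilt.equals(number)); // true
    }
}
```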

BigDecimal is very helpful when we deal with high-precision arithmetic or need control over scaling and rounding behavior; calculations involving financial transactions are a typical example. There are several ways to create a BigDecimal, for example from strings, character arrays, ints, and longs, as shown below.

Different ways of Creating a BigDecimal Object
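The original snippet for this figure is not reproduced here; the sketch below shows the four constructors the text lists (String, char[], int, long). It also adds one comparison beyond the text: creating a BigDecimal from a double, which carries over the binary approximation discussed earlier, versus BigDecimal.valueOf(double).

```java
import java.math.BigDecimal;

public class CreateBigDecimal {
    public static void main(String[] args) {
        BigDecimal fromString = new BigDecimal("10.342");                  // exact decimal text
        BigDecimal fromChars = new BigDecimal(new char[] {'9', '.', '1'}); // character array
        BigDecimal fromInt = new BigDecimal(42);                           // int
        BigDecimal fromLong = new BigDecimal(9_000_000_000L);              // long

        System.out.println(fromString); // 10.342
        System.out.println(fromChars);  // 9.1
        System.out.println(fromInt);    // 42
        System.out.println(fromLong);   // 9000000000

        // Creating from a double is riskier: the constructor keeps the exact binary
        // approximation of 0.1 (0.1000000000000000055511151231257827021181583404541015625)
        System.out.println(new BigDecimal(0.1));
        // BigDecimal.valueOf(double) goes through Double.toString, so this prints 0.1
        System.out.println(BigDecimal.valueOf(0.1));
    }
}
```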

We can also perform arithmetic operations such as addition, subtraction, multiplication, and division with BigDecimal, as shown below. To compare BigDecimal objects, use the compareTo method. Since BigDecimal is immutable, arithmetic operations do not modify existing objects; they return new ones. To learn about more operations you can perform on BigDecimal objects, visit https://www.baeldung.com/java-bigdecimal-biginteger.

How to perform Arithmetic Operations using BigDecimal Objects
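Again, the original code is not shown here, so this is a sketch with arbitrary sample values. It uses add, subtract, multiply, and divide, shows that the operands are unchanged afterwards, and contrasts compareTo with equals. One detail worth knowing: divide without a scale throws ArithmeticException when the exact quotient has no finite decimal expansion, so the sketch passes a scale and a RoundingMode.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class BigDecimalArithmetic {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("10.25");
        BigDecimal b = new BigDecimal("3");

        System.out.println(a.add(b));      // 13.25
        System.out.println(a.subtract(b)); // 7.25
        System.out.println(a.multiply(b)); // 30.75

        // 10.25 / 3 has no finite decimal expansion, so a.divide(b) alone would throw
        // ArithmeticException; give a scale (4 digits) and a rounding mode instead
        System.out.println(a.divide(b, 4, RoundingMode.HALF_UP)); // 3.4167

        // a and b are unchanged: each operation above returned a new BigDecimal
        System.out.println(a + " " + b); // 10.25 3

        // compareTo compares values only; equals also compares the scale
        BigDecimal x = new BigDecimal("2.0");
        BigDecimal y = new BigDecimal("2.00");
        System.out.println(x.compareTo(y)); // 0
        System.out.println(x.equals(y));    // false
    }
}
```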

Now, let's rewrite the for-loop from the beginning of the article using BigDecimal and check whether we get the output we expect. I have used the compareTo method to compare the BigDecimal objects.

For-Loop at the beginning of the article using BigDecimal
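A sketch of that rewrite, assuming the counter starts at new BigDecimal("10") and steps by new BigDecimal("0.1"); the original snippet is not reproduced, and the helper name countDown is hypothetical. The loop stops when compareTo reports that the counter equals zero. equals would not work here, because the final value 0.0 has scale 1 while BigDecimal.ZERO has scale 0.

```java
import java.math.BigDecimal;

public class BigDecimalLoop {

    // Counts down from `start` by `step` until the value is exactly zero, printing
    // each value, and returns the number of iterations. Assumes start is a whole
    // multiple of step; otherwise zero is never reached.
    static int countDown(BigDecimal start, BigDecimal step) {
        int iterations = 0;
        // compareTo, not equals: 0.0 (scale 1) is not equals() to BigDecimal.ZERO (scale 0)
        for (BigDecimal i = start; i.compareTo(BigDecimal.ZERO) != 0; i = i.subtract(step)) {
            System.out.println(i); // 10, 9.9, 9.8, ..., 0.1
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        int n = countDown(new BigDecimal("10"), new BigDecimal("0.1"));
        System.out.println(n + " iterations"); // 100 iterations
    }
}
```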
This time the output of the above code segment is exactly what we expected:
10
9.9
9.8
9.7
9.6
9.5
9.4
9.3
9.2
9.1
9.0
8.9
8.8
8.7
8.6
8.5
8.4
8.3
8.2
8.1
8.0
7.9
7.8
7.7
7.6
7.5
7.4
7.3
7.2
7.1
7.0
6.9
6.8
6.7
6.6
6.5
6.4
6.3
6.2
6.1
6.0
5.9
5.8
5.7
5.6
5.5
5.4
5.3
5.2
5.1
5.0
4.9
4.8
4.7
4.6
4.5
4.4
4.3
4.2
4.1
4.0
3.9
3.8
3.7
3.6
3.5
3.4
3.3
3.2
3.1
3.0
2.9
2.8
2.7
2.6
2.5
2.4
2.3
2.2
2.1
2.0
1.9
1.8
1.7
1.6
1.5
1.4
1.3
1.2
1.1
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1


Written by Ranmal Dewage

Software Engineer at Sysco Labs, Graduate of Sri Lanka Institute of Information Technology (SLIIT).
