Can we get rid of primitive types in Java — [Notes]
· Basics
∘ Boxing and Unboxing
∘ Memory Footprint
∘ Memory Footprint for Arrays
∘ Performance
· Why do people still use primitive types in Java?
∘ #1
∘ #2
∘ #3
∘ #4
∘ #5
∘ #6
· Reference
I’m comfortable using primitive data types (int, long, etc.) when writing Java code, but when I need to use a built-in data structure (like queue, list, etc.), I have to use wrapper classes because these data structures don’t support primitive data types.
So, here’s my question:
Can’t we use only wrapper classes and forget about primitives, to avoid the hassle and cost of boxing and unboxing?
Basics
Boxing and Unboxing
Integer j = 1; // autoboxing: javac emits Integer.valueOf(1)
int i = j;     // auto-unboxing: javac emits j.intValue() (new Integer(1) is deprecated since Java 9)
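Autoboxing is what makes the collections mentioned in the question usable with int values. A minimal sketch (the class and method names are hypothetical, chosen for illustration): adding an int to a List<Integer> boxes it, and reading an element into an int unboxes it again.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingInCollections {
    // Builds [1, 2, ..., n]; each add(i) boxes the int (javac emits Integer.valueOf(i)).
    static List<Integer> firstN(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            list.add(i);
        }
        return list;
    }

    // Sums the list; each element is auto-unboxed (element.intValue()) before the addition.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = firstN(4);
        Integer boxed = list.get(0); // no conversion: the list already stores Integer objects
        int unboxed = list.get(3);   // auto-unboxing: list.get(3).intValue()
        System.out.println(boxed + " " + unboxed + " " + sum(list)); // prints "1 4 10"
    }
}
```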
Memory Footprint
For reference, variables of the primitive types take up the following amount of memory:
- boolean — 1 bit
- byte — 8 bits
- short, char — 16 bits
- int, float — 32 bits
- long, double — 64 bits
In practice, these values can vary depending on the Virtual Machine implementation.
In Oracle’s VM, for example, the boolean type is mapped to the int values 0 and 1, so it takes 32 bits, as described in the “Primitive Types and Values” section of the JVM Specification.
Local variables of these types live on the stack and hence are accessed fast (primitive fields, by contrast, are stored inside their enclosing object on the heap).
Reference types are objects: they live on the heap, are relatively slow to access, and carry a certain overhead compared with their primitive counterparts.
The concrete overhead is JVM-specific. On the JVM used for these measurements, a single instance of a reference type occupies 128 bits, except for Long and Double, which occupy 192 bits:
- Boolean — 128 bits
- Byte — 128 bits
- Short, Character — 128 bits
- Integer, Float — 128 bits
- Long, Double — 192 bits
We can see that a single variable of Boolean type occupies as much space as 128 primitive ones, while one Integer variable occupies as much space as four int ones.
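The 128-bit and 192-bit figures can be reproduced with simple arithmetic, assuming a 64-bit HotSpot JVM with compressed class pointers: a 12-byte object header plus the primitive payload, rounded up to a multiple of 8 bytes. Both the header size and the alignment are assumptions about that particular JVM configuration, not guarantees of the Java language, and the helper name below is hypothetical.

```java
class WrapperSizeEstimate {
    // Assumption: 64-bit HotSpot with compressed class pointers, so a 12-byte object header.
    static final int HEADER_BYTES = 12;
    // Assumption: objects are aligned to 8-byte boundaries (the HotSpot default).
    static final int ALIGNMENT_BYTES = 8;

    // Estimated size, in bits, of a wrapper object holding a primitive that is payloadBytes wide.
    static int wrapperBits(int payloadBytes) {
        int raw = HEADER_BYTES + payloadBytes;
        // Round up to the next multiple of ALIGNMENT_BYTES.
        int aligned = (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
        return aligned * 8;
    }

    public static void main(String[] args) {
        System.out.println("Boolean, Byte:    " + wrapperBits(1)); // 13 bytes -> 16 -> 128 bits
        System.out.println("Short, Character: " + wrapperBits(2)); // 14 bytes -> 16 -> 128 bits
        System.out.println("Integer, Float:   " + wrapperBits(4)); // 16 bytes -> 16 -> 128 bits
        System.out.println("Long, Double:     " + wrapperBits(8)); // 20 bytes -> 24 -> 192 bits
    }
}
```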
Memory Footprint for Arrays
The situation becomes more interesting if we compare how much memory arrays of these types occupy.
When we create arrays with various numbers of elements for every type and plot their memory footprint (plot not reproduced here), we see that the types are grouped into four families with respect to how the memory m(s) depends on the number of elements s of the array:
- long, double: m(s) = 128 + 64 s
- short, char: m(s) = 128 + 64 [s/4]
- byte, boolean: m(s) = 128 + 64 [s/8]
- the rest: m(s) = 128 + 64 [s/2]
where the square brackets denote the standard ceiling function.
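Surprisingly, arrays of the primitive types long and double consume more memory than arrays of their wrapper classes Long and Double. This compares only the arrays themselves: an array of Long or Double stores references (32 bits each with compressed references), and the boxed objects those references point to are not counted.
We can also see that single-element arrays of primitive types are more expensive than a single instance of the corresponding reference type, except for long and double, where both take 192 bits.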
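The four formulas can be evaluated directly. A small sketch (the method names are hypothetical) that encodes m(s) = 128 + 64 [s/k], where k is the number of elements that fit in one 64-bit unit: 1 for long and double, 2 for the rest, 4 for short and char, 8 for byte and boolean. For s = 1 every family gives 128 + 64 = 192 bits, which is the basis of the single-element comparison with the 128-bit Integer and the 192-bit Long.

```java
class ArrayFootprint {
    // m(s) = 128 + 64 * ceil(s / elementsPerWord), in bits, following the formulas above.
    static long arrayBits(long s, int elementsPerWord) {
        long words = (s + elementsPerWord - 1) / elementsPerWord; // integer ceiling
        return 128 + 64 * words;
    }

    static long longOrDouble(long s) { return arrayBits(s, 1); }
    static long theRest(long s) { return arrayBits(s, 2); }       // int, float, ...
    static long shortOrChar(long s) { return arrayBits(s, 4); }
    static long byteOrBoolean(long s) { return arrayBits(s, 8); }

    public static void main(String[] args) {
        // Single-element arrays: every family gives 192 bits.
        System.out.println(longOrDouble(1) + " " + theRest(1) + " "
                + shortOrChar(1) + " " + byteOrBoolean(1));
        // Larger arrays show how the families diverge.
        System.out.println(longOrDouble(1000) + " " + theRest(1000) + " "
                + shortOrChar(1000) + " " + byteOrBoolean(1000));
    }
}
```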
Performance
The performance of Java code is a subtle issue: it depends heavily on the hardware the code runs on, on the optimizations the compiler may perform, on the state of the virtual machine, and on the activity of other processes in the operating system.
As mentioned above, local variables of primitive types live on the stack, while objects of reference types live on the heap. This is a dominant factor in how fast the values can be accessed.
To demonstrate how much faster operations on primitive types are than those on wrapper classes, let’s create a five-million-element array in which all elements are equal except for the last one; then we perform a lookup for that element:
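int index = 0;
// wrapper array (e.g. Integer[]): compare values with equals()
// (for a primitive array the condition would be pivot != elements[index])
while (!pivot.equals(elements[index])) {
    index++;
}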
and compare the performance of this operation for the case when the array contains variables of the primitive types and for the case when it contains objects of the reference types.
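The two variants of the lookup can be sketched as plain methods. This is only an illustration of what is being compared (class and method names are hypothetical, and the arrays are built as the text describes: equal elements except for the last one); it measures nothing, since timing without a harness such as JMH is unreliable.

```java
import java.util.Arrays;

class PivotLookup {
    // Primitive variant: elements are compared with ==, no objects involved.
    static int findPrimitive(int[] elements, int pivot) {
        int index = 0;
        while (pivot != elements[index]) {
            index++;
        }
        return index;
    }

    // Wrapper variant: each slot holds a reference to an Integer on the heap,
    // so every comparison follows that reference and calls equals().
    static int findBoxed(Integer[] elements, Integer pivot) {
        int index = 0;
        while (!pivot.equals(elements[index])) {
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        int size = 5_000_000;
        int[] primitives = new int[size];     // all zeros...
        primitives[size - 1] = 1;             // ...except the last element

        Integer[] boxed = new Integer[size];
        Arrays.fill(boxed, 0);                // 0 is boxed once; every slot refers to that Integer
        boxed[size - 1] = 1;

        System.out.println(findPrimitive(primitives, 1)); // 4999999
        System.out.println(findBoxed(boxed, 1));          // 4999999
    }
}
```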
We use the well-known JMH benchmarking tool, and the results of the lookup operation were summarized in a chart (not reproduced here).
Even for such a simple operation, the wrapper classes take more time than the primitive types.
In the case of more complicated operations like summation, multiplication or division, the difference in speed might skyrocket.
Why do people still use primitive types in Java?
#1
In Joshua Bloch’s Effective Java (Item 5: “Avoid creating unnecessary objects”), he gives the following code example:
public static void main(String[] args) {
Long sum = 0L; // uses Long, not long
for (long i = 0; i <= Integer.MAX_VALUE; i++) {
sum += i;
}
System.out.println(sum);
}
It takes 43 seconds to run. Changing the Long to the primitive long brings it down to 6.8 seconds. If that’s any indication of why we use primitives…
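The fix is a one-word change: declare sum as long instead of Long. Below is a sketch of both versions side by side; the method names and the smaller loop bound are chosen here so the demo runs quickly and are not from the book. In the boxed version, each sum += i unboxes sum, adds i, and boxes the result into a Long again.

```java
class SumComparison {
    // Boxed accumulator: sum += i is roughly sum = Long.valueOf(sum.longValue() + i).
    static long sumBoxed(long n) {
        Long sum = 0L;
        for (long i = 0; i <= n; i++) {
            sum += i;
        }
        return sum; // auto-unboxed on return
    }

    // Primitive accumulator: a plain long addition, no objects created.
    static long sumPrimitive(long n) {
        long sum = 0L;
        for (long i = 0; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // much smaller than Integer.MAX_VALUE so the demo finishes quickly
        System.out.println(sumBoxed(n) == sumPrimitive(n)); // true: same result, different cost
    }
}
```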
#2
The lack of native value equality is also a concern (.equals() is fairly verbose compared to ==, and == on two wrapper objects compares references, not values):
class Biziclop {
    public static void main(String[] args) {
        // new Integer(...) is deprecated since Java 9 but still compiles (with a warning)
        System.out.println(new Integer(5) == new Integer(5));             // 1
        System.out.println(new Integer(500) == new Integer(500));         // 2
        System.out.println(Integer.valueOf(5) == Integer.valueOf(5));     // 3
        System.out.println(Integer.valueOf(500) == Integer.valueOf(500)); // 4
    }
}
Results in:
false
false
true
false
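Why does (3) return true and (4) return false?
Because in (4) Integer.valueOf returns two different objects, while in (3) it returns the same one. (In (1) and (2), new always creates a new object, so == is false regardless of the value.)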
- The 256 integers closest to zero, [-128; 127], are cached, so Integer.valueOf returns the same object for those.
- Beyond that range, though, they aren’t cached, so a new object is created.
- To make things more complicated, the JLS demands that at least 256 flyweights be cached.
- JVM implementers may add more if they desire, meaning this could run on a system where the nearest 1024 are cached and (4) returns true as well… #awkward
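Because == on two Integer references compares identity, value comparisons should go through equals(), or through java.util.Objects.equals() when either side may be null. Comparing against an int also works, because the Integer side is then unboxed, but that throws a NullPointerException if it is null. A minimal sketch (the class and helper names are hypothetical):

```java
import java.util.Objects;

class BoxedEquality {
    // Null-safe value comparison: true if both are null or both hold the same value.
    static boolean sameValue(Integer x, Integer y) {
        return Objects.equals(x, y);
    }

    public static void main(String[] args) {
        Integer a = 500;
        Integer b = 500; // outside [-128; 127]: with the default cache, a different object from a

        boolean sameObject = (a == b);         // identity comparison: false with the default cache
        boolean equalValues = a.equals(b);     // value comparison: true
        boolean nullSafe = sameValue(a, null); // false, and no NullPointerException
        boolean unboxed = (a == b.intValue()); // one side is int, so a is unboxed: true

        System.out.println(sameObject + " " + equalValues + " " + nullSafe + " " + unboxed);
    }
}
```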
#3
Auto-unboxing can lead to hard-to-spot NullPointerExceptions (NPEs):
Integer in = null;
...
...
int i = in; // NPE at runtime
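A common way to end up with that null is a Map lookup for a missing key, since Map.get returns null rather than a default. In this sketch (the word-count map and method names are illustrative), the unboxing return throws, while getOrDefault, available since Java 8, avoids it.

```java
import java.util.HashMap;
import java.util.Map;

class UnboxingNpe {
    // Auto-unboxing a missing value: counts.get(key) is null, so null.intValue() throws an NPE.
    static int countOrThrow(Map<String, Integer> counts, String key) {
        return counts.get(key);
    }

    // Safe version: a missing key yields the default 0 instead of null.
    static int countOrZero(Map<String, Integer> counts, String key) {
        return counts.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("apple", 3);

        System.out.println(countOrZero(counts, "pear")); // 0
        try {
            countOrThrow(counts, "pear");
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing a missing value");
        }
    }
}
```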
#4
Boxed types have poorer performance and require more memory.
#5
Can you really imagine a
for (int i = 0; i < 10000; i++) {
    // do something
}
loop with java.lang.Integer instead? A java.lang.Integer is immutable, so each increment around the loop would create a new Java object on the heap (for values outside the cached range), rather than just incrementing the int on the stack with a single JVM instruction. The performance would be diabolical.
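Roughly what such a loop turns into: since an Integer cannot be modified, i++ on an Integer variable is compiled as unbox, add, and re-box. A sketch with hypothetical method names, where both loops run the same number of iterations; the boxed one calls Integer.valueOf on every step, which allocates a new object for values outside the [-128; 127] cache.

```java
class LoopCounters {
    // Primitive counter: i++ on a local int is a single iinc bytecode.
    static int countPrimitive(int limit) {
        int iterations = 0;
        for (int i = 0; i < limit; i++) {
            iterations++;
        }
        return iterations;
    }

    // Boxed counter: i++ compiles to roughly i = Integer.valueOf(i.intValue() + 1),
    // and the check i < limit unboxes i on every iteration.
    static int countBoxed(int limit) {
        int iterations = 0;
        for (Integer i = 0; i < limit; i++) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(countPrimitive(10000) + " " + countBoxed(10000)); // prints "10000 10000"
    }
}
```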
I would really disagree that it’s much more convenient to use java.lang.Integer than int.
- On the contrary. Autoboxing means that you can use int where you would otherwise be forced to use Integer, and the Java compiler takes care of inserting the code to create the new Integer object for you.
- Autoboxing is all about allowing you to use an int where an Integer is expected, with the compiler inserting the relevant object construction.
It in no way removes or reduces the need for the int in the first place. With autoboxing you get the best of both worlds: you get an Integer created for you automatically when you need a heap-based Java object, and you get the speed and efficiency of an int when you are just doing arithmetic and local calculations.
#6
Besides performance and memory issues, I’d like to raise another one: the List interface would be broken without int.
The problem is the overloaded remove() method (remove(int) vs. remove(Object)). remove(Integer) would always resolve to calling the latter, so you could not remove an element by index.
On the other hand, there is a pitfall when trying to add and remove an int:
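final int i = 42;
final List<Integer> list = new ArrayList<Integer>();
list.add(i);    // add(E): i is autoboxed to Integer
list.remove(i); // remove(int index), not remove(Object) - Ouch! IndexOutOfBoundsException here
// To remove by value instead, box explicitly: list.remove(Integer.valueOf(i));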
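The overload resolution can be checked directly. The compiler prefers an overload that needs no boxing, so an int argument selects remove(int index), while an explicit Integer argument selects remove(Object). In this sketch the values are chosen so that indexes and elements never coincide; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ListRemoveOverloads {
    static List<Integer> demo() {
        List<Integer> list = new ArrayList<>(List.of(10, 20, 30));
        list.remove(1);                   // remove(int index): removes the element at index 1 (20)
        list.remove(Integer.valueOf(10)); // remove(Object o): removes the first element equal to 10
        return list;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[30]"
    }
}
```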
Use int when possible, and use Integer when needed. Since int is a primitive, it will be faster. Modern JVMs know how to optimize Integers using auto-boxing, but if you’re writing performance-critical code, int is the way to go.
So, use int whenever possible (I will repeat myself: if you’re writing performance-critical code). If a method requires an Integer, use that instead.
If you don’t care about performance and want to do everything in an object-oriented fashion, use Integer.