How does garbage collection work?

When asked about garbage collection, it's essential to start by specifying which garbage collector is meant. The JVM specification requires automatic memory management but does not prescribe an algorithm, so the specifics depend on the implementation. A single JVM may ship several garbage collectors, and one collector may use different algorithms under different circumstances. Technically, a collector is even allowed to do nothing. The System.gc() method only asks the JVM to make its "best effort" to reclaim memory, which in practice guarantees nothing.
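The "best effort" nature of System.gc() can be observed with a weak reference. The following is a minimal sketch; the class and method names are made up for illustration, and the printed result depends on the JVM, the collector, and flags such as -XX:+DisableExplicitGC, so no particular output should be expected.

```java
import java.lang.ref.WeakReference;

public class GcHintDemo {

    /**
     * Drops the only strong reference to an object, requests a collection,
     * and reports whether the weak reference was cleared.
     * The result is NOT guaranteed either way.
     */
    public static boolean requestGcAndCheck() {
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null;            // no strong references remain
        System.gc();               // a request, not a command
        return ref.get() == null;  // true only if the JVM actually collected it
    }

    public static void main(String[] args) {
        System.out.println("Collected after System.gc(): " + requestGcAndCheck());
    }
}
```

On a default HotSpot configuration this usually prints true, but code should never rely on it.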

The GC (garbage collector) is often jokingly named as the reason "Java is slow". It is the price paid for stable, automatic memory management, and because that price matters so much, it is one of the most actively evolving areas in the Java world.

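The two primary approaches to garbage collection are reference counting and tracing, where live objects are found by following references and garbage is then reclaimed by a mark-and-sweep or copying collection. The first approach struggles with cyclic references: objects that refer to each other keep nonzero counts even when nothing else can reach them. The second is the one used by Java collectors in practice.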
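The cyclic reference problem can be shown with a toy model. This sketch keeps reference counts by hand (the Node class and its refCount field are hypothetical, since the JVM exposes nothing like this) and shows that two objects pointing at each other never reach a count of zero.

```java
import java.util.ArrayList;
import java.util.List;

public class RefCountCycleDemo {

    /** A toy object with a manually maintained reference count. */
    static final class Node {
        final String name;
        int refCount = 0;
        final List<Node> fields = new ArrayList<>();

        Node(String name) { this.name = name; }
    }

    /** Stores a reference from one node to another and bumps the target's count. */
    static void addReference(Node from, Node to) {
        from.fields.add(to);
        to.refCount++;
    }

    /** Builds the cycle a <-> b, drops the external references, returns the leftover counts. */
    public static int[] countsAfterDroppingExternalReferences() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.refCount++;          // the program holds a reference to a
        b.refCount++;          // the program holds a reference to b
        addReference(a, b);    // a.field -> b
        addReference(b, a);    // b.field -> a
        a.refCount--;          // the program forgets a
        b.refCount--;          // the program forgets b
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingExternalReferences();
        // Both counts stay at 1, so a pure reference counter would never free them.
        System.out.println("a=" + counts[0] + ", b=" + counts[1]); // prints "a=1, b=1"
    }
}
```

A tracing collector does not have this problem, because it asks whether an object is reachable from the roots rather than how many references point at it.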

Most collectors rely on the weak generational hypothesis, which states that most objects die young. For this reason, the heap is divided by object age into areas called generations, and garbage collection is performed in each generation separately.
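The division of the heap can be inspected at run time through the standard java.lang.management API. This is a small sketch (the class name is made up); the pool names it prints depend on the collector in use, so the output is not fixed. Under G1, for example, the pools are typically named after eden, survivor, and old spaces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class HeapPoolsDemo {

    /** Returns the names of all memory pools that belong to the Java heap. */
    public static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Names vary by collector and JVM version.
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```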

The general algorithm is shared by most collectors: objects reachable from the roots are marked and usually compacted or copied together, while unreachable ones are reclaimed.
GC Roots are the starting points for traversing the object graph for reachability. The set of root objects (root set) is considered reachable unconditionally. It’s common in interviews to be asked to list them; typical roots include local variables and parameters on thread stacks, static fields of loaded classes, and JNI references.
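Marking from a root set can be sketched on a toy object graph. The code below is only an illustration of the idea, not how HotSpot implements it; the Obj class, the explicit "heap" list, and the method names are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

public class MarkSweepSketch {

    /** A toy heap object: a name, outgoing references, and a mark bit. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) { this.name = name; }
    }

    /** Mark phase: iterative depth-first traversal starting from the root set. */
    static void mark(Collection<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj current = stack.pop();
            if (current.marked) {
                continue;              // already visited, also protects against cycles
            }
            current.marked = true;
            for (Obj ref : current.refs) {
                stack.push(ref);
            }
        }
    }

    /** Sweep phase: keeps marked objects, drops the rest, and clears the mark bits. */
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj obj : heap) {
            if (obj.marked) {
                obj.marked = false;
                survivors.add(obj);
            }
        }
        return survivors;
    }

    /** Builds a small graph and returns the names of the objects that survive. */
    public static List<String> collect() {
        Obj root = new Obj("root");
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        Obj d = new Obj("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);                 // c and d form a cycle
        d.refs.add(c);                 // that nothing reachable points to
        List<Obj> heap = Arrays.asList(root, a, b, c, d);

        mark(Arrays.asList(root));
        List<String> names = new ArrayList<>();
        for (Obj obj : sweep(heap)) {
            names.add(obj.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collect()); // prints "[root, a, b]"
    }
}
```

The unreachable cycle between c and d is reclaimed because neither object is ever marked.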

An important concept for garbage collectors is the Stop The World (STW) pause. This is when all application threads are stopped to safely perform garbage collection and other system operations. It occurs at specific program points called safepoints.

The collector used by HotSpot can be selected with a JVM startup parameter, and each collector has many settings of its own. As of Java 10, HotSpot offers four collectors, with a fifth on the way:

• Serial – Single-threaded, with generations. It favors throughput (less total time spent on collection) over short pauses;
• Parallel – A multi-threaded version of Serial;
• CMS (Concurrent Mark-Sweep) – Aims for lower latency (shorter individual pauses) by performing part of the collection concurrently with the application, outside of STW pauses. The trade-off is lower throughput. Like Serial and Parallel, it works with generations. Deprecated in Java 9;
• G1 (Garbage First) – Also aims to reduce latency. The heap is split into many equally sized regions; G1 is still generational, but its generations are sets of regions rather than contiguous areas;
• Shenandoah – A new collector, expected to be added in a later release.
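Which collector a running JVM actually uses can be checked through the standard GarbageCollectorMXBean interface. The sketch below uses a made-up class name and lists the commonly used HotSpot selection flags in comments; the names and numbers it prints depend on the JVM version and the flags passed, so no output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Typical HotSpot flags for choosing a collector at startup:
//   java -XX:+UseSerialGC         ActiveCollectors
//   java -XX:+UseParallelGC       ActiveCollectors
//   java -XX:+UseConcMarkSweepGC  ActiveCollectors   (deprecated in Java 9, absent from newer JDKs)
//   java -XX:+UseG1GC             ActiveCollectors
public class ActiveCollectors {

    /** Returns one line per registered collector: name, collection count, accumulated time. */
    public static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is an approximate accumulated elapsed time in milliseconds;
            // for concurrent collectors it is not the same thing as total STW pause time.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

A generational collector usually registers two beans, one for young-generation collections and one for old-generation or full collections.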