The Java classpath, shading, and Amazon EMR

Sunday, July 24, 2016

The Java classpath allows dependencies to be brought in from various locations at runtime. However, if an environment already defines a classpath, its entries can take precedence over classpath resources specified downstream, because a given classloader loads only one version of a class with a particular name. This can cause difficulties when working in an environment that ships its own libraries, such as Amazon EMR, which can run an arbitrary JAR as a job.

Brief Demo of the Classpath

For explanation’s sake, take a simple class, versionone/Version.java, containing a single method which prints out a version string:

public class Version {
        public static void printVersion() {
                System.out.println("Version one of the Version class");
        }
}

Next, a second version is forked and bumped, as versiontwo/Version.java:

public class Version {
        public static void printVersion() {
                System.out.println("Version two of the Version class");
        }
}

Finally, a third class, VersionTest.java, will call the Version class to print its version:

public class VersionTest {
        public static void main(String[] args) {
                Version.printVersion();
        }
}

Compiling the Version classes is straightforward - they have no dependencies, and as such can be compiled by being passed as the lone argument to javac. Compiling the VersionTest class, however, requires one of the Version classes to be on the classpath. Of note is that, because the class signature is the same, it does not matter which version of Version is compiled against.

[root@app01 ~]$ javac -cp versionone VersionTest.java

Things become slightly less straightforward when running the VersionTest class. When run with the classpath pointed at the versionone Version class, VersionTest behaves as expected:

[root@app01 ~]$ java -cp versionone:. VersionTest
Version one of the Version class

However, because the VersionTest class file itself does not actually contain any of Version’s code, swapping out versionone for versiontwo on the classpath makes VersionTest use version two of the class, despite having been compiled against version one:

[root@app01 ~]$ java -cp versiontwo:. VersionTest
Version two of the Version class

While this is all well and good, what happens when both versionone and versiontwo are on the classpath at the same time? Java’s default classloader evaluates resources on the classpath in the order they’re provided, so which one is loaded depends on the order in which they’re specified:

[root@app01 ~]$ java -cp versionone:versiontwo:. VersionTest
Version one of the Version class
[root@app01 ~]$ java -cp versiontwo:versionone:. VersionTest
Version two of the Version class
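
When it is unclear which classpath entry actually supplied a class, the JVM can be asked directly. The following is a minimal sketch rather than part of the demo above: ClassOrigin and its locate method are hypothetical names introduced here, and the approach assumes the class was loaded from a directory or JAR (classes supplied by the JDK itself typically report no code source).

```java
import java.security.CodeSource;

public class ClassOrigin {
    /** Returns the classpath entry a class was loaded from, or a placeholder if none is reported. */
    public static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a class name as the first argument; defaults to this class itself
        String name = args.length > 0 ? args[0] : "ClassOrigin";
        System.out.println(name + " loaded from " + locate(Class.forName(name)));
    }
}
```

Compiled alongside the demo and run as java -cp versionone:versiontwo:. ClassOrigin Version, this should report the directory of whichever Version class won the lookup.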

Solutions and Shading

One possible way to overcome this is by renaming the Version class, perhaps to VersionTwo. However, this would break compatibility with anything depending on Version when it comes time to upgrade. Likewise, managing version names through package declarations would fall short for much the same reason.

Enter shading. Shading is, at its simplest, the process of programmatically renaming classes at build time, typically when the application is packaged. Because of the issue described above, the shading process renames not only the dependency class itself, but also every reference to it in the code that uses it. A number of tools implement this pattern, the most popular likely being the Maven Shade Plugin. The following Maven plugin declaration for VersionTest would rename the Version class to VersionTwo, and update all references to it in the VersionTest class:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <pattern>Version</pattern>
                        <shadedPattern>VersionTwo</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>

Then, even with both versions on the classpath, only the bundled, shaded version of the class will be loaded at runtime and executed. Note that the pattern and shadedPattern parameters are not limited to individual class names - entire package hierarchies can be shaded at once using this mechanism.

Practical Example: Guava and Amazon EMR

At the time of this writing, the latest version of Amazon EMR (4.7.1) puts version 11 of Google Guava on the classpath of any running JAR step. Guava 11 was released nearly four years ago, and lacks a number of methods that have since been added to existing classes as of the latest version, Guava 19. Even if a more recent Guava is bundled into the JAR submitted as a step, Guava 11 will still be used when running the application, potentially causing unexpected behavior due to a mix of old and new classes being in use, or, at worst, causing the job to fail with a NoSuchMethodError.
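
One way to confirm this kind of mismatch before a job fails is to probe for a newer method by reflection at startup. The sketch below is illustrative only: MethodProbe and hasMethod are hypothetical names, and the choice of Stopwatch.createStarted as a marker for a newer Guava is an assumption (to the best of my knowledge it was added in Guava 15). Reflection is used so that the probe compiles without Guava present.

```java
import java.lang.reflect.Method;

public class MethodProbe {
    /** Returns true if the named class is loadable and exposes a public method with the given name. */
    public static boolean hasMethod(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            for (Method method : type.getMethods()) {
                if (method.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is absent from the classpath or could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: createStarted exists only in newer Guava releases
        boolean newer = hasMethod("com.google.common.base.Stopwatch", "createStarted");
        System.out.println("Newer Guava Stopwatch API visible: " + newer);
    }
}
```

If the probe reports false inside the job even though a recent Guava is bundled in the JAR, the older provided version is the one being loaded.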

In this case, the Maven POM snippet to shade the locally bundled version of Guava might look like:

<plugin>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <pattern>com.google</pattern>
                        <shadedPattern>shade.com.google</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>

Under the hood, all references to the local Guava libraries underneath com.google will be rewritten to shade.com.google instead, ensuring that the correct dependencies are pulled in at runtime.
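
The effect of a relocation on class names can be modeled in a few lines. This is a simplified conceptual sketch, not the plugin's implementation: RelocationSketch and relocate are hypothetical names, the matching here is done on whole package segments, and the real plugin's matching rules may differ. It also rewrites bytecode references, which this model ignores.

```java
public class RelocationSketch {
    /**
     * Maps a fully qualified class name under `pattern` to the same name under `shadedPattern`.
     * Names outside the pattern are returned unchanged.
     */
    public static String relocate(String className, String pattern, String shadedPattern) {
        if (className.equals(pattern) || className.startsWith(pattern + ".")) {
            // Swap the matched prefix and keep the rest of the name intact
            return shadedPattern + className.substring(pattern.length());
        }
        return className;
    }

    public static void main(String[] args) {
        String[] names = {
            "com.google.common.collect.ImmutableList",
            "java.util.List"
        };
        for (String name : names) {
            System.out.println(name + " -> " + relocate(name, "com.google", "shade.com.google"));
        }
    }
}
```

Because the provided Guava 11 classes still live under com.google, they no longer collide with the relocated copies bundled in the application JAR.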

Extra Credit: sbt

While the Maven Shade Plugin likely serves as a better universal example for this blog post, this issue was initially observed in an [sbt](http://www.scala-sbt.org/) project. Luckily, versions 0.14.0 and up of the sbt-assembly plugin for sbt support shading. The equivalent directive to the Maven POM example above for an sbt project’s build.sbt looks like:

assemblyShadeRules in assembly := Seq(
    ShadeRule.rename("com.google.**" -> "shade.com.google.@1").inAll
)

…making sure to adjust the project’s plugins file accordingly:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")