CodeQL library for Java and Kotlin¶
When you’re analyzing a Java/Kotlin program, you can make use of the large collection of classes in the CodeQL library for Java/Kotlin.
About the CodeQL library for Java and Kotlin¶
There is an extensive library for analyzing CodeQL databases extracted from Java/Kotlin projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.
The library is implemented as a set of QL modules, that is, files with the extension .qll. The module java.qll imports all the core Java library modules, so you can include the complete library by beginning your query with:
import java
The rest of this article briefly summarizes the most important classes and predicates provided by this library.
Note
The example queries in this article illustrate the types of results returned by different library classes. The results themselves are not interesting but can be used as the basis for developing a more complex query. The other articles in this section of the help show how you can take a simple query and fine-tune it to find precisely the results you’re interested in.
Summary of the library classes¶
The most important classes in the standard Java/Kotlin library can be grouped into five main categories:
Classes for representing program elements (such as classes and methods)
Classes for representing AST nodes (such as statements and expressions)
Classes for representing metadata (such as annotations and comments)
Classes for computing metrics (such as cyclomatic complexity and coupling)
Classes for navigating the program’s call graph
We will discuss each of these in turn, briefly describing the most important classes for each category.
Program elements¶
These classes represent named program elements: packages (Package), compilation units (CompilationUnit), types (Type), methods (Method), constructors (Constructor), and variables (Variable).
Their common superclass is Element, which provides general member predicates for determining the name of a program element and checking whether two elements are nested inside each other.
It’s often convenient to refer to an element that might either be a method or a constructor; the class Callable, which is a common superclass of Method and Constructor, can be used for this purpose.
Types¶
Class Type has a number of subclasses for representing different kinds of types:
PrimitiveType represents a primitive type, that is, one of boolean, byte, char, double, float, int, long, short; QL also classifies void and <nulltype> (the type of the null literal) as primitive types.
RefType represents a reference (that is, non-primitive) type; it in turn has several subclasses:
    Class represents a Java class.
    Interface represents a Java interface.
    EnumType represents a Java enum type.
    Array represents a Java array type.
For example, the following query finds all variables of type int in the program:
import java
from Variable v, PrimitiveType pt
where pt = v.getType() and
    pt.hasName("int")
select v
You’re likely to get many results when you run this query because most projects contain many variables of type int.
Reference types are also categorized according to their declaration scope:
TopLevelType represents a reference type declared at the top-level of a compilation unit.
NestedType is a type declared inside another type.
For instance, this query finds all top-level types whose name is not the same as that of their compilation unit:
import java
from TopLevelType tl
where tl.getName() != tl.getCompilationUnit().getName()
select tl
You will typically see this pattern in the source code of a repository, with many more instances in the files referenced by the source code.
Several more specialized classes are available as well:
TopLevelClass represents a class declared at the top-level of a compilation unit.
NestedClass represents a class declared inside another type, such as:
    A LocalClass, which is a class declared inside a method or constructor.
    An AnonymousClass, which is an anonymous class.
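To make these categories concrete, here is a small Java sketch that declares one class of each kind. All names in it (ClassKindsDemo, Member, Local) are hypothetical. Java reflection happens to offer roughly analogous checks (isMemberClass, isLocalClass, isAnonymousClass), which are used here only to confirm which kind each class is; they are not part of the CodeQL library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: one class of each kind, classified with Java reflection.
class ClassKindsDemo {                      // top-level class
    static class Member {}                  // class declared inside another type

    static List<Class<?>> sampleClasses() {
        class Local {}                      // local class, declared inside a method
        Object anonymous = new Object() {}; // anonymous class
        List<Class<?>> result = new ArrayList<>();
        result.add(ClassKindsDemo.class);
        result.add(Member.class);
        result.add(Local.class);
        result.add(anonymous.getClass());
        return result;
    }

    // Names the kind of a class using the reflection checks mentioned above.
    static String kindOf(Class<?> c) {
        if (c.isAnonymousClass()) return "anonymous";
        if (c.isLocalClass()) return "local";
        if (c.isMemberClass()) return "member";
        return "top-level";
    }

    public static void main(String[] args) {
        for (Class<?> c : sampleClasses()) {
            System.out.println(c.getName() + ": " + kindOf(c));
        }
    }
}
```

Going by the descriptions above, you would expect ClassKindsDemo to be a TopLevelClass and the other three to be NestedClass instances, with Local also a LocalClass and the last one an AnonymousClass.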
Finally, the library also has a number of singleton classes that wrap frequently used Java standard library classes: TypeObject, TypeCloneable, TypeRuntime, TypeSerializable, TypeString, TypeSystem and TypeClass. Each CodeQL class represents the standard Java class suggested by its name.
As an example, we can write a query that finds all nested classes that directly extend Object:
import java
from NestedClass nc
where nc.getASupertype() instanceof TypeObject
select nc
You’re likely to get many results when you run this query because many projects include nested classes that extend Object directly.
Generics¶
There are also several subclasses of Type for dealing with generic types.
A GenericType is either a GenericInterface or a GenericClass. It represents a generic type declaration such as interface java.util.Map from the Java standard library:
package java.util;
public interface Map<K, V> {
    int size();
    // ...
}
Type parameters, such as K and V in this example, are represented by class TypeVariable.
A parameterized instance of a generic type provides a concrete type to instantiate the type parameter with, as in Map<String, File>. Such a type is represented by a ParameterizedType, which is distinct from the GenericType representing the generic type it was instantiated from. To go from a ParameterizedType to its corresponding GenericType, you can use predicate getSourceDeclaration.
For instance, we could use the following query to find all parameterized instances of java.util.Map:
import java
from GenericInterface map, ParameterizedType pt
where map.hasQualifiedName("java.util", "Map") and
    pt.getSourceDeclaration() = map
select pt
In general, generic types may restrict which types a type parameter can be bound to. For instance, a type of maps from strings to numbers could be declared as follows:
class StringToNumMap<N extends Number> implements Map<String, N> {
    // ...
}
This means that a parameterized instance of StringToNumMap can only instantiate type parameter N with type Number or one of its subtypes but not, for example, with File. We say that N is a bounded type parameter, with Number as its upper bound. In QL, a type variable can be queried for its type bound using predicate getATypeBound. The type bounds themselves are represented by class TypeBound, which has a member predicate getType to retrieve the type the variable is bounded by.
As an example, the following query finds all type variables with type bound Number:
import java
from TypeVariable tv, TypeBound tb
where tb = tv.getATypeBound() and
    tb.getType().hasQualifiedName("java.lang", "Number")
select tv
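As a point of comparison on the Java side, the following sketch reads the bound of a type parameter through reflection. It reuses the StringToNumMap declaration from the text, made abstract here only so that it compiles without method bodies; TypeBoundDemo and firstBound are hypothetical names. This is Java reflection, not the CodeQL API, but it shows the same information that the query extracts.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.Map;

// The class from the text, declared abstract so no Map methods need implementing.
abstract class StringToNumMap<N extends Number> implements Map<String, N> {
}

class TypeBoundDemo {
    // Returns the first bound of the first type parameter of the given generic class.
    static Type firstBound(Class<?> generic) {
        TypeVariable<?> tv = generic.getTypeParameters()[0];
        return tv.getBounds()[0];
    }

    public static void main(String[] args) {
        System.out.println(firstBound(StringToNumMap.class)); // bound of N
        System.out.println(firstBound(Map.class));            // bound of K, which declares none
    }
}
```

For a database built from this file, the query above would be expected to report N, since Number is its declared bound.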
For dealing with legacy code that is unaware of generics, every generic type has a “raw” version without any type parameters. In the CodeQL libraries, raw types are represented using class RawType, which has the expected subclasses RawClass and RawInterface. Again, there is a predicate getSourceDeclaration for obtaining the corresponding generic type. As an example, we can find variables of (raw) type Map:
import java
from Variable v, RawType rt
where rt = v.getType() and
    rt.getSourceDeclaration().hasQualifiedName("java.util", "Map")
select v
For example, in the following code snippet this query would find m1, but not m2:
Map m1 = new HashMap();
Map<String, String> m2 = new HashMap<String, String>();
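The difference between the two declarations is also visible at run time, which gives a quick way to check your understanding. The sketch below is a hypothetical class (RawVersusParameterizedDemo) that declares m1 and m2 as fields and inspects their declared types with Java reflection. Note that the reflection method getRawType returns the generic declaration (here Map), so it plays a role similar to getSourceDeclaration despite its name.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

class RawVersusParameterizedDemo {
    @SuppressWarnings("rawtypes")
    static Map m1 = new HashMap();                                  // raw type
    static Map<String, String> m2 = new HashMap<String, String>(); // parameterized type

    // Describes the declared type of the named field of this class.
    static String describe(String fieldName) {
        try {
            Field f = RawVersusParameterizedDemo.class.getDeclaredField(fieldName);
            Type t = f.getGenericType();
            if (t instanceof ParameterizedType) {
                Type generic = ((ParameterizedType) t).getRawType();
                return "parameterized instance of " + generic.getTypeName();
            }
            return "raw or non-generic type " + t.getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("m1: " + describe("m1"));
        System.out.println("m2: " + describe("m2"));
    }
}
```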
Finally, variables can be declared to be of a wildcard type:
Map<? extends Number, ? super Float> m;
The wildcards ? extends Number and ? super Float are represented by class WildcardTypeAccess. Like type parameters, wildcards may have type bounds. Unlike type parameters, which can only have upper bounds, wildcards can have either an upper bound (as in ? extends Number) or a lower bound (as in ? super Float). Class WildcardTypeAccess provides member predicates getUpperBound and getLowerBound to retrieve the upper and lower bounds, respectively.
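The two kinds of wildcard bounds can be observed in plain Java as well. The following sketch declares the field m from the text inside a hypothetical class (WildcardBoundsDemo) and reads its wildcards through Java reflection. The reflection methods getUpperBounds and getLowerBounds are not the CodeQL predicates, but they expose the corresponding information.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.WildcardType;
import java.util.Arrays;
import java.util.Map;

class WildcardBoundsDemo {
    static Map<? extends Number, ? super Float> m;

    // Returns the wildcard used as the i-th type argument of field m.
    static WildcardType wildcard(int i) {
        try {
            ParameterizedType pt = (ParameterizedType)
                WildcardBoundsDemo.class.getDeclaredField("m").getGenericType();
            return (WildcardType) pt.getActualTypeArguments()[i];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            WildcardType w = wildcard(i);
            System.out.println(w.getTypeName()
                + " upper=" + Arrays.toString(w.getUpperBounds())
                + " lower=" + Arrays.toString(w.getLowerBounds()));
        }
    }
}
```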
For dealing with generic methods, there are classes GenericMethod, ParameterizedMethod and RawMethod, which are entirely analogous to the like-named classes for representing generic types.
For more information on working with types, see “Types in Java and Kotlin.”
Variables¶
Class Variable represents a variable in the Java sense, which is either a member field of a class (whether static or not), or a local variable, or a parameter. Consequently, there are three subclasses catering to these special cases:
Field represents a Java field.
LocalVariableDecl represents a local variable.
Parameter represents a parameter of a method or constructor.
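As an illustration, the hypothetical Java class below (VariableKindsDemo is an assumed name) contains all three kinds of variables. The comments mark which CodeQL class each declaration would be expected to fall under, following the descriptions above.

```java
// Hypothetical class showing the three kinds of variables.
class VariableKindsDemo {
    static int calls = 0;              // Field (static)
    private final int base;            // Field (non-static)

    VariableKindsDemo(int base) {      // 'base' here is a Parameter of a constructor
        this.base = base;
    }

    int scale(int factor) {            // 'factor' is a Parameter of a method
        int product = base * factor;   // 'product' is a LocalVariableDecl
        calls++;
        return product;
    }

    public static void main(String[] args) {               // 'args' is a Parameter
        VariableKindsDemo demo = new VariableKindsDemo(6);  // 'demo' is a LocalVariableDecl
        System.out.println(demo.scale(7));
    }
}
```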
Abstract syntax tree¶
Classes in this category represent abstract syntax tree (AST) nodes, that is, statements (class Stmt) and expressions (class Expr). For a full list of expression and statement types available in the standard QL library, see “Abstract syntax tree classes for working with Java and Kotlin programs.”
Both Expr and Stmt provide member predicates for exploring the abstract syntax tree of a program:
Expr.getAChildExpr returns a sub-expression of a given expression.
Stmt.getAChild returns a statement or expression that is nested directly inside a given statement.
Expr.getParent and Stmt.getParent return the parent node of an AST node.
For example, the following query finds all expressions whose parents are return statements:
import java
from Expr e
where e.getParent() instanceof ReturnStmt
select e
Many projects have examples of return statements with child expressions.
For instance, if the program contains the statement return x + y;, this query will return x + y.
As another example, the following query finds statements whose parent is an if statement:
import java
from Stmt s
where s.getParent() instanceof IfStmt
select s
Many projects have examples of if statements with child statements.
This query will find both then branches and else branches of all if statements in the program.
Finally, here is a query that finds method bodies:
import java
from Stmt s
where s.getParent() instanceof Method
select s
As these examples show, the parent node of an expression is not always an expression: it may also be a statement, for example, an IfStmt. Similarly, the parent node of a statement is not always a statement: it may also be a method or a constructor. To capture this, the QL Java library provides two abstract classes, ExprParent and StmtParent, the former representing any node that may be the parent node of an expression, and the latter any node that may be the parent node of a statement.
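The parent relationships discussed in this section can be traced on a small piece of Java code. The method below is a hypothetical example (AstParentsDemo and clampedSum are assumed names); the comments note, based on the three queries above, which node you would expect to be the parent of each statement or expression.

```java
class AstParentsDemo {
    // The method itself is the parent of its body block.
    static int clampedSum(int x, int y, int limit) {
        if (x + y > limit) {   // then branch: a statement whose parent is the if statement
            return limit;      // 'limit' is an expression whose parent is a return statement
        } else {               // else branch: also a child of the if statement
            return x + y;      // 'x + y' is an expression whose parent is a return statement
        }
    }

    public static void main(String[] args) {
        System.out.println(clampedSum(1, 2, 10));
        System.out.println(clampedSum(8, 9, 10));
    }
}
```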
For more information on working with AST classes, see the article on overflow-prone comparisons in Java and Kotlin.
Metadata¶
Java/Kotlin programs have several kinds of metadata, in addition to the program code proper. In particular, there are annotations and Javadoc comments. Since this metadata is interesting both for enhancing code analysis and as an analysis subject in its own right, the QL library defines classes for accessing it.
For annotations, class Annotatable is a superclass of all program elements that can be annotated. This includes packages, reference types, fields, methods, constructors, and local variable declarations. For every such element, its predicate getAnAnnotation allows you to retrieve any annotations the element may have. For example, the following query finds all annotations on constructors:
import java
from Constructor c
select c.getAnAnnotation()
You may see examples where annotations are used to suppress warnings or to mark code as deprecated.
These annotations are represented by class Annotation. An annotation is simply an expression whose type is an AnnotationType. For example, you can amend this query so that it only reports deprecated constructors:
import java
from Constructor c, Annotation ann, AnnotationType anntp
where ann = c.getAnAnnotation() and
    anntp = ann.getType() and
    anntp.hasQualifiedName("java.lang", "Deprecated")
select ann
Only @Deprecated annotations on constructors are reported this time.
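To see what kind of code this matches, consider the following Java sketch. Temperature and DeprecatedConstructorsDemo are hypothetical names. The class has two constructors, one of which carries @Deprecated, and the helper uses Java reflection (not CodeQL) to pick out the annotated one, mirroring what the amended query looks for.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Temperature {
    final double kelvin;

    Temperature(double kelvin) {
        this.kelvin = kelvin;
    }

    /** @deprecated pass a value in kelvin to the other constructor instead. */
    @Deprecated
    Temperature(int celsius) {
        this.kelvin = celsius + 273.15;
    }
}

class DeprecatedConstructorsDemo {
    // Returns the parameter lists of the constructors annotated with @Deprecated.
    static List<String> deprecatedConstructors(Class<?> c) {
        List<String> result = new ArrayList<>();
        for (Constructor<?> ctor : c.getDeclaredConstructors()) {
            if (ctor.isAnnotationPresent(Deprecated.class)) {
                result.add(Arrays.toString(ctor.getParameterTypes()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(deprecatedConstructors(Temperature.class));
    }
}
```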
For more information on working with annotations, see the article on annotations.
For Javadoc, class Element has a member predicate getDoc that returns a delegate Documentable object, which can then be queried for its attached Javadoc comments. For example, the following query finds Javadoc comments on private fields:
import java
from Field f, Javadoc jdoc
where f.isPrivate() and
    jdoc = f.getDoc().getJavadoc()
select jdoc
You can see this pattern in many projects.
Class Javadoc represents an entire Javadoc comment as a tree of JavadocElement nodes, which can be traversed using member predicates getAChild and getParent. For instance, you could edit the query so that it finds all @author tags in Javadoc comments on private fields:
import java
from Field f, Javadoc jdoc, AuthorTag at
where f.isPrivate() and
    jdoc = f.getDoc().getJavadoc() and
    at.getParent+() = jdoc
select at
Note
On line 5 we used getParent+ to capture tags that are nested at any depth within the Javadoc comment.
For more information on working with Javadoc, see the article on Javadoc.
Metrics¶
The standard QL Java library provides extensive support for computing metrics on Java program elements. To avoid overburdening the classes representing those elements with too many member predicates related to metric computations, these predicates are made available on delegate classes instead.
Altogether, there are six such classes: MetricElement, MetricPackage, MetricRefType, MetricField, MetricCallable, and MetricStmt. The corresponding element classes each provide a member predicate getMetrics that can be used to obtain an instance of the delegate class, on which metric computations can then be performed.
For example, the following query finds methods with a cyclomatic complexity greater than 40:
import java
from Method m, MetricCallable mc
where mc = m.getMetrics() and
    mc.getCyclomaticComplexity() > 40
select m
Most large projects include some methods with a very high cyclomatic complexity. These methods are likely to be difficult to understand and test.
Call graph¶
CodeQL databases generated from Java and Kotlin code bases include precomputed information about the program’s call graph, that is, which methods or constructors a given call may dispatch to at runtime.
The class Callable, introduced above, includes both methods and constructors. Call expressions are abstracted using class Call, which includes method calls, new expressions, and explicit constructor calls using this or super.
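The different forms of call expression listed above are easy to confuse, so here is a short Java sketch containing one of each. The classes Shape, Circle and CallKindsDemo are hypothetical; the comments mark the expressions that you would expect class Call to cover, according to the description above.

```java
class Shape {
    final String name;

    Shape(String name) {
        this.name = name;
    }

    String describe() {
        return "shape " + name;
    }
}

class Circle extends Shape {
    final double radius;

    Circle() {
        this(1.0);                 // explicit constructor call using this
    }

    Circle(double radius) {
        super("circle");           // explicit constructor call using super
        this.radius = radius;
    }
}

class CallKindsDemo {
    static String run() {
        Circle c = new Circle();   // new expression
        return c.describe();       // method call
    }

    public static void main(String[] args) {
        System.out.println(run()); // two more method calls: run and println
    }
}
```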
We can use predicate Call.getCallee to find out which method or constructor a specific call expression refers to. For example, the following query finds all calls to methods called println:
import java
from Call c, Method m
where m = c.getCallee() and
    m.hasName("println")
select c
Conversely, Callable.getAReference returns a Call that refers to it. So we can find methods and constructors that are never called using this query:
import java
from Callable c
where not exists(c.getAReference())
select c
Codebases often have many methods that are not called directly, but this is unlikely to be the whole story. To explore this area further, see “Navigating the call graph.”
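One reason a method can appear uncalled while still being used is reflection. The sketch below is a hypothetical example (ReflectiveCallDemo, hidden and invokeByName are assumed names): no call expression in the program names hidden directly, yet it runs. A query that looks only for direct call expressions would likely list hidden as never called, so treat such results as candidates to review, not as proof of dead code.

```java
import java.lang.reflect.Method;

class ReflectiveCallDemo {
    // No call expression in this program refers to 'hidden' by name.
    static String hidden() {
        return "reached via reflection";
    }

    // Looks the method up by name at run time and invokes it.
    static String invokeByName(String methodName) {
        try {
            Method m = ReflectiveCallDemo.class.getDeclaredMethod(methodName);
            return (String) m.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeByName("hidden"));
    }
}
```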
For more information about callables and calls, see the article on the call graph.