An Escaper that converts literal text into a format safe for inclusion in a particular
context (such as an XML document). Typically (but not always), the inverse process of
"unescaping" the text is performed automatically by the relevant parser.
For example, an XML escaper would convert the literal string "Foo<Bar>" into
"Foo&lt;Bar&gt;" to prevent "<Bar>" from being confused with an XML tag. When the
resulting XML document is parsed, the parser API will return this text as the original literal
string "Foo<Bar>".
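As a rough illustration of the escaping step described above, the following self-contained sketch replaces the three characters most commonly escaped in XML text. The class and method names (XmlEscapeSketch, replacementFor, escape) are hypothetical and are not part of the library. It works on individual chars and does not validate surrogate pairs.

```java
// Hypothetical stand-in for an XML escaper; not the library's implementation.
final class XmlEscapeSketch {
  /** Returns the replacement text for c, or null if c can be copied unchanged. */
  static String replacementFor(char c) {
    switch (c) {
      case '<': return "&lt;";
      case '>': return "&gt;";
      case '&': return "&amp;";
      default: return null;
    }
  }

  /** Escapes every character of s that has a replacement. */
  static String escape(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      String r = replacementFor(c);
      if (r == null) {
        out.append(c);
      } else {
        out.append(r);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(escape("Foo<Bar>")); // prints Foo&lt;Bar&gt;
  }
}
```

An XML parser reading the escaped text would hand back the original "Foo<Bar>" to the caller, which is the automatic unescaping the description refers to.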
Because there are important reasons, including potential security issues, to handle Unicode
correctly, you should favor using UnicodeEscaper wherever possible if you are considering
implementing a new escaper.
A UnicodeEscaper instance is required to be stateless and safe for concurrent use
by multiple threads.
Several popular escapers are defined as constants in the class CharEscapers. To create
your own escaper, extend this class and implement the #escape(int) method.
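Static Methods
codePointAt(CharSequence seq, int index, int end)
protected static int codePointAt(CharSequence seq, int index, int end)
Returns the Unicode code point of the character at the given index.
Unlike Character#codePointAt(CharSequence, int) or String#codePointAt(int),
this method will never fail silently when encountering an invalid surrogate pair.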
The behaviour of this method is as follows:
1. If index >= end, IndexOutOfBoundsException is thrown.
2. If the character at the specified index is not a surrogate, it is returned.
3. If the first character was a high surrogate value, then an attempt is made to read the
   next character.
   1. If the end of the sequence was reached, the negated value of the trailing high
      surrogate is returned.
   2. If the next character was a valid low surrogate, the code point value of the
      high/low surrogate pair is returned.
   3. If the next character was not a low surrogate value, then IllegalArgumentException is thrown.
4. If the first character was a low surrogate value, IllegalArgumentException is
   thrown.
Returns: the Unicode code point for the given index or the negated value of the trailing high
surrogate character at the end of the sequence
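The behaviour listed above can be sketched with standard java.lang.Character helpers. This is a minimal re-statement of the documented rules under a hypothetical class name (CodePointAtSketch), not the library's source, and the exception messages are invented for illustration.

```java
// Sketch of the documented codePointAt rules; names and messages are hypothetical.
final class CodePointAtSketch {
  static int codePointAt(CharSequence seq, int index, int end) {
    if (index >= end) {
      throw new IndexOutOfBoundsException("index " + index + " is not below end " + end);
    }
    char c1 = seq.charAt(index);
    if (!Character.isSurrogate(c1)) {
      return c1; // ordinary BMP character: returned as-is
    }
    if (Character.isLowSurrogate(c1)) {
      // A low surrogate can never start a pair.
      throw new IllegalArgumentException("unexpected low surrogate at index " + index);
    }
    // c1 is a high surrogate: try to read its partner.
    if (index + 1 == end) {
      return -c1; // trailing high surrogate: negated value signals "incomplete"
    }
    char c2 = seq.charAt(index + 1);
    if (Character.isLowSurrogate(c2)) {
      return Character.toCodePoint(c1, c2);
    }
    throw new IllegalArgumentException("expected low surrogate at index " + (index + 1));
  }
}
```

The negative return value lets a caller distinguish "the range ended in the middle of a pair" from malformed input, which is reported by an exception instead.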
Constructors
UnicodeEscaper()
public UnicodeEscaper()
Methods
escape(int cp)
protected abstract char[] escape(int cp)
Returns the escaped form of the given Unicode code point, or null if this code point
does not need to be escaped. When called as part of an escaping operation, the given code point
is guaranteed to be in the range 0 <= cp <= Character#MAX_CODE_POINT.
If an empty array is returned, this effectively strips the input character from the
resulting text.
If the character does not need to be escaped, this method should return null, rather
than an array containing the character representation of the code point. This enables the
escaping algorithm to perform more efficiently.
If the implementation of this method cannot correctly handle a particular code point then it
should either throw an appropriate runtime exception or return a suitable replacement
character. It must never silently discard invalid input as this may constitute a security risk.
Returns: the replacement characters, or null if no escaping was needed
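The three possible results described above (null, an empty array, or replacement characters) can be illustrated with a standalone sketch. Everything here is an assumption made for illustration: the class name EscapeContractSketch, the choice to strip U+FEFF, the choice to emit numeric character references for supplementary code points, and the escapeAll driver, which assumes well-formed UTF-16 input and does not validate surrogates.

```java
// Hypothetical escape(int)-style function plus a simple driver; not library code.
final class EscapeContractSketch {
  /** Returns replacement chars, an empty array to strip, or null for "no escaping needed". */
  static char[] escape(int cp) {
    if (cp == '<') return "&lt;".toCharArray();
    if (cp == '>') return "&gt;".toCharArray();
    if (cp == '&') return "&amp;".toCharArray();
    if (cp == 0xFEFF) return new char[0]; // deliberately stripped, for illustration only
    if (cp > 0xFFFF) {
      // Supplementary code point: emit a numeric character reference.
      return ("&#x" + Integer.toHexString(cp).toUpperCase() + ";").toCharArray();
    }
    return null; // cheaper for the caller than returning the character itself
  }

  /** Applies escape(int) to each code point of s (assumes s is well-formed UTF-16). */
  static String escapeAll(String s) {
    StringBuilder out = new StringBuilder(s.length());
    int i = 0;
    while (i < s.length()) {
      int cp = s.codePointAt(i);
      char[] r = escape(cp);
      if (r == null) {
        out.appendCodePoint(cp);
      } else {
        out.append(r);
      }
      i += Character.charCount(cp);
    }
    return out.toString();
  }
}
```

Returning null for the common case lets a driver copy unescaped runs without allocating a new array for every code point.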
escape(String string)
public abstract String escape(String string)
Returns the escaped form of a given literal string.
If you are escaping input in arbitrary successive chunks, then it is not generally safe to
use this method. If an input string ends with an unmatched high surrogate character, then this
method will throw IllegalArgumentException. You should ensure your input is valid UTF-16 before calling this method.
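When input has to be processed in chunks, one way to avoid the unmatched-surrogate failure described above is to never cut between the two halves of a surrogate pair. The helpers below are a hypothetical sketch (ChunkBoundarySketch, safeChunkEnd and endsWithUnmatchedHighSurrogate are not library names) built only on java.lang.Character.

```java
// Hypothetical helpers for choosing chunk boundaries; not part of the library.
final class ChunkBoundarySketch {
  /** True if s ends with a high surrogate that has no low surrogate after it. */
  static boolean endsWithUnmatchedHighSurrogate(CharSequence s) {
    return s.length() > 0 && Character.isHighSurrogate(s.charAt(s.length() - 1));
  }

  /**
   * Returns an end index for a chunk starting at start and holding at most maxChars chars,
   * moved back by one if the cut would otherwise separate a high/low surrogate pair.
   */
  static int safeChunkEnd(CharSequence s, int start, int maxChars) {
    if (maxChars < 2) {
      throw new IllegalArgumentException("maxChars must be at least 2 to fit a surrogate pair");
    }
    int end = Math.min(s.length(), start + maxChars);
    if (end < s.length()
        && Character.isHighSurrogate(s.charAt(end - 1))
        && Character.isLowSurrogate(s.charAt(end))) {
      end--; // keep the pair together in the next chunk
    }
    return end;
  }
}
```

Each chunk produced this way can be escaped independently, provided the input as a whole is valid UTF-16.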
escapeSlow(String s, int index)
protected final String escapeSlow(String s, int index)
Returns the escaped form of a given literal string, starting at the given index. This method is
called by the #escape(String) method when it discovers that escaping is required. It is
protected to allow subclasses to override the fastpath escaping function to inline their
escaping test.
This method is not reentrant and may only be invoked by the top level #escape(String) method.
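The split between a fast path and a slow path can be sketched as follows. This is a simplified, hypothetical illustration (FastPathSketch and its methods are not the library's code): the fast path only scans, and the slow path starts building output at the first index that needs escaping. It works on chars and ignores surrogate validation for brevity.

```java
// Hypothetical fast-path/slow-path split; simplified and not the library's code.
final class FastPathSketch {
  static String replacementFor(char c) {
    switch (c) {
      case '<': return "&lt;";
      case '>': return "&gt;";
      case '&': return "&amp;";
      default: return null;
    }
  }

  /** Fast path: scan only, and return the input itself when nothing needs escaping. */
  static String escape(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (replacementFor(s.charAt(i)) != null) {
        return escapeSlow(s, i);
      }
    }
    return s;
  }

  /** Slow path: copy the clean prefix, then escape from index onwards. */
  static String escapeSlow(String s, int index) {
    StringBuilder out = new StringBuilder(s.length() + 16);
    out.append(s, 0, index);
    for (int i = index; i < s.length(); i++) {
      char c = s.charAt(i);
      String r = replacementFor(c);
      if (r == null) {
        out.append(c);
      } else {
        out.append(r);
      }
    }
    return out.toString();
  }
}
```

With this split, strings that need no escaping cost one scan and no allocation, which is the reason for keeping the scanning test cheap.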
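nextEscapeIndex(CharSequence csq, int start, int end)
protected abstract int nextEscapeIndex(CharSequence csq, int start, int end)
Scans a sub-sequence of characters from a given CharSequence, returning the index of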
the next character that requires escaping.
Note: When implementing an escaper, it is a good idea to override this method for
efficiency. The base class implementation determines successive Unicode code points and invokes
#escape(int) for each of them. If the semantics of your escaper are such that code
points in the supplementary range are either all escaped or all unescaped, this method can be
implemented more efficiently using CharSequence#charAt(int).
Note however that if your escaper does not escape characters in the supplementary range, you
should either continue to validate the correctness of any surrogate characters encountered or
provide a clear warning to users that your escaper does not validate its input.
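A charAt-based scan of the kind the note describes might look like the sketch below. It assumes a hypothetical escaper that escapes only '<', '>' and '&' and never escapes supplementary code points. The class name NextEscapeIndexSketch is invented, and throwing IllegalArgumentException on any unpaired surrogate is this sketch's own choice for validating input.

```java
// Hypothetical charAt-based scan that still validates surrogates; not library code.
final class NextEscapeIndexSketch {
  /** Returns the index of the next char needing escaping in [start, end), or end if none. */
  static int nextEscapeIndex(CharSequence csq, int start, int end) {
    int i = start;
    while (i < end) {
      char c = csq.charAt(i);
      if (c == '<' || c == '>' || c == '&') {
        return i;
      }
      if (Character.isHighSurrogate(c)) {
        // Supplementary code points are never escaped here, but the pair must be well formed.
        if (i + 1 >= end || !Character.isLowSurrogate(csq.charAt(i + 1))) {
          throw new IllegalArgumentException("unpaired high surrogate at index " + i);
        }
        i += 2;
      } else if (Character.isLowSurrogate(c)) {
        throw new IllegalArgumentException("unpaired low surrogate at index " + i);
      } else {
        i++;
      }
    }
    return end;
  }
}
```

Because supplementary code points are treated uniformly, the scan never needs to combine a pair into a code point. It only checks that each pair is complete.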
See PercentEscaper for an example.
Last updated 2025-08-28 UTC.