String Functions
UPPER(col) convert string to upper case
LOWER(col) convert string to lower case
INITCAP(col) convert the first letter of each word to uppercase and the rest to lowercase
REVERSE(col) reverse the string
SUBSTR(col, startIndex, endIndex) Gets substring of the input string from start to endIndex. Index begins at 0. Set endIndex to -1 to calculate till end of the string
SUBSTRING(col, beginIndex) / SUBSTRING(col, beginIndex, length)
Returns the substring of the input string starting at a 1-based beginIndex. When length is provided, returns a substring of that length.
Usage: SUBSTRING(col, beginIndex) or SUBSTRING(col, beginIndex, length)
Example: SELECT SUBSTRING('hello', 2) FROM myTable returns 'ello'
Example: SELECT SUBSTRING('hello', 2, 3) FROM myTable returns 'ell'
CONCAT(col1, col2, seperator) Concatenate two input strings using the seperator
TRIM(col) trim spaces from both side of the string
TRIM(end, characters, value)
Standard SQL trim function. Trims the specified characters from the specified end of the string. end can be BOTH, LEADING, or TRAILING.
Usage: TRIM('BOTH', 'xy', col)
Example: SELECT TRIM('LEADING', ' ', ' hello ') FROM myTable returns 'hello '
LTRIM(col) trim spaces from left side of the string
RTRIM(col) trim spaces from right side of the string
LENGTH(col) calculate length of the string
levenshtein_distance(string1, string2) Returns the Levenshtein edit distance between two strings
hammingDistance(string1, string2) Returns the Hamming distance between two strings of equal length. Returns -1 if the strings have different lengths.
Usage: hammingDistance(col1, col2)
Example: SELECT hammingDistance('karolin', 'kathrin') FROM myTable returns 3
STRPOS(col, find, N)
Find Nth instance of find string in input. Returns 0 if input string is empty. Returns -1 if the Nth instance is not found or input string is null.
strrpos(col, find) / strrpos(col, find, N)
Returns the index of the last occurrence of find in the input string. When N is provided, returns the Nth occurrence counting from the end of the string.
Usage: strrpos(col, find) or strrpos(col, find, N)
Example: SELECT strrpos('hello world hello', 'hello') FROM myTable
contains(col, substring)
Returns true if the input string contains the specified substring, false otherwise.
Usage: contains(col, substring)
Example: SELECT contains(playerName, 'john') FROM myTable
STARTSWITH(col, prefix)
returns true if columns starts with prefix string.
endsWith(col, suffix)
Returns true if the input string ends with the specified suffix, false otherwise.
Usage: endsWith(col, suffix)
Example: SELECT endsWith(playerName, 'son') FROM myTable
strcmp(string1, string2)
Compares two strings lexicographically. Returns 0 if equal, a value less than 0 if the first string is lexicographically less than the second, and a value greater than 0 if the first string is lexicographically greater.
Usage: strcmp(col1, col2)
Example: SELECT strcmp(name1, name2) FROM myTable
REPLACE(col, find, substitute)
replace all instances of find with replace in input
RPAD(col, size, pad)
string padded from the right side with pad to reach final size
LPAD(col, size, pad)
string padded from the left side with pad to reach final size
leftSubStr(col, length) / left(col, length)
Returns the leftmost length characters from the input string.
Usage: leftSubStr(col, length) or left(col, length)
Example: SELECT leftSubStr('hello', 3) FROM myTable returns 'hel'
rightSubStr(col, length) / right(col, length)
Returns the rightmost length characters from the input string.
Usage: rightSubStr(col, length) or right(col, length)
Example: SELECT rightSubStr('hello', 3) FROM myTable returns 'llo'
repeat(col, times) / repeat(col, separator, times)
Concatenates the input string to itself the specified number of times. When a separator is provided, it is placed between each repetition.
Usage: repeat(col, times) or repeat(col, separator, times)
Example: SELECT repeat('ab', 3) FROM myTable returns 'ababab'
Example: SELECT repeat('ab', ',', 3) FROM myTable returns 'ab,ab,ab'
split(col, delimiter) / split(col, delimiter, limit)
Splits the input string by the specified delimiter and returns an array of strings. Also available as stringToArray. When limit is provided, limits the number of resulting parts.
Usage: split(col, delimiter) or split(col, delimiter, limit)
Example: SELECT split('a,b,c', ',') FROM myTable returns ['a', 'b', 'c']
splitPart(col, delimiter, index) / splitPart(col, delimiter, limit, index)
Splits the input string by the specified delimiter and returns the element at the given index. The index is 0-based and supports negative values to index from the end. Returns "null" if the index is out of bounds.
Usage: splitPart(col, delimiter, index) or splitPart(col, delimiter, limit, index)
Example: SELECT splitPart('a,b,c', ',', 1) FROM myTable returns 'b'
Example: SELECT splitPart('a,b,c', ',', -1) FROM myTable returns 'c'
normalize(col) / normalize(col, form)
Normalizes a Unicode string. Without a form argument, uses NFC normalization. The optional form parameter can be NFC, NFD, NFKC, or NFKD.
Usage: normalize(col) or normalize(col, 'NFC')
Example: SELECT normalize(textCol) FROM myTable
CODEPOINT(col) the Unicode codepoint of the first character of the string
CHR(codepoint) the character corresponding to the Unicode codepoint
fromBytes(bytes, charsetName) Converts a byte array to a string using the specified character set encoding.
Usage: fromBytes(col, 'UTF-8')
Example: SELECT fromBytes(byteCol, 'UTF-8') FROM myTable
toBytes(col, charsetName) Converts a string to a byte array using the specified character set encoding.
Usage: toBytes(col, 'UTF-8')
Example: SELECT toBytes(stringCol, 'UTF-8') FROM myTable
fromUtf8(bytes) Converts a UTF-8 encoded byte array to a string.
Usage: fromUtf8(col)
Example: SELECT fromUtf8(byteCol) FROM myTable
fromAscii(bytes) Converts an ASCII encoded byte array to a string.
Usage: fromAscii(col)
Example: SELECT fromAscii(byteCol) FROM myTable
toAscii(col) Converts a string to an ASCII encoded byte array.
Usage: toAscii(col)
Example: SELECT toAscii(stringCol) FROM myTable
toBase64(bytes) Encodes binary data (byte array) to a Base64 encoded string.
Usage: toBase64(col)
Example: SELECT toBase64(byteCol) FROM myTable
regexpExtract(value, regexp) Extracts values that match the provided regular expression
regexpReplace(input, matchRegexp, replaceRegexp, matchStartPos, occurrence, flag) Find and replace a string or regexp pattern with a target string or regexp pattern
remove(input, search) removes all instances of search from string
toUUIDBytes(uuidString) Converts a UUID string to a 16-byte array representation. Returns null if the input is not a valid UUID.
Usage: toUUIDBytes(col)
Example: SELECT toUUIDBytes('550e8400-e29b-41d4-a716-446655440000') FROM myTable
fromUUIDBytes(bytes) Converts a 16-byte array to a UUID string representation.
Usage: fromUUIDBytes(col)
Example: SELECT fromUUIDBytes(uuidByteCol) FROM myTable
isJson(col)
Returns true if the input string is valid JSON, false otherwise.
Usage: isJson(col)
Example: SELECT isJson('{"key":"value"}') FROM myTable returns true
urlEncoding(string) url-encode a string with UTF-8 format
urlDecoding(string) decode a url to plaintext string
fromBase64(string) decode a Base64-encoded string to bytes represented as a hex string
toUtf8(string) decode a UTF8-encoded string to bytes represented as a hex string
isSubnetOf(ipPrefix, ipAddress) checks if ipAddress is in the subnet of the ipPrefix
Add Prefix, Suffix & Ngram UDFs provides prefix, suffix and ngram functionality for string manipulation
Last updated
Was this helpful?

