# URL Functions

This document describes the URL utility functions available as scalar functions. These functions allow you to extract, transform, and manipulate various components of a URL such as the protocol, domain, port, path, query parameters, and more.

### urlProtocol

**Description**: Extracts the protocol (scheme) from the URL.

**Syntax**:

```
urlProtocol(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The protocol (e.g., "http", "https") or null if the URL is invalid.

**Example**:

```
SELECT urlProtocol('https://example.com/path');
// Returns: "https"
```

### urlDomain

**Description**: Extracts the domain from the URL.

**Syntax**:

```
urlDomain(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The full domain (e.g., "[www.example.com](http://www.example.com)") or null if the URL is invalid.

**Example**:

```
SELECT urlDomain('https://www.example.com/path');
// Returns: "www.example.com"
```

### urlDomainWithoutWWW

**Description**: Extracts the domain from the URL while removing a leading "[www](http://www)." if present.

**Syntax**:

```
urlDomainWithoutWWW(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The domain without the "[www](http://www)." prefix (e.g., "example.com") or null if invalid.

**Example**:

```
SELECT urlDomainWithoutWWW('https://www.example.com/path');
// Returns: "example.com"
```

### urlTopLevelDomain

**Description**: Extracts the top-level domain (TLD) from the URL.

**Syntax**:

```
urlTopLevelDomain(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The top-level domain (e.g., "com", "org") or null if invalid.

**Example**:

```
SELECT urlTopLevelDomain('https://example.com/path');
// Returns: "com"
```

### urlFirstSignificantSubdomain

**Description**: Extracts the first significant subdomain from the URL based on standard TLD rules.

**Syntax**:

```
urlFirstSignificantSubdomain(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The first significant subdomain (for example, for "blog.example.com", returns "example") or null if invalid.

**Example**:

```
SELECT urlFirstSignificantSubdomain('https://blog.example.com');
// Returns: "example"
```

*Note: The logic considers common TLDs such as “com”, “net”, “org”, and “co”.*

### cutToFirstSignificantSubdomain

**Description**: Truncates the URL’s domain to include only the first significant subdomain and the top-level domain.

**Syntax**:

```
cutToFirstSignificantSubdomain(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

A truncated domain such as "example.com" (or a deeper hierarchy for non-standard TLDs) or null if invalid.

**Example**:

```
SELECT cutToFirstSignificantSubdomain('https://blog.example.com');
// Returns: "example.com"
```

### cutToFirstSignificantSubdomainWithWWW

**Description**: Similar to cutToFirstSignificantSubdomain but preserves a leading "[www](http://www)." if it is part of the URL.

**Syntax**:

```
cutToFirstSignificantSubdomainWithWWW(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The truncated domain with "[www](http://www)." preserved (if applicable) or null if invalid.

**Example**:

```
SELECT cutToFirstSignificantSubdomainWithWWW('https://www.blog.example.com');
// Returns: "www.example.com"
```

### urlPort

**Description**: Extracts the port number from the URL.

**Syntax**:

```
urlPort(url: String) → int
```

**Parameters**:

• url: URL string.

**Returns**:

The port number, or -1 if no port is specified or if the URL is invalid.

**Example**:

```
SELECT urlPort('https://example.com:8080/path');
// Returns: 8080
```

### urlPath

**Description**: Extracts the path component from the URL (excluding the query string).

**Syntax**:

```
urlPath(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The URL path (e.g., "/path/to/resource") or null if invalid.

**Example**:

```
SELECT urlPath('https://example.com/path?query=1');
// Returns: "/path"
```

### urlPathWithQuery

**Description**: Extracts the path component from the URL. (Note that this function uses the raw path as returned by the URI parser.)

**Syntax**:

```
urlPathWithQuery(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The URL path (which may include query encoding details) or null if invalid.

**Example**:

```
SELECT urlPathWithQuery('https://example.com/path?query=1');
// Returns: "/path"
```

### urlQueryString

**Description**: Extracts the query string from the URL without the leading ? and excluding the fragment.

**Syntax**:

```
urlQueryString(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The query string (e.g., "key=value\&key2=value2"), or null if absent or invalid.

**Example**:

```
SELECT urlQueryString('https://example.com/path?key=value#section');
// Returns: "key=value"
```

### urlFragment

**Description**: Extracts the fragment identifier from the URL (without the # symbol).

**Syntax**:

```
urlFragment(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The fragment (e.g., "section") or null if invalid or not present.

**Example**:

```
SELECT urlFragment('https://example.com/path#section');
// Returns: "section"
```

### urlQueryStringAndFragment

**Description**: Combines the query string and fragment identifier from the URL into a single string.

**Syntax**:

```
urlQueryStringAndFragment(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

A concatenated string such as "key=value#section", or null if neither component is present.

**Example**:

```
SELECT urlQueryStringAndFragment('https://example.com/path?key=value#section');
// Returns: "key=value#section"
```

### extractURLParameter

**Description**: Extracts the value of a specific query parameter from the URL. If multiple parameters with the same name exist, the first occurrence is returned.

**Syntax**:

```
extractURLParameter(url: String, name: String) → String
```

**Parameters**:

• url: URL string.

• name: Name of the parameter to extract.

**Returns**:

The value of the parameter, or an empty string if not found or if the URL is invalid.

**Example**:

```
SELECT extractURLParameter('https://example.com/path?page=1&lr=213', 'page');
// Returns: "1"
```

### extractURLParameters

**Description**: Extracts all query parameters from the URL as an array of name=value pairs.

**Syntax**:

```
extractURLParameters(url: String) → String[]
```

**Parameters**:

• url: URL string.

**Returns**:

An array of query parameters (e.g., \["page=1", "lr=213"]), or an empty array if no parameters exist.

**Example**:

```
SELECT extractURLParameters('https://example.com/path?page=1&lr=213');
// Returns: ["page=1", "lr=213"]
```

### extractURLParameterNames

**Description**: Extracts the names of all query parameters present in the URL.

**Syntax**:

```
extractURLParameterNames(url: String) → String[]
```

**Parameters**:

• url: URL string.

**Returns**:

An array of parameter names (e.g., \["page", "lr"]), or an empty array if no query parameters exist.

**Example**:

```
SELECT extractURLParameterNames('https://example.com/path?page=1&lr=213');
// Returns: ["page", "lr"]
```

### urlHierarchy

**Description**: Generates a hierarchy of URLs truncated at each level of the path. The base URL (scheme and host) is always included.

**Syntax**:

```
urlHierarchy(url: String) → String[]
```

**Parameters**:

• url: URL string.

**Returns**:

An array of URLs representing the hierarchical levels. For example, given the URL "<https://example.com/a/b/c>", it returns:

```
[
  "https://example.com",
  "https://example.com/a",
  "https://example.com/a/b",
  "https://example.com/a/b/c"
]
```

**Example**:

```
SELECT urlHierarchy('https://example.com/a/b/c');
// Returns: ["https://example.com", "https://example.com/a", "https://example.com/a/b", "https://example.com/a/b/c"]
```

### urlPathHierarchy

**Description**: Generates a hierarchy of the path segments from the URL. The protocol and host are excluded, and the root ("/") is not included.

**Syntax**:

```
urlPathHierarchy(url: String) → String[]
```

**Parameters**:

• url: URL string.

**Returns**:

An array of hierarchical path segments. For example, for the URL "<https://example.com/browse/CONV-6788>", it returns:

```
["/browse", "/browse/CONV-6788"]
```

**Example**:

```
SELECT urlPathHierarchy('https://example.com/browse/CONV-6788');
// Returns: ["/browse", "/browse/CONV-6788"]
```

### urlEncode

**Description**: Encodes a URL string into a URL-safe format. Spaces are replaced with +.

**Syntax**:

```
urlEncode(url: String) → String
```

**Parameters**:

• url: URL string to encode.

**Returns**:

The URL-encoded string or null if the input is invalid.

**Example**:

```
SELECT urlEncode('https://example.com/path with space');
// Returns: "https%3A%2F%2Fexample.com%2Fpath+with+space"
```

### urlDecode

**Description**: Decodes a URL-encoded string.

**Syntax**:

```
urlDecode(url: String) → String
```

**Parameters**:

• url: URL-encoded string.

**Returns**:

The decoded URL string or null if the input is invalid.

**Example**:

```
SELECT urlDecode('https%3A%2F%2Fexample.com%2Fpath+with+space');
// Returns: "https://example.com/path with space"
```

### urlEncodeFormComponent

**Description**: Encodes a string as a URL form component following RFC-1866 standards. Spaces are encoded as +.

**Syntax**:

```
urlEncodeFormComponent(url: String) → String
```

**Parameters**:

• url: String to encode.

**Returns**:

The encoded string or null if the input is invalid.

**Example**:

```
SELECT urlEncodeFormComponent('Hello World!');
// Returns: "Hello+World%21"
```

### urlDecodeFormComponent

**Description**: Decodes a URL form component encoded per RFC-1866 (decodes + to a space).

**Syntax**:

```
urlDecodeFormComponent(url: String) → String
```

**Parameters**:

• url: URL-encoded string.

**Returns**:

The decoded string or null if the input is invalid.

**Example**:

```
SELECT urlDecodeFormComponent('Hello+World%21');
// Returns: "Hello World!"
```

### urlNetloc

**Description**: Extracts the network locality from the URL. This includes user information (if any), the host, and the port.

**Syntax**:

```
urlNetloc(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

A string in the format username:password\@host:port (if user info is present) or null if the URL is invalid.

**Example**:

```
SELECT urlNetloc('https://user:pass@example.com:8080/path');
// Returns: "user:pass@example.com:8080"
```

### cutWWW

**Description**: Removes the leading "[www](http://www)." from the URL’s domain.

**Syntax**:

```
cutWWW(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The URL with the "[www](http://www)." prefix removed from the domain. If an error occurs, the original URL is returned.

**Example**:

```
SELECT cutWWW('https://www.example.com/path');
// Returns: "https://example.com/path"
```

### cutQueryString

**Description**: Removes the query string (including the ?) from the URL.

**Syntax**:

```
cutQueryString(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The URL without the query string. If an error occurs, the original URL is returned.

**Example**:

```
SELECT cutQueryString('https://example.com/path?key=value');
// Returns: "https://example.com/path"
```

### cutFragment

**Description**: Removes the fragment identifier (including the #) from the URL.

**Syntax**:

```
cutFragment(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The URL without the fragment. If an error occurs, the original URL is returned.

**Example**:

```
SELECT cutFragment('https://example.com/path#section');
// Returns: "https://example.com/path"
```

### cutQueryStringAndFragment

**Description**: Removes both the query string and the fragment identifier from the URL.

**Syntax**:

```
cutQueryStringAndFragment(url: String) → String
```

**Parameters**:

• url: URL string.

**Returns**:

The URL stripped of both the query string and fragment. If an error occurs, the original URL is returned.

**Example**:

```
SELECT cutQueryStringAndFragment('https://example.com/path?key=value#section');
// Returns: "https://example.com/path"
```

### cutURLParameter

**Description**: Removes a specific query parameter from the URL.

**Syntax**:

```
cutURLParameter(url: String, name: String) → String
```

**Parameters**:

• url: URL string.

• name: The query parameter name to remove.

**Returns**:

The URL with the specified query parameter removed. If an error occurs, the original URL is returned.

**Example**:

```
SELECT cutURLParameter('https://example.com/path?key=value&remove=this', 'remove');
// Returns: "https://example.com/path?key=value"
```

### cutURLParameters

**Description**: Removes multiple query parameters from the URL.

**Syntax**:

```
cutURLParameters(url: String, names: String[]) → String
```

**Parameters**:

• url: URL string.

• names: An array of query parameter names to remove.

**Returns**:

The URL with the specified query parameters removed.

**Example**:

```
SELECT cutURLParameters('https://example.com/path?key=value&remove=this&other=param', ARRAY['remove', 'other']);
// Returns: "https://example.com/path?key=value"
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pinot.apache.org/functions/url.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
