Usage
Trino works together with the Apache Hive metastore and an S3 bucket.
Prerequisites
- Deployed Stackable Apache Hive metastore
- Accessible S3 bucket
  - Endpoint, access key and secret key
  - Data in the bucket (we use the Iris dataset here)
- Optional: deployed Stackable Secret Operator for certificates when deploying for HTTPS
- Optional for authorization: deployed Stackable OPA Operator
- Optional: Trino CLI to test SQL queries
Installation
In the following sections, we explain or link to the required installation steps.
Authentication
We provide user authentication via a Secret that can be referenced in the custom resource:
authentication:
  method:
    multiUser:
      userCredentialsSecret:
        namespace: default
        name: simple-trino-users-secret
This Secret needs to be created manually before startup and may look like the following snippet:
apiVersion: v1
kind: Secret
metadata:
  name: simple-trino-users-secret
type: Opaque
stringData:
  admin: $2y$10$89xReovvDLacVzRGpjOyAOONnayOgDAyIS2nW9bs5DJT98q17Dy5i
  alice: $2y$10$HcCa4k9v2DRrD/g7e5vEz.Bk.1xg00YTEHOZjPX7oK3KqMSt2xT8W
  bob: $2y$10$xVRXtYZnYuQu66SmruijPO8WHFM/UK5QPHTr.Nzf4JMcZSqt3W.2.
The <user>: <password-hash> pairs are provided in the stringData field. The hashes are created using bcrypt with 10 rounds or more:
htpasswd -nbBC 10 admin admin
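A value that is not actually a bcrypt hash will not work for authentication, so it can help to sanity-check the Secret contents. Below is a small Python sketch (illustrative only, not part of the operator) that checks whether a value looks like a bcrypt hash with at least 10 rounds:

```python
def check_bcrypt_hash(h: str, min_rounds: int = 10) -> bool:
    """Return True if h looks like a bcrypt hash with enough rounds."""
    parts = h.split("$")
    # Expected shape: ["", "2y", "10", "<22-char salt + 31-char digest>"]
    if len(parts) != 4 or parts[1] not in ("2a", "2b", "2y"):
        return False
    if not parts[2].isdigit() or int(parts[2]) < min_rounds:
        return False
    return len(parts[3]) == 53  # 22 chars of salt + 31 chars of digest

print(check_bcrypt_hash("$2y$10$89xReovvDLacVzRGpjOyAOONnayOgDAyIS2nW9bs5DJT98q17Dy5i"))  # True
print(check_bcrypt_hash("plaintext-password"))  # False
```

This only validates the shape of the string; it does not verify the password itself.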
Authorization
In order to authorize Trino via OPA, a ConfigMap containing Rego rules for Trino has to be applied. The following example is an allow-all Rego rule for testing with the user admin. Do not use it in production!
apiVersion: v1
kind: ConfigMap
metadata:
  name: opa-bundle-trino
  labels:
    opa.stackable.tech/bundle: "trino"
data:
  trino.rego: |
    package trino

    import future.keywords.in

    default allow = false

    allow {
        is_admin
    }

    is_admin {
        input.context.identity.user == "admin"
    }
Users should write their own Rego rules for more complex OPA authorization.
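As a sketch of what a slightly less permissive policy could look like, the rule below also admits a fixed set of users via the in keyword (which the import above already enables). The user names are assumptions for illustration; the input document layout follows the example above:

```rego
package trino

import future.keywords.in

default allow = false

# Admins may do anything.
allow {
    is_admin
}

is_admin {
    input.context.identity.user == "admin"
}

# A fixed set of further users is also allowed; real policies would
# additionally inspect the requested operation in the input document.
allow {
    input.context.identity.user in {"alice", "bob"}
}
```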
Trino
With the prerequisites fulfilled, the CRD for this operator must be created:
kubectl apply -f /etc/stackable/trino-operator/crd/trinocluster.crd.yaml
Insecure for testing:
Create an insecure single-node Trino (v387) cluster for testing. You will access the UI/CLI via HTTP, and no user/password or authorization is required. Please adapt the s3 settings with your credentials (check examples/simple-trino-cluster.yaml for an example setting up Hive and Trino):
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  version: 387-stackable0.1.0
  hiveConfigMapName: simple-hive-derby
  s3:
    inline:
      host: test-minio
      port: 9000
      accessStyle: Path
      credentials:
        secretClass: minio-credentials
  coordinators:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 1
        config: {}
  workers:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 1
        config: {}
To access the CLI please execute:
./trino-cli-387-executable.jar --debug --server http://<node>:<http-port> --user=admin
Secure (HTTPS) for production:
Create a secure single-node Trino (v387) cluster. This disables UI access via HTTP and requires the username and password from the Secret above. Please adapt the s3 settings with your credentials (check examples/simple-trino-cluster-authentication-opa-authorization.yaml for a full example setting up Hive, OPA, Secrets and Trino):
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  version: 387-stackable0.1.0
  hiveConfigMapName: simple-hive-derby
  opa:
    configMapName: simple-opa
    package: trino
  s3:
    inline:
      host: test-minio
      port: 9000
      accessStyle: Path
      credentials:
        secretClass: minio-credentials
  authentication:
    method:
      multiUser:
        userCredentialsSecret:
          namespace: default
          name: simple-trino-users-secret
  coordinators:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 1
        config: {}
  workers:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 1
        config: {}
To access the CLI please download the Trino CLI and execute:
./trino-cli-387-executable.jar --debug --server https://<host>:<https-port> --user=admin --password --insecure
If you use self-signed certificates, you also need the --insecure flag shown above; otherwise it can be omitted.
S3 connection specification
You can specify S3 connection details directly inside the Trino specification or by referring to an external S3Connection custom resource.
To specify S3 connection details directly as part of the Trino resource, you add an inline connection configuration as shown below:
s3: (1)
  inline:
    host: test-minio (2)
    port: 9000 (3)
    accessStyle: Path (4)
    credentials: (5)
      secretClass: minio-credentials

(1) Entry point for the connection configuration
(2) Connection host
(3) Optional connection port
(4) Optional access style; Path selects path-style URLs, otherwise virtual-hosted-style URLs are used (the default)
(5) Credentials referencing a SecretClass; the backing Secret is expected to contain the keys accessKey and secretKey
It is also possible to configure the bucket connection details as a separate Kubernetes resource and only refer to that object from the Trino specification like this:
s3:
  reference: my-connection-resource (1)

(1) Name of the connection resource with connection details

The resource named my-connection-resource is then defined as shown below:
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-connection-resource
spec:
  host: test-minio
  port: 9000
  accessStyle: Path
  credentials:
    secretClass: minio-credentials
This has the advantage that the connection configuration can be shared across applications and reduces the cost of updating these details.
Test Trino with Hive and S3
Create a schema and a table for the Iris data located in S3 and query the data. This assumes that the Iris data set is available in the S3 bucket in the PARQUET format, which can be downloaded here.
Create schema
CREATE SCHEMA IF NOT EXISTS hive.iris
WITH (location = 's3a://iris/');
which should return:
CREATE SCHEMA
Create table
CREATE TABLE IF NOT EXISTS hive.iris.iris_parquet (
  sepal_length DOUBLE,
  sepal_width DOUBLE,
  petal_length DOUBLE,
  petal_width DOUBLE,
  class VARCHAR
)
WITH (
  external_location = 's3a://iris/parq',
  format = 'PARQUET'
);
which should return:
CREATE TABLE
Query data
SELECT
  sepal_length,
  class
FROM hive.iris.iris_parquet
LIMIT 10;
which should return something like this:
 sepal_length |    class
--------------+-------------
          5.1 | Iris-setosa
          4.9 | Iris-setosa
          4.7 | Iris-setosa
          4.6 | Iris-setosa
          5.0 | Iris-setosa
          5.4 | Iris-setosa
          4.6 | Iris-setosa
          5.0 | Iris-setosa
          4.4 | Iris-setosa
          4.9 | Iris-setosa
(10 rows)

Query 20220210_161615_00000_a8nka, FINISHED, 1 node
https://172.18.0.5:30299/ui/query.html?20220210_161615_00000_a8nka
Splits: 18 total, 18 done (100.00%)
CPU Time: 0.7s total, 20 rows/s, 11.3KB/s, 74% active
Per Node: 0.3 parallelism, 5 rows/s, 3.02KB/s
Parallelism: 0.3
Peak Memory: 0B
2.67 [15 rows, 8.08KB] [5 rows/s, 3.02KB/s]
Monitoring
The managed Trino instances are automatically configured to export Prometheus metrics. See Monitoring for more details.
Configuration & Environment Overrides
The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).
Do not override port numbers. This will lead to faulty installations.
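The precedence rule can be illustrated with a small Python sketch (illustrative only, not operator code) that merges a role-level and a role-group-level override map:

```python
def merge_overrides(role: dict, role_group: dict) -> dict:
    """Merge per-file property maps; role-group values take precedence."""
    merged = {}
    for filename in role.keys() | role_group.keys():
        props = dict(role.get(filename, {}))        # less specific first
        props.update(role_group.get(filename, {}))  # more specific wins
        merged[filename] = props
    return merged

role = {"config.properties": {"query.max-memory-per-node": "2GB"}}
role_group = {"config.properties": {"query.max-memory-per-node": "4GB"}}

print(merge_overrides(role, role_group))
# {'config.properties': {'query.max-memory-per-node': '4GB'}}
```

Here the role group asks for 4GB, so its value wins over the role-wide 2GB for that property.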
Configuration Properties
For a role or role group, at the same level as config, you can specify configOverrides for:
- config.properties
- node.properties
- log.properties
- password-authenticator.properties
- hive.properties
For a list of possible configuration properties consult the Trino Properties Reference.
workers:
  roleGroups:
    default:
      config: {}
      replicas: 1
      configOverrides:
        config.properties:
          query.max-memory-per-node: "2GB"
Just as for config, it is possible to specify this at role level as well:
workers:
  configOverrides:
    config.properties:
      query.max-memory-per-node: "2GB"
  roleGroups:
    default:
      config: {}
      replicas: 1
All override property values must be strings. The properties will be passed on without any escaping or formatting.
Environment Variables
Environment variables can be (over)written by adding the envOverrides
property.
For example per role group:
workers:
  roleGroups:
    default:
      config: {}
      replicas: 1
      envOverrides:
        JAVA_HOME: "path/to/java"
or per role:
workers:
  envOverrides:
    JAVA_HOME: "path/to/java"
  roleGroups:
    default:
      config: {}
      replicas: 1
Here too, overriding properties such as http-server.https.port will lead to broken installations.