Usage

Requirements

Apache Hive Metastores need a relational database to store their state. We currently support PostgreSQL and Apache Derby (embedded database, not recommended for production). Other databases might work if JDBC drivers are available. Please open an issue if you require support for another database.

S3 Support

Hive supports creating tables in S3-compatible object stores. To use this feature, you need to provide connection details for the object store in the cluster definition.

Monitoring

The managed Hive instances are automatically configured to export Prometheus metrics. See Monitoring for more details.

Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

Overriding properties that are set by the operator (such as the HTTP port) can interfere with the operator and lead to problems.

Configuration Properties

For a role or role group, at the same level as config, you can specify configOverrides for hive-site.xml. For example, to set datanucleus.connectionPool.maxPoolSize for the metastore to 20, adapt the metastore section of the cluster resource like so:

metastore:
  roleGroups:
    default:
      config: [...]
      configOverrides:
        hive-site.xml:
          datanucleus.connectionPool.maxPoolSize: "20"
      replicas: 1

Just as for the config, it is possible to specify this at role level as well:

metastore:
  configOverrides:
    hive-site.xml:
      datanucleus.connectionPool.maxPoolSize: "20"
  roleGroups:
    default:
      config: [...]
      replicas: 1

All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
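
As a sketch of the string requirement, the fragment below quotes a numeric and a boolean value so that YAML parses both as strings. The property hive.metastore.try.direct.sql is used only as an illustrative second entry and is not taken from this guide; whether it needs changing in your setup is an assumption.

```yaml
metastore:
  roleGroups:
    default:
      configOverrides:
        hive-site.xml:
          # Quoted, so YAML reads the number as the string "20".
          datanucleus.connectionPool.maxPoolSize: "20"
          # Illustrative property; quoted so it is the string "false", not a YAML boolean.
          hive.metastore.try.direct.sql: "false"
      replicas: 1
```

Unquoted values such as 20 or false would be parsed by YAML as a number and a boolean, and would most likely be rejected when the resource is validated.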

For a full list of configuration options, see the Hive Configuration Reference.

Environment Variables

In a similar fashion, environment variables can be (over)written. For example per role group:

metastore:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1

or per role:

metastore:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1
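
The precedence rule described earlier (role group over role) can be sketched by setting the same variable at both levels. The values ROLE_VALUE and ROLE_GROUP_VALUE are hypothetical placeholders chosen for illustration.

```yaml
metastore:
  envOverrides:
    # Role level: applies to every role group unless overridden below.
    MY_ENV_VAR: "ROLE_VALUE"
  roleGroups:
    default:
      config: {}
      envOverrides:
        # Role group level: more specific, so this value wins for "default".
        MY_ENV_VAR: "ROLE_GROUP_VALUE"
      replicas: 1
```

With this fragment, Pods of the default role group should see MY_ENV_VAR set to ROLE_GROUP_VALUE, while any other role group without its own override would inherit ROLE_VALUE.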

Examples

Note that the version you specify consists of the Apache Hive version you want to roll out, followed by a Stackable version, as shown in the examples below. The Stackable version is the version of the underlying container image used to run the processes. For a list of available versions, check our image registry. It is generally safe to use the latest available image version.
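
As a small illustration, the version string used throughout the examples on this page breaks down as follows (the comments are explanatory only):

```yaml
spec:
  # Format: <Hive version>-stackable<Stackable image version>
  # Here: Apache Hive 3.1.3 on Stackable image version 0.1.0.
  version: 3.1.3-stackable0.1.0
```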

Create a single node Apache Hive Metastore cluster using Derby:

---
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: simple-hive-derby
spec:
  version: 3.1.3-stackable0.1.0
  metastore:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 1
        config:
          database:
            connString: jdbc:derby:;databaseName=/tmp/metastore_db;create=true
            user: APP
            password: mine
            dbType: derby

To create a single node Apache Hive Metastore cluster with Derby and S3 access, first deploy MinIO (or use any available S3 bucket):

helm install minio \
    minio \
    --repo https://charts.bitnami.com/bitnami \
    --set auth.rootUser=minio-access-key \
    --set auth.rootPassword=minio-secret-key

To upload data to MinIO, set up a port-forward to access the web UI:

kubectl port-forward service/minio 9001

Then, connect to localhost:9001 and log in with the user minio-access-key and password minio-secret-key. Create a bucket and upload data.

Deploy the Hive cluster:

---
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: simple-hive-derby
spec:
  version: 3.1.3-stackable0.1.0
  s3:
    inline:
      host: minio
      port: 9000
      accessStyle: Path
      credentials:
        secretClass: simple-hive-s3-secret-class
  metastore:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 1
        config:
          database:
            connString: jdbc:derby:;databaseName=/stackable/metastore_db;create=true
            user: APP
            password: mine
            dbType: derby
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: simple-hive-s3-secret-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: simple-hive-s3-secret
  labels:
    secrets.stackable.tech/class: simple-hive-s3-secret-class
stringData:
  accessKey: minio-access-key
  secretKey: minio-secret-key

To create a single node Apache Hive Metastore using PostgreSQL, first deploy a PostgreSQL instance via Helm.

PostgreSQL 10 introduced a new password authentication method called scram-sha-256, which has been the default since PostgreSQL 14. Unfortunately, Hive, up to and including the latest 3.3.x version, ships with JDBC drivers that do not support this method. In that case you might see an error message like this: The authentication type 10 is not supported. If so, either use an older PostgreSQL version or change its password_encryption setting to md5.

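The following command pins the Bitnami chart to version 10 to work around the issue mentioned above. Note that --version selects the Helm chart version, not the PostgreSQL version; this chart release deploys a PostgreSQL version older than 14. The bitnami/postgresql reference assumes the Bitnami chart repository has already been added under the name bitnami: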

helm install hive bitnami/postgresql --version=10 \
--set postgresqlUsername=hive \
--set postgresqlPassword=hive \
--set postgresqlDatabase=hive

Create the Hive Metastore using the PostgreSQL database:

apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: simple-hive-postgres
spec:
  version: 3.1.3-stackable0.1.0
  metastore:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 1
        config:
          database:
            connString: jdbc:postgresql://hive-postgresql.default.svc.cluster.local:5432/hive
            user: hive
            password: hive
            dbType: postgres