Usage

Since Apache HDFS is installed in high-availability mode, an Apache ZooKeeper cluster is required to coordinate the active and standby namenodes.

Install the Stackable ZooKeeper operator and an Apache ZooKeeper cluster like this:

helm install zookeeper-operator stackable/zookeeper-operator
cat <<EOF | kubectl apply -f -
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  version: 3.5.8-stackable0.7.0
  servers:
    roleGroups:
      default:
        replicas: 3
        config: {}
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-znode
spec:
  clusterRef:
    name: simple-zk
    namespace: default
EOF
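
Before continuing, you can check that the ZooKeeper pods are ready. The label selector below is an assumption based on common Stackable labeling conventions; a plain kubectl get pods works just as well:

kubectl get pods -l app.kubernetes.io/name=zookeeper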

Once the ZooKeeper cluster and the operator are up and running, you can create an Apache HDFS cluster as shown below. Please note that the version you specify is not only the version of Hadoop you want to roll out: it must be amended with a Stackable version, as shown. The Stackable version identifies the underlying container image that is used to execute the processes. For a list of available versions, please check our image registry. It is generally safe to simply use the latest available image version.

cat <<EOF | kubectl apply -f -
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple
spec:
  version: 3.3.3-stackable0.1.0
  zookeeperConfigMapName: simple-znode
  dfsReplication: 3
  log4j: |-
    # Define some default values that can be overridden by system properties
    hadoop.root.logger=INFO,console
    hadoop.log.dir=.
    hadoop.log.file=hadoop.log
    # Define the root logger to the system property "hadoop.root.logger".
    log4j.rootLogger=${hadoop.root.logger}, EventCounter
    # Logging Threshold
    log4j.threshold=ALL
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
  nameNodes:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 2
  dataNodes:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 3
  journalNodes:
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        replicas: 3
EOF
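
The operator then creates one StatefulSet per role group. You can watch the pods come up like this (the label selector is an assumption; adjust it to your labels, or simply list all pods):

kubectl get pods -l app.kubernetes.io/instance=simple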
When scaling namenodes up, make sure to increase the replica count by only one node at a time.
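
For example, to go from two to three namenodes, you could patch the cluster resource like this (a sketch; editing the resource with kubectl edit works just as well):

kubectl patch hdfscluster simple --type=merge \
  -p '{"spec":{"nameNodes":{"roleGroups":{"default":{"replicas":3}}}}}'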

Monitoring

The managed HDFS instances are automatically configured to export Prometheus metrics. See Monitoring for more details.
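
To spot-check the exported metrics, you can port-forward to one of the pods and scrape the endpoint manually. The pod name and metrics port below are assumptions; check the container ports of your pods for the actual values:

# in one terminal
kubectl port-forward simple-namenode-default-0 8183
# in another terminal
curl -s http://localhost:8183/metrics | head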

Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

Overriding certain properties can lead to faulty clusters. In general this means: do not change ports, hostnames, or properties related to data directories, high availability, or security.

Configuration Properties

For a role or role group, at the same level as config, you can specify configOverrides for hdfs-site.xml and core-site.xml. For example, if you want to set additional properties on the namenode servers, adapt the nameNodes section of the cluster resource like so:

nameNodes:
  roleGroups:
    default:
      config: [...]
      configOverrides:
        core-site.xml:
          fs.trash.interval: "5"
        hdfs-site.xml:
          dfs.namenode.num.checkpoints.retained: "3"
      replicas: 2

Just as for the config, it is possible to specify this at role level as well:

nameNodes:
  configOverrides:
    core-site.xml:
      fs.trash.interval: "5"
    hdfs-site.xml:
      dfs.namenode.num.checkpoints.retained: "3"
  roleGroups:
    default:
      config: [...]
      replicas: 2

All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
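
For example, the fs.trash.interval override above is rendered into core-site.xml roughly as:

<property>
  <name>fs.trash.interval</name>
  <value>5</value>
</property>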

For a full list of configuration options, refer to the Apache Hadoop documentation for hdfs-site.xml and core-site.xml.

Environment Variables

In a similar fashion, environment variables can be (over)written. For example per role group:

nameNodes:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1

or per role:

nameNodes:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1

Some environment variables will be overridden by the operator and cannot be set manually by the user. These are HADOOP_HOME, HADOOP_CONF_DIR, POD_NAME and ZOOKEEPER.
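
You can verify that an override reached the container by inspecting its environment (the pod name is an assumption based on the example cluster above):

kubectl exec simple-namenode-default-0 -- env | grep MY_ENV_VAR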

Storage for data volumes

You can mount volumes where data is stored by specifying PersistentVolumeClaims for each individual role group:

dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data:
              capacity: 128Gi

In the above example, all data nodes in the default group will store data (the location of dfs.datanode.data.dir) on a 128Gi volume.

By default, if nothing is configured in the custom resource for a certain role group, each Pod will have a local 1Gi volume mounted for the data location.
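
The operator creates one PersistentVolumeClaim per Pod for each configured data volume. You can list them like this (the label selector is an assumption; kubectl get pvc without a selector works as well):

kubectl get pvc -l app.kubernetes.io/instance=simple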

Resource Requests

Stackable operators handle resource requests in a slightly different manner than Kubernetes. Resource requests are defined on role or role group level. See Roles and role groups for details on these concepts. On the role level this means that, for example, all workers will use the same resource requests and limits. This can be further specified on the role group level (which takes precedence over the role level) to apply different resources.

This is an example of how to specify CPU and memory resources using the Stackable Custom Resources:

---
apiVersion: example.stackable.tech/v1alpha1
kind: ExampleCluster
metadata:
  name: example
spec:
  workers: # role-level
    config:
      resources:
        cpu:
          min: 300m
          max: 600m
        memory:
          limit: 3Gi
    roleGroups: # role-group-level
      resources-from-role: # role-group 1
        replicas: 1
      resources-from-role-group: # role-group 2
        replicas: 1
        config:
          resources:
            cpu:
              min: 400m
              max: 800m
            memory:
              limit: 4Gi

In this case, the role group resources-from-role will inherit the resources specified on the role level, resulting in a maximum of 3Gi memory and 600m CPU.

The role group resources-from-role-group has a maximum of 4Gi memory and 800m CPU (its CPU settings override the role-level ones).

For Java products, the heap memory actually used is lower than the specified memory limit because other processes in the container require memory to run as well. Currently, 80% of the specified memory limit is passed to the JVM.
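
For example, a memory limit of 4Gi works out to roughly:

memory limit:   4Gi (= 4096Mi)
JVM heap (80%): ~3276Mi, i.e. roughly -Xmx3276m

The exact JVM flags are set by the operator; this only illustrates the arithmetic.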

For memory, only a limit can be specified, which will be set as both memory request and limit on the container. This guarantees the container the full amount of memory during Kubernetes scheduling.

If no resource requests are configured explicitly, the HDFS operator uses the following defaults:

dataNodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            max: '4'
            min: '500m'
          storage:
            data:
              capacity: 2Gi
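
To see which requests and limits were actually applied to a Pod, you can inspect it directly (the pod name is an assumption based on the examples above):

kubectl get pod simple-datanode-default-0 \
  -o jsonpath='{.spec.containers[0].resources}'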