PikoPong
  • Web Dev
  • Hack
  • Database
  • Big Data
  • AWS
  • Linux
No Result
View All Result
PikoPong
  • Web Dev
  • Hack
  • Database
  • Big Data
  • AWS
  • Linux
No Result
View All Result
PikoPong
No Result
View All Result
Home AWS

Amazon EKS Now Supports EC2 Inf1 Instances : idk.dev

June 16, 2020
in AWS
289 3
Amazon EKS Now Supports EC2 Inf1 Instances : idk.dev


Amazon Elastic Kubernetes Service (EKS) has quickly become a leading choice for machine learning workloads. It combines the developer agility and the scalability of Kubernetes, with the wide selection of Amazon Elastic Compute Cloud (EC2) instance types available on AWS, such as the C5, P3, and G4 families.

As models become more sophisticated, hardware acceleration is increasingly required to deliver fast predictions at high throughput. Today, we’re very happy to announce that AWS customers can now use the Amazon EC2 Inf1 instances on Amazon Elastic Kubernetes Service, for high performance and the lowest prediction cost in the cloud.

A primer on EC2 Inf1 instances
Inf1 instances were launched at AWS re:Invent 2019. They are powered by AWS Inferentia, a custom chip built from the ground up by AWS to accelerate machine learning inference workloads.

Inf1 instances are available in multiple sizes, with 1, 4, or 16 AWS Inferentia chips, with up to 100 Gbps network bandwidth and up to 19 Gbps EBS bandwidth. An AWS Inferentia chip contains four NeuronCores. Each one implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, saving I/O time in the process. When several AWS Inferentia chips are available on an Inf1 instance, you can partition a model across them and store it entirely in cache memory. Alternatively, to serve multi-model predictions from a single Inf1 instance, you can partition the NeuronCores of an AWS Inferentia chip across several models.

Compiling Models for EC2 Inf1 Instances
To run machine learning models on Inf1 instances, you need to compile them to a hardware-optimized representation using the AWS Neuron SDK. All tools are readily available on the AWS Deep Learning AMI, and you can also install them on your own instances. You’ll find instructions in the Deep Learning AMI documentation, as well as tutorials for TensorFlow, PyTorch, and Apache MXNet in the AWS Neuron SDK repository.

In the demo below, I will show you how to deploy a Neuron-optimized model on an EKS cluster of Inf1 instances, and how to serve predictions with TensorFlow Serving. The model in question is BERT, a state of the art model for natural language processing tasks. This is a huge model with hundreds of millions of parameters, making it a great candidate for hardware acceleration.

Building an EKS Cluster of EC2 Inf1 Instances
First of all, let’s build a cluster with two inf1.2xlarge instances. I can easily do this with eksctl, the command-line tool to provision and manage EKS clusters. You can find installation instructions in the EKS documentation.

Here is the configuration file for my cluster. Eksctl detects that I’m launching a node group with an Inf1 instance type, and will start your worker nodes using the EKS-optimized Accelerated AMI.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cluster-inf1
  region: us-west-2
nodeGroups:
  - name: ng1-public
    instanceType: inf1.2xlarge
    minSize: 0
    maxSize: 3
    desiredCapacity: 2
    ssh:
      allow: true

Then, I use eksctl to create the cluster. This process will take approximately 10 minutes.

$ eksctl create cluster -f inf1-cluster.yaml

Eksctl automatically installs the Neuron device plugin in your cluster. This plugin advertises Neuron devices to the Kubernetes scheduler, which can be requested by containers in a deployment spec. I can check with kubectl that the device plug-in container is running fine on both Inf1 instances.

$ kubectl get pods -n kube-system
NAME                                  READY STATUS  RESTARTS AGE
aws-node-tl5xv                        1/1   Running 0        14h
aws-node-wk6qm                        1/1   Running 0        14h
coredns-86d5cbb4bd-4fxrh              1/1   Running 0        14h
coredns-86d5cbb4bd-sts7g              1/1   Running 0        14h
kube-proxy-7px8d                      1/1   Running 0        14h
kube-proxy-zqvtc                      1/1   Running 0        14h
neuron-device-plugin-daemonset-888j4  1/1   Running 0        14h
neuron-device-plugin-daemonset-tq9kc  1/1   Running 0        14h

Next, I define AWS credentials in a Kubernetes secret. They will allow me to grab my BERT model stored in S3. Please note that both keys needs to be base64-encoded.

apiVersion: v1 
kind: Secret 
metadata: 
  name: aws-s3-secret 
type: Opaque 
data: 
  AWS_ACCESS_KEY_ID: <base64-encoded value> 
  AWS_SECRET_ACCESS_KEY: <base64-encoded value>

Finally, I store these credentials on the cluster.

$ kubectl apply -f secret.yaml

The cluster is correctly set up. Now, let’s build an application container storing a Neuron-enabled version of TensorFlow Serving.

Building an Application Container for TensorFlow Serving
The Dockerfile is very simple. We start from an Amazon Linux 2 base image. Then, we install the AWS CLI, and the TensorFlow Serving package available in the Neuron repository.

FROM amazonlinux:2
RUN yum install -y awscli
RUN echo $'[neuron] n
name=Neuron YUM Repository n
baseurl=https://yum.repos.neuron.amazonaws.com n
enabled=1' > /etc/yum.repos.d/neuron.repo
RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
RUN yum install -y tensorflow-model-server-neuron

I build the image, create an Amazon Elastic Container Registry repository, and push the image to it.

$ docker build . -f Dockerfile -t tensorflow-model-server-neuron
$ docker tag IMAGE_NAME 123456789012.dkr.ecr.us-west-2.amazonaws.com/inf1-demo
$ aws ecr create-repository --repository-name inf1-demo
$ docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/inf1-demo

Our application container is ready. Now, let’s define a Kubernetes service that will use this container to serve BERT predictions. I’m using a model that has already been compiled with the Neuron SDK. You can compile your own using the instructions available in the Neuron SDK repository.

Deploying BERT as a Kubernetes Service
The deployment manages two containers: the Neuron runtime container, and my application container. The Neuron runtime runs as a sidecar container image, and is used to interact with the AWS Inferentia chips. At startup, the latter configures the AWS CLI with the appropriate security credentials. Then, it fetches the BERT model from S3. Finally, it launches TensorFlow Serving, loading the BERT model and waiting for prediction requests. For this purpose, the HTTP and grpc ports are open. Here is the full manifest.

kind: Service
apiVersion: v1
metadata:
  name: eks-neuron-test
  labels:
    app: eks-neuron-test
spec:
  ports:
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  selector:
    app: eks-neuron-test
    role: master
  type: ClusterIP
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: eks-neuron-test
  labels:
    app: eks-neuron-test
    role: master
spec:
  replicas: 2
  selector:
    matchLabels:
      app: eks-neuron-test
      role: master
  template:
    metadata:
      labels:
        app: eks-neuron-test
        role: master
    spec:
      volumes:
        - name: sock
          emptyDir: {}
      containers:
      - name: eks-neuron-test
        image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/inf1-demo:latest
        command: ["/bin/sh","-c"]
        args:
          - "mkdir ~/.aws/ && 
           echo '[eks-test-profile]' > ~/.aws/credentials && 
           echo AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID >> ~/.aws/credentials && 
           echo AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY >> ~/.aws/credentials; 
           /usr/bin/aws --profile eks-test-profile s3 sync s3://jsimon-inf1-demo/bert /tmp/bert && 
           /usr/local/bin/tensorflow_model_server_neuron --port=9000 --rest_api_port=8500 --model_name=bert_mrpc_hc_gelus_b4_l24_0926_02 --model_base_path=/tmp/bert/"
        ports:
        - containerPort: 8500
        - containerPort: 9000
        imagePullPolicy: Always
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: aws-s3-secret
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: aws-s3-secret
        - name: NEURON_RTD_ADDRESS
          value: unix:/sock/neuron.sock

        resources:
          limits:
            cpu: 4
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
          - name: sock
            mountPath: /sock

      - name: neuron-rtd
        image: 790709498068.dkr.ecr.us-west-2.amazonaws.com/neuron-rtd:1.0.6905.0
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
            - IPC_LOCK

        volumeMounts:
          - name: sock
            mountPath: /sock
        resources:
          limits:
            hugepages-2Mi: 256Mi
            aws.amazon.com/neuron: 1
          requests:
            memory: 1024Mi

I use kubectl to create the service.

$ kubectl create -f bert_service.yml

A few seconds later, the pods are up and running.

$ kubectl get pods
NAME                           READY STATUS  RESTARTS AGE
eks-neuron-test-5d59b55986-7kdml 2/2   Running 0        14h
eks-neuron-test-5d59b55986-gljlq 2/2   Running 0        14h

Finally, I redirect service port 9000 to local port 9000, to let my prediction client connect locally.

$ kubectl port-forward svc/eks-neuron-test 9000:9000 &

Now, everything is ready for prediction, so let’s invoke the model.

Predicting with BERT on EKS and Inf1
The inner workings of BERT are beyond the scope of this post. This particular model expects a sequence of 128 tokens, encoding the words of two sentences we’d like to compare for semantic equivalence.

Here, I’m only interested in measuring prediction latency, so dummy data is fine. I build 100 prediction requests storing a sequence of 128 zeros. I send them to the TensorFlow Serving endpoint via grpc, and I compute the average prediction time.

import numpy as np
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import time

if __name__ == '__main__':
    channel = grpc.insecure_channel('localhost:9000')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'bert_mrpc_hc_gelus_b4_l24_0926_02'
    i = np.zeros([1, 128], dtype=np.int32)
    request.inputs['input_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))
    request.inputs['input_mask'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))
    request.inputs['segment_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))

    latencies = []
    for i in range(100):
        start = time.time()
        result = stub.Predict(request)
        latencies.append(time.time() - start)
        print("Inference successful: {}".format(i))
    print ("Ran {} inferences successfully. Latency average = {}".format(len(latencies), np.average(latencies)))

On average, prediction took 5.92ms. As far as BERT goes, this is pretty good!

Ran 100 inferences successfully. Latency average = 0.05920819044113159

In real-life, we would certainly be batching prediction requests in order to increase throughput. If needed, we could also scale to larger Inf1 instances supporting several Inferentia chips, and deliver even more prediction performance at low cost.

Getting Started
Kubernetes users can deploy Amazon Elastic Compute Cloud (EC2) Inf1 instances on Amazon Elastic Kubernetes Service today in the US East (N. Virginia) and US West (Oregon) regions. As Inf1 deployment progresses, you’ll be able to use them with Amazon Elastic Kubernetes Service in more regions.

Give this a try, and please send us feedback either through your usual AWS Support contacts, on the AWS Forum for Amazon Elastic Kubernetes Service, or on the container roadmap on Github.

– Julien



Source link

Share219Tweet137Share55Pin49

Related Posts

Building resilient services at Prime Video with chaos engineering : idk.dev
AWS

Getting started with Travis-CI.com on AWS Graviton2 : idk.dev

AWS Graviton2 processors deliver a major leap in performance and capabilities over first-generation AWS Graviton processors. They power Amazon...

September 24, 2020
Monitoring the Java Virtual Machine Garbage Collection on AWS Lambda : idk.dev
AWS

Monitoring the Java Virtual Machine Garbage Collection on AWS Lambda : idk.dev

When you want to optimize your Java application on AWS Lambda for performance and cost the general steps are:...

September 23, 2020
AWS adds a C++ Prometheus Exporter to OpenTelemetry : idk.dev
AWS

AWS adds a C++ Prometheus Exporter to OpenTelemetry : idk.dev

In this post, two AWS interns—Cunjun Wang and Eric Hsueh—describe their first engineering contributions to the popular open source...

September 23, 2020
Architecture Patterns for Red Hat OpenShift on AWS : idk.dev
AWS

Architecture Patterns for Red Hat OpenShift on AWS : idk.dev

Editor’s note: Although this blog post and its accompanying code make use of the word “Master,” Red Hat is...

September 22, 2020
Next Post
An Open Standard For JavaScript Functions — Smashing Magazine

Better Reducers With Immer — Smashing Magazine

Introducing the serverless LAMP stack – part 2 relational databases : idk.dev

Introducing the serverless LAMP stack – part 2 relational databases : idk.dev

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recommended

A Beautiful and Minimal Open Source Mobile OS

A Beautiful and Minimal Open Source Mobile OS

November 10, 2020
Amazon Transcribe Now Supports Automatic Language Identification : idk.dev

Amazon Transcribe Now Supports Automatic Language Identification : idk.dev

September 16, 2020
Building a Self-Service, Secure, & Continually Compliant Environment on AWS : idk.dev

Building a Self-Service, Secure, & Continually Compliant Environment on AWS : idk.dev

July 28, 2020
OpenShift 4 on AWS Quick Starts : idk.dev

OpenShift 4 on AWS Quick Starts : idk.dev

September 3, 2020

Categories

  • AWS
  • Big Data
  • Database
  • DevOps
  • IoT
  • Linux
  • Web Dev
No Result
View All Result
  • Web Dev
  • Hack
  • Database
  • Big Data
  • AWS
  • Linux

Welcome Back!

Login to your account below

Forgotten Password?

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In