PikoPong
  • Web Dev
  • Hack
  • Database
  • Big Data
  • AWS
  • Linux
No Result
View All Result
PikoPong
  • Web Dev
  • Hack
  • Database
  • Big Data
  • AWS
  • Linux
No Result
View All Result
PikoPong
No Result
View All Result
Home AWS

Amazon Translate now supports Office documents : idk.dev

July 30, 2020
in AWS
289 3
Amazon Translate now supports Office documents : idk.dev


Whether your organization is a multinational enterprise present in many countries, or a small startup hungry for global success, translating your content to local languages may be an enduring challenge. Indeed, text data often comes in many formats, and processing them may require several different tools. Also, as all these tools may not support the same language pairs, you may have to convert certain documents to intermediate formats, or even resort to manual translation. All these issues add extra cost, and create unnecessary complexity in building consistent and automated translation workflows.

Amazon Translate aims at solving these problems in a simple and cost effective fashion. Using either the AWS console or a single API call, Amazon Translate makes it easy for AWS customers to quickly and accurately translate text in 55 different languages and variants.

Earlier this year, Amazon Translate introduced batch translation for plain text and HTML documents. Today, I’m very happy to announce that batch translation now also supports Office documents, namely .docx, .xlsx and .pptx files as defined by the Office Open XML standard.

Introducing Amazon Translate for Office Documents
The process is extremely simple. As you would expect, source documents have to be stored in an Amazon Simple Storage Service (S3) bucket. Please note that no document may be larger than 20 Megabytes, or have more than 1 million characters.

Each batch translation job processes a single file type and a single source language. Thus, we recommend that you organize your documents in a logical fashion in S3, storing each file type and each language under its own prefix.

Then, using either the AWS console or the StartTextTranslationJob API in one of the AWS language SDKs, you can launch a translation job, passing:

  • the input and output location in S3,
  • the file type,
  • the source and target languages.

Once the job is complete, you can collect translated files at the output location.

Let’s do a quick demo!

Translating Office Documents
Using the S3 console, I first upload a few .docx documents to one of my buckets.

S3 files

Then, moving to the Translate console, I create a new batch translation job, giving it a name, and selecting both the source and target languages.

Creating a batch job

Then, I define the location of my documents in S3, and their format, .docx in this case. Optionally, I could apply a custom terminology, to make sure specific words are translated exactly the way that I want.

Likewise, I define the output location for translated files. Please make sure that this path exists, as Translate will not create it for you.

Creating a batch job

Finally, I set the AWS Identity and Access Management (IAM) role, giving my Translate job the appropriate permissions to access S3. Here, I use an existing role that I created previously, and you can also let Translate create one for you. Then, I click on ‘Create job’ to launch the batch job.

Creating a batch job

The job starts immediately.

Batch job running

A little while later, the job is complete. All three documents have been translated successfully.

Viewing a completed job

Translated files are available at the output location, as visible in the S3 console.

Viewing translated files

Downloading one of the translated files, I can open it and compare it to the original version.

Comparing files

For small scale use, it’s extremely easy to use the AWS console to translate Office files. Of course, you can also use the Translate API to build automated workflows.

Automating Batch Translation
In a previous post, we showed you how to automate batch translation with an AWS Lambda function. You could expand on this example, and add language detection with Amazon Comprehend. For instance, here’s how you could combine the DetectDominantLanguage API with the Python-docx open source library to detect the language of .docx files.

import boto3, docx
from docx import Document

document = Document('blog_post.docx')
text = document.paragraphs[0].text
comprehend = boto3.client('comprehend')
response = comprehend.detect_dominant_language(Text=text)
top_language = response['Languages'][0]
code = top_language['LanguageCode']
score = top_language['Score']
print("%s, %f" % (code,score))

Pretty simple! You could also detect the type of each file based on its extension, and move it to the proper input location in S3. Then, you could schedule a Lambda function with CloudWatch Events to periodically translate files, and send a notification by email. Of course, you could use AWS Step Functions to build more elaborate workflows. Your imagination is the limit!

Getting Started
You can start translating Office documents today in the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (London), Europe (Frankfurt), and Asia Pacific (Seoul).

If you’ve never tried Amazon Translate, did you know that the free tier offers 2 million characters per month for the first 12 months, starting from your first translation request?

Give it a try, and let us know what you think. We’re looking forward to your feedback: please post it to the AWS Forum for Amazon Translate, or send it to your usual AWS support contacts.

– Julien



Source link

Share219Tweet137Share55Pin49

Related Posts

Building resilient services at Prime Video with chaos engineering : idk.dev
AWS

Getting started with Travis-CI.com on AWS Graviton2 : idk.dev

AWS Graviton2 processors deliver a major leap in performance and capabilities over first-generation AWS Graviton processors. They power Amazon...

September 24, 2020
Monitoring the Java Virtual Machine Garbage Collection on AWS Lambda : idk.dev
AWS

Monitoring the Java Virtual Machine Garbage Collection on AWS Lambda : idk.dev

When you want to optimize your Java application on AWS Lambda for performance and cost the general steps are:...

September 23, 2020
AWS adds a C++ Prometheus Exporter to OpenTelemetry : idk.dev
AWS

AWS adds a C++ Prometheus Exporter to OpenTelemetry : idk.dev

In this post, two AWS interns—Cunjun Wang and Eric Hsueh—describe their first engineering contributions to the popular open source...

September 23, 2020
Architecture Patterns for Red Hat OpenShift on AWS : idk.dev
AWS

Architecture Patterns for Red Hat OpenShift on AWS : idk.dev

Editor’s note: Although this blog post and its accompanying code make use of the word “Master,” Red Hat is...

September 22, 2020
Next Post
How To Space And Kern Punctuation Marks And Other Symbols — Smashing Magazine

The Renaissance Of No-Code For Web Designers — Smashing Magazine

Getting the Most Out of Variable Fonts on Google Fonts

Getting the Most Out of Variable Fonts on Google Fonts

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recommended

Introducing Amazon Honeycode – Build Web & Mobile Apps Without Writing Code : idk.dev

Introducing Amazon Honeycode – Build Web & Mobile Apps Without Writing Code : idk.dev

June 25, 2020
How to Clear Apt Cache on Ubuntu and Free Crucial Disk Space

How to Clear Apt Cache on Ubuntu and Free Crucial Disk Space

October 6, 2020
What’s New in the Well-Architected Operational Excellence Pillar? : idk.dev

What’s New in the Well-Architected Operational Excellence Pillar? : idk.dev

July 11, 2020
Deploying UiPath RPA Software on AWS : idk.dev

Deploying UiPath RPA Software on AWS : idk.dev

September 4, 2020

Categories

  • AWS
  • Big Data
  • Database
  • DevOps
  • IoT
  • Linux
  • Web Dev
No Result
View All Result
  • Web Dev
  • Hack
  • Database
  • Big Data
  • AWS
  • Linux

Welcome Back!

Login to your account below

Forgotten Password?

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In