Amazon S3 Integration

Abstract

You can upload content directly from Paligo to Amazon S3. Set up the Paligo to Amazon S3 integration so that Paligo can connect to Amazon S3, and then you can publish.


Paligo has Continuous Integration (CI) support for Amazon Web Services S3 (Amazon S3). This means you can create content in Paligo, such as PDFs or an HTML help center, and publish it to Amazon S3 so that it is instantly live to your end users.

When you publish to Amazon S3, your Paligo content is uploaded to an S3 bucket as a zip file. You can use a lambda function to automatically unzip the file into another bucket, and then you can use the unzipped content in your workflow or publish it through Amazon S3.

Before you can publish from Paligo to Amazon S3, you need:

  • An Amazon S3 account

  • Basic Amazon S3 knowledge and skills, including how to create an S3 bucket and set permissions.

  • A bucket in Amazon S3. Paligo will upload your published files to the bucket. The output is uploaded as a zip file.

To set up Paligo to publish to Amazon S3:

This allows Paligo to publish your output as a zip file to your chosen AWS S3 bucket. You can also set up Amazon S3 so that it unzips the file automatically.

To set up the Paligo Amazon S3 integration, you need:

  • An Amazon Web Services account with read and write access to the S3 service

  • An S3 bucket to receive the zipped content that Paligo will upload when you publish.

    Paligo can upload content directly to the root directory of the bucket or to a folder inside it.
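S3 has no real directories; a "folder" is simply a prefix on the object key. The sketch below illustrates how a destination folder would combine with a file name to form the object key (the helper name and exact behavior are illustrative, not part of Paligo):

```python
def destination_key(folder: str, filename: str) -> str:
    """Build an S3 object key for an uploaded file.

    An empty folder means the file lands in the bucket root;
    otherwise the folder path becomes the key prefix.
    """
    folder = folder.strip("/")
    return f"{folder}/{filename}" if folder else filename

print(destination_key("", "output.zip"))
print(destination_key("Folder 1/Folder 2/", "output.zip"))
```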

When you have a bucket set up, you can connect Paligo to Amazon S3:

  1. Log in to Paligo via a user account that has administrator permissions.

  2. Select your profile name in the upper right corner to display a menu, and then select Settings.

  3. In the Settings view, select the Integrations tab.

  4. Find the Amazon S3 settings and select Add.

    Note

    Add is only available the first time you set up an integration. After that, Add is replaced by Change.

    Paligo displays the Amazon S3 integration settings.

    Paligo to Amazon Web Services S3 integration settings. They include AWS region, AWS key, AWS secret, AWS bucket name, and destination folder.
  5. Select the AWS region. This is the geographical location of the data center for your Amazon Web Services.

    To find out more, see https://docs.aws.amazon.com/general/latest/gr/rande-manage.html.

  6. Enter the AWS key and the AWS secret. These are the security access keys for your AWS account.

    For information on how to find the access key and the secret key, see https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html.

  7. Enter the name of the AWS bucket that is going to receive the published Paligo content. The content is uploaded to this bucket as a zip file.

  8. In the AWS destination folder field, enter the directory path for the folder that you want Paligo to upload the zip file to. This folder is inside the AWS bucket.

    For example:

    Folder 1/Folder 2/

    To upload the zip file directly to the root of the bucket, leave the AWS destination folder field empty.

  9. Check the Use non-unique file name box if you want Paligo to use a consistent file name for the zip file. Each time Paligo publishes content via the integration, the zip file name will be the same.

    If you want Paligo to generate different names for the zip file each time, clear the checkbox.

  10. Select Save.

  11. Select the Change button for the Amazon S3 integration.

  12. Select Test Settings to run a connection test. If the settings are correct, Paligo will show a green success notification.

    If the connection test fails, Paligo shows a red failure notification. Check each connection setting carefully to make sure you have not made any mistakes, and try again. If the connection test continues to fail, contact Paligo support for assistance.

When the connection is made, you can publish content from Paligo to the Amazon S3 bucket you specified in the integration settings.

When you have set up the Paligo Amazon S3 integration, you can publish content from Paligo to Amazon S3. The process is very similar to "regular" publishing. You create your publication and topics, and set up a layout for the type of output you want, such as PDF, HTML5, etc. Then you choose the publication settings and Paligo creates a zip file that contains your output content. The zip file is downloaded in your browser, and for Amazon S3, it is also uploaded to your chosen S3 bucket.

Note

You can set up a lambda function in Amazon S3 to automatically unzip the contents to another bucket.

To publish to Amazon S3, the integration settings need to be in place so that Paligo can connect to Amazon Web Services. When those are in place, and you have a publication and layout set up to create the output you want, you can publish to Amazon S3:

  1. In Paligo, select the options menu ( ... ) for the publication that you want to publish, and then select Publish.

    Publish document dialog showing settings for different output types, languages, profiling attributes, variables, and upload output.
  2. On the Publish document dialog, select the type of output you want, for example, HTML5.

  3. Choose the layout that you want to use for publishing. The settings in the layout are applied when Paligo generates the output.

  4. Choose the Languages to publish to. If you do not have any translations, you can only select the original/source language.

  5. If you have set filters (Profiling attributes) on topics or elements, and/or have used variables, choose which values to use for the publication. See Filters (Profiling) and Variables to learn how to use these features.

  6. In the Upload output section, check the Upload to Amazon S3 box. By default, Paligo will upload the output to the bucket and folder that are specified in the Paligo to Amazon S3 integration settings.

    Upload output settings. There are settings for Upload to GitHub, Upload via FTP, Upload to Bitbucket, and Upload to AWS S3. Upload to AWS S3 is selected.

    You can publish to a different bucket and/or folder if required. Select the Edit icon next to Upload to Amazon S3, and then select the bucket and/or folder on the Edit dialog.

    AWS S3 settings for a single publication. There are options for AWS bucket name and Destination Folder.

    The settings you choose are only used for this individual publishing process. Any future publishing reverts to the bucket and folder that are defined in the integration settings.

  7. Select Publish document.

    Paligo generates the output, applying the settings from the layout and the Publish document dialog in the process. When the output is ready to use, it is downloaded in your browser as a zip file. The zip file is also uploaded to your chosen bucket and folder in Amazon S3.

Note

This content is designed for developers who understand Amazon S3 and know how to create lambda functions, create buckets, and set up permissions, IAM roles, etc.

When you publish from Paligo to Amazon S3, Paligo creates a zip file that contains your content. The zip file is uploaded to a bucket in S3. You then have the choice to unzip the file manually, or you can use a lambda function to unzip the file automatically and place the unzipped content in another bucket. The unzipped content can then be used in your workflow or you can use AWS for hosting, with the content publicly available through a URL.
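If you unzip manually, the standard-library `zipfile` module is enough. The sketch below builds an in-memory stand-in for a downloaded Paligo zip (the `<name>/out/...` layout is assumed from the lambda example later in this article) and then reads it back:

```python
import io
import zipfile

# Build a stand-in for the zip file Paligo uploads. The layout is an
# assumption based on the lambda example: content lives under "out/".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mydoc/out/index.html", "<html></html>")
    zf.writestr("mydoc/out/css/style.css", "body {}")

# Manual unzip: list the archive members and read one of them.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    html = zf.read("mydoc/out/index.html").decode()

print(names)
print(html)
```

With a real download you would pass the zip file's path to `zipfile.ZipFile(...)` instead of an in-memory buffer.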

To unzip your content automatically, set up:

  • Another S3 bucket with public permissions (by default, buckets are not public).

    So you now have one S3 bucket to receive the Paligo files (we will call this the "zipped" bucket) and one S3 bucket for the unzipped files (the "unzipped" bucket).

  • A lambda function. When Paligo uploads a file to the "zipped" bucket, the lambda function is triggered automatically. The function unzips the file from Paligo and places the resulting files in the "unzipped" bucket.
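The trigger itself is an S3 event notification on the "zipped" bucket that invokes the lambda whenever an object is created. A sketch of what that configuration looks like (the function ARN and bucket name are placeholders); the commented boto3 call shows where it would be applied:

```python
# S3 event notification: invoke the lambda on every object created
# in the "zipped" bucket. The ARN below is a placeholder.
notification = {
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:123456789012:function:unpack",
            "Events": ["s3:ObjectCreated:*"],
        }
    ]
}

# Applying it would look roughly like this (placeholder bucket name):
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-zipped-bucket",
#     NotificationConfiguration=notification,
# )
print(notification["LambdaFunctionConfigurations"][0]["Events"])
```

You can also configure the same trigger from the Lambda console; the lambda's execution role needs permission to read the "zipped" bucket and write to the "unzipped" bucket.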

Here is an example of a lambda function:

import json
import urllib.parse
import boto3
import zipfile
import mimetypes
import os
import re
from io import BytesIO


def unpack(event, context):
    # Uncomment to debug the incoming event:
    # print("Received event: " + json.dumps(event, indent=2))

    # Get the bucket and key names from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

    try:
        # Initialize the S3 client; this relies on your AWS configuration
        s3 = boto3.client('s3', use_ssl=False)

        # Load the uploaded zip file into memory
        bucketobj = s3.get_object(Bucket=bucket, Key=key)['Body'].read()

        # The zipfile module needs a file-like object
        fileobj = zipfile.ZipFile(BytesIO(bucketobj), 'r')

        filecount = 0

        for name in fileobj.namelist():
            # Keep only entries inside the "out" folder
            path = re.match(r'^(.*?/)out/(.*?)$', name)
            if path is None:
                continue

            # Rebuild the key without the "out" segment
            outkey = path[1] + path[2]
            handle = BytesIO(fileobj.open(name, 'r').read())

            # Fall back to a generic content type for unknown extensions
            mimetype = mimetypes.guess_type(name)[0] or 'application/octet-stream'

            s3.upload_fileobj(
                handle,
                Bucket=os.environ['BUCKETNAME_OUTPUT'],
                Key=outkey,
                ExtraArgs={'ContentType': mimetype}
            )

            filecount += 1

        print('Uploaded {} files to {}'.format(filecount, os.environ['BUCKETNAME_OUTPUT']))

    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}.'.format(key, bucket))
        raise e

Note

This lambda function looks for content in the "out" folder in the zip file that Paligo provides. Only files in the "out" folder are added to the public-facing "unzipped" bucket.
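You can check that path filter locally, without AWS. The regular expression below is the same one the lambda function uses; the example entry names are illustrative:

```python
import re

# Same pattern as in the lambda: keep only entries whose path has an
# "out/" segment, and drop that segment from the resulting key.
pattern = re.compile(r'^(.*?/)out/(.*?)$')

entries = [
    "mydoc/out/index.html",      # kept
    "mydoc/out/css/style.css",   # kept
    "mydoc/internal/notes.txt",  # skipped: not inside "out/"
]

kept = []
for name in entries:
    m = pattern.match(name)
    if m:
        kept.append(m[1] + m[2])

print(kept)
```

Note that the "out" segment is removed from the uploaded keys, so `mydoc/out/index.html` becomes `mydoc/index.html` in the "unzipped" bucket.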