Glacier Select Command Currently in Preview Mode

Glacier is the archival storage service provided by AWS. S3 lifecycle policies can be used to automatically archive S3 objects to Glacier. Glacier's advantage over other archival solutions is that an archive can be retrieved within a few minutes when the Expedited retrieval tier is used.
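As a rough illustration of the lifecycle policies mentioned above, here is a minimal sketch of a rule that moves objects to the GLACIER storage class after a number of days. The helper name `buildGlacierLifecycle`, the rule ID, the `logs/` prefix, the 30-day threshold and the bucket name are all assumptions made for this example, not values from a real setup.

```javascript
// Hypothetical helper: builds an S3 lifecycle configuration that transitions
// objects under `prefix` to the GLACIER storage class after `days` days.
function buildGlacierLifecycle(prefix, days) {
    return {
        Rules: [{
            ID: 'archive-to-glacier',          // assumed rule name
            Status: 'Enabled',
            Filter: { Prefix: prefix },        // only objects under this prefix
            Transitions: [{ Days: days, StorageClass: 'GLACIER' }]
        }]
    };
}

const lifecycle = buildGlacierLifecycle('logs/', 30);
console.log(JSON.stringify(lifecycle, null, 2));

// With the aws-sdk (v2) S3 client, the rule could then be applied roughly like this
// ('my-example-bucket' is a placeholder):
// new AWS.S3().putBucketLifecycleConfiguration(
//     { Bucket: 'my-example-bucket', LifecycleConfiguration: lifecycle },
//     function (err, data) { /* handle the result */ });
```

The configuration is built by a plain function so it can be inspected or logged before it is sent to S3.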

In 2017, AWS added a new capability to Glacier, Glacier Select, that lets you query an archive using SQL statements and retrieve only the data you need. Previously, the whole archive had to be downloaded before any of its data could be accessed. Glacier can now be considered part of a data lake, and it will soon be queryable from AWS Athena.

Glacier Select uses the SelectParameters parameter, passed as part of the job request, to describe the query: the format of the input data, the SQL expression, and the format of the output.
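To make the shape of SelectParameters a little more concrete, here is a minimal sketch that builds the SQL expression and wraps it in the structure used in this post. Both helper names (`selectByColumn`, `buildSelectParameters`) are hypothetical and not part of the AWS SDK. The quote escaping is an assumption based on standard SQL, not something this post verifies against Glacier.

```javascript
// Hypothetical helper: builds a SELECT statement that matches one CSV column
// by position. Columns are addressed as _1, _2, ... as in the `_5` filter
// used in this post's example.
function selectByColumn(columnIndex, value) {
    if (!Number.isInteger(columnIndex) || columnIndex < 1) {
        throw new RangeError('columnIndex must be a positive integer');
    }
    // Assumption: single quotes are escaped by doubling them, as in standard SQL.
    const escaped = String(value).replace(/'/g, "''");
    return `SELECT * FROM archive WHERE _${columnIndex}='${escaped}'`;
}

// Hypothetical helper: wraps an expression in the SelectParameters shape,
// reading CSV input and writing CSV output with default settings.
function buildSelectParameters(expression) {
    return {
        InputSerialization: { csv: {} },
        ExpressionType: 'SQL',
        Expression: expression,
        OutputSerialization: { csv: {} }
    };
}

const selectParameters = buildSelectParameters(selectByColumn(5, '498960'));
console.log(selectParameters.Expression);
// prints: SELECT * FROM archive WHERE _5='498960'
```

Keeping the expression builder separate from the parameter object makes it easier to reuse the same query shape for different columns and values.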

Below is an example of a Lambda function that can be used to query data from Glacier.

            const AWS = require('aws-sdk');
            const glacier = new AWS.Glacier();
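            // Job parameters for a Glacier Select job over a CSV archive
            var queryCsv = {
                "Type": "select",
                "ArchiveId": "ID",
                "Tier": "Expedited",
                "SelectParameters": {
                    "InputSerialization": {"csv": {}},
                    "ExpressionType": "SQL",
                    // _5 refers to the fifth column of the CSV by position
                    "Expression": "SELECT * FROM archive WHERE _5='498960'",
                    "OutputSerialization": {
                        "csv": {}
                    }
                },
                // S3 location where the query results are written
                "OutputLocation": {
                    "S3": {"BucketName": "glacier-select-output", "Prefix": "1"}
                }
            };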
            exports.handler = (event, context, callback) => {
                // Start a select job that retrieves only the matching rows from the archive
                glacier.initiateJob({ vaultName : "myDemoVault", jobParameters : queryCsv }, function (err, data) {
                    if (!err) {