Monday, 13 January 2020

Bulk save emails from Gmail - overview

The following Google Apps Script is designed to bulk save emails (and associated attachments) from a specified Gmail label into a Google Drive folder, with relevant threads combined into a single PDF document. The tool is set to automatically schedule itself by default as the process can take well over an hour depending on the length/number of emails to save.
Screenshot of setup sheet for tool
Screenshot of setup sheet for tool

Welcome sheet
There is a Welcome sheet which requires 2 input items:
  1. The ID of the Google Drive folder where saved emails are to be stored.
  2. The name of the label in Gmail you have applied to your emails that will be saved.
Once the tool is started it will look through the Gmail label and count how many emails there are to be saved and add this number to the Welcome sheet. As it progresses the tool will update how many emails it has saved so you can follow it along.

Config sheet
This is hidden by default and can be used to tweak a few settings:
  1. The maximum number of emails to save - so you could lower this as you first try out the tool.
  2. Allowing for saved emails to have a BEST_Completed label applied so that the tool can track its own progress if you have a large number of emails to save for which it would need to run multiple times.
  3. Set the tool to run on a schedule - this is recommended to be on as a large number of emails to save will reach Google's quota limits. Plus it means you can leave it running whilst you focus on other tasks.
Log sheet
This is used to help with any debugging of the tool should something go wrong.


Download
Bulk save emails from Gmail download (please use 'Overview' > 'Make a copy' for your own version).


THE SCRIPTS
The following section details the specific Functions within the bulk email saving tool to give you a better idea of their tasks. Where possible I have linked to other blog posts that may help to show that particular task in more detail. The code itself can be found in this blog post.

1) initialStart.gs
Calls the ‘getSpreadsheet’ Function to get data from the spreadsheet. Then calls the 'driveFolderPopup’ Function to test if the specified Google Drive folder can be accessed from the given ID within the spreadsheet. Evaluates the Flag status from the ‘driveFolderPopup’ Function so if there are ‘No’ problems it can proceed to run the ‘getEmails’ Function to test that the specified Gmail Label can be reached.
If the ‘getEmails’ Function comes back ‘True’ then it checks there are emails within the label to be saved.
If there are emails to be saved the ‘startPopup’ Function is run to ask the user if they wish to proceed. If the user selects ‘Yes’ then the main ‘saveEmailtoDrive’ Function is called.

If the ‘driveFolderPopup’ Function returns ‘Yes’ it simply logs the result (as the popup Function will inform the user).
If the ‘getEmails’ Function returns zero email threads meaning there are no emails then the ‘noEmailsPopup’ Function is run to inform the user none have been found.


2) saveEmailToDrive.gs
Calls the ‘getSpreadsheet’ Function to get data from the spreadsheet. Gets the current date/time for recording when the script started (to ensure it does not run for longer than defined, later). Calls the ‘getEmails’ Function to gather the necessary emails.
If this is successful and returns a ‘True’ Flag then we check there are emails to be saved (numThreads greater than zero).
If numThreads is true then we set the Emails to save cell in the spreadsheet with the total number of emails found in the label.
We now check if the scheduleTrigger value in the ‘Config’ sheet is set to ‘Yes’ and if so run the ‘getScriptProperty’ Function to look for the ID within Script properties. We then check if this returned value is null.
If ‘getScriptProperty’ returns no value we can run the ‘createSchedule’ Function to create a Trigger to run the script. We set the status message on the spreadsheet to inform a schedule has been created.
If ‘getScriptProperty’ returns a value then we assume a schedule trigger already exists and we do not need to create one.
If the scheduleTrigger value in the ‘Config’ sheet is set to ‘No’ we perform no further actions.

Next we check if a schedule has been created so we only proceed if not - otherwise the script will start within the next 15 minutes as per Google’s schedule creation. So now a schedule has not been created we run the Function ‘runningText’ to update the status message on the spreadsheet to inform the script is now running. We set the runtimeReached variable to a default value of ‘No’ and get the necessary Google Drive folder.
Check if the moveLabels value in the ‘Config’ sheet is set to ‘Yes’ and run Function to ‘createGmailLabel’. Otherwise do nothing.
Now we create a loop to cycle through each of the email threads. Firstly we check that the runtime has not been reached before starting a new thread - so that we can safely exit the script rather than getting caught with a timeout.
The script now gets various aspects of the email and creates a new Google Doc, with which the ‘moveFile’ Function ensures the new Doc is moved from the default Drive root into the destination folder.

We check for attachments and push them into an empty array, ensuring we reset the hasAttachments variable to a default value of ‘false’ before proceeding, so there is no overhang between message threads.
The ‘getMessageDetails’ Function extracts specific information from the emails (eg Date, From, To) and appends them to the Google Doc.

The script then checks the length of the attachments array, if there are any it runs the ‘attachmentFolder’ Function to create a folder for saving the attachments into and returns its ID. We then run the ‘moveFile’ Function to also move the Google Doc into this new folder so that everything is together.
If there are no attachments we set the value of newFolderID to ‘0’ so it can be evaluated later in the script.

Eventually we run the ‘reduceSize’ Function which is designed to remove forwarded aspects of an email chain within the Google Doc, to prevent issues when converting to a PDF (large file sizes will fail to convert and will therefore remain as a Google Doc as a safety net).
Then we run the ‘convertDocToPDF’ Function which also ensures it is moved to the necessary new attachments folder if one has been created.

We check the status of the moveLabels variable again and if ‘Yes’ we add/remove the designated labels to/from the message.

Now we have finished with the message we update the value of the Emails saved cell in the spreadsheet by 1. We then get the current date/time and subtract the start date/time of the script so we can determine how long it has been running for. If the script has been running for longer than our allocated time we update the runtimeReached variable so this will be picked up at the start of the next loop and the whole message-saving process will not proceed.

Once finished looping through all of the threads we run the ‘getEmails’ Function to check if there are anymore. If there are no more emails we proceed with tidying up the Script Properties, deleting the Trigger and running the ‘successText’ Function to update the status message on the spreadsheet to inform the user the task has completed.
If the ‘getEmails’ Function does find messages still exist and that the runtime has been reached then it will not clear any Script Properties or Trigger so that it can all run again within the hour.

If the scheduleCreated variable from earlier in the script was ‘true’ we would have bypassed most of the previous steps and jumped straight towards the end of the script to run the ‘scheduledPopup’ Function which launches a HTML popup informing the user that the process will start automatically within the hour.


3) getEmails.gs
Gets the actual emails from Gmail so they can be processed elsewhere.

Looks for the specified Gmail label given in the spreadsheet (sourceLabel) and gathers the maximum number of specified threads given in the spreadsheet (maxThreads). A Try/Catch is used to set a Flag (labelFlag) to ‘Yes’ or ‘No’ to highlight if there was an issue accessing the label.
If the status of the labelFlag is ‘No’ (meaning there are no problems) then we count the number of email threads and create an array of data to return to the parent Function.
Else if the status of the labelFlag is something else (meaning there was a problem) then we set the previous array variable to ‘false’ and return this to the parent Function.


4) getSpreadsheet.gs
Get relevant details from the spreadsheet to pass to main script. Collates items into an array for returning to parent Function.


5) scriptProperties.gs
getScriptProperty
This function gets the value of a Script Property key-value that has been set.

createScriptProperty
This function is designed to create a Script Property key-value pair which is scoped to this script and hence provides a variable which can be called from anywhere else within it.

deleteScriptProperty
This function is designed to delete a Script Property key-value pair.


6) createSchedule.gs
This function creates the trigger schedule for the message from the given date/time. Uses ClockTriggerBuilder to create the Trigger.
We get the Trigger ID so that it can be passed to the ‘createScriptProperty’ Function and stored as a key-value pair for later use in the script. Finally we return the variable scheduleCreated as ‘true’ as an acknowledgement a Trigger has been created and that the script should wait until it is scheduled to run.
Blog post on creating a schedule.


7) createGmailLabel.gs
This code is designed to check if the ‘BEST_Completed’ label exists within Gmail, otherwise it will create it.

It searches through the users Gmail labels and pushes each name into an empty array. It then loops through the array looking for a match and updates the variable labelExists if one is found to ‘true’. After which it returns the gotten label to the parent Function.
If no match is found labelExists is set to ‘false’ and we create the label in Gmail in the next step.
Blog post on creating a Gmail label.


8) getMessageDetails.gs
Function to get specific message details and append into Doc, with a horizontal line between each message.


9) attachmentFolder.gs
Function to create folder for moving attachments into. Returns new folder ID to parent Function so that saved emails can then be moved into this folder and therefore have everything collated.


10) moveFile.gs
Function moves a file from one location to another by adding it, then removing any previous (Parent) locations so it does not exist in multiple places.


11) reduceSize.gs
Function to reduce size of Google Doc by removing forwarded messages contained within thread - highlighted by '>>' symbols when viewed in a Google Doc. This function is necessary as some threads contain a significant number of forwarded messages that create a large Google Doc which is then unable to be opened to create PDF version in the next Function.


12) convertDocToPDF.gs
Function to convert Google Doc into PDF, move into new folder if attachments exist, remove original Doc.
Blog post on converting Doc to PDF.


13) deleteTrigger.gs
Function deletes the Trigger. It cycles through the project Triggers looking for a TriggerID match which has been passed into the Function.
Blog post on deleting a Trigger.


14) reset.gs
Function to allow user to delete the spreadsheet data, project Trigger and Scope Property by selecting from Sheet menu. Useful if script crashes part-way through and need to reset bits before retrying without needing to go into Apps Script code, etc.


15) openHTML.gs
This script is designed to open the relevant HTML popup - of which there are 2 Functions within it.


logEvent.gs
Function is called when information needs writing to the ‘Log’ sheet that benefits following the progress of the tool and diagnosing user issues. It is called throughout the rest of the scripts.
Blog post on logging user actions.


onOpen.gs
Creates the necessary menu options within the spreadsheet for the user to select from.


popupDriveFolder.gs
Function tests that the Google Drive Folder exists and is reachable from the given ID in the spreadsheet and informs the user if not - by displaying a popup. It returns a folderFlag value of ‘Yes’ or ‘No’ to indicate if there was a problem so the parent Function can action as necessary.


popupGmailLabel.gs
Function displays a popup confirming to the user there was an issue accessing the given Gmail label.


popupNoEmails.html
HTML popup informing the user there are no emails to be saved. It is a HTML popup to prevent the script from pausing and because there is nothing else to do if no emails can be found.


popupScheduled.html
HTML popup informing the user the process will start automatically within the next 1 hour. It is a HTML popup to prevent the script from pausing - as at this point it has ran through and created the schedule, etc and so is providing a courtesy message which the user will see within a few seconds of selecting to run the tool.


popupStart.gs
Function displays an initial confirmation popup that the user wishes to run the tool and displays how many emails in the Gmail label it has found. Returns a value of ‘Yes’ or ‘No’ depending on if the user wishes to proceed, for the parent Function.


popupSuccess.html
HTML popup informing the user all emails have been saved. It is a HTML popup to prevent the script from pausing as the process may take a couple of hours so the user may not see it immediately which would cause the script to timeout and not exit correctly.


runningText.gs
Function to write script running text to 'Welcome' sheet to inform user script is currently running.


successText.gs
Function to write success text to 'Welcome' sheet to inform user script has completed.

4 comments:

  1. Great post!! I wonder, would there be a way to save the emails as pdf without the doc conversion first? I have seen an add on which does it, but I prefer to use scripts that I can tweak myself. This is the add on, just for reference:
    https://workspace.google.com/u/0/marketplace/app/save_emails_and_attachments/513239564707

    ReplyDelete
    Replies
    1. Thank you.

      My experience was that creating the Doc first was more forgiving - as the conversion process can timeout if the Doc is of a significant size. We also needed uneditable files at the end of the process.

      Kind regards
      Phil

      Delete
  2. Hello, I would like to improve this script by saving the formatted text of the mail in order to keep the color, highlight, format.
    Do you think it is feasible ?
    Thanks

    ReplyDelete
    Replies
    1. Hi

      It's been a few years since I've worked on this tool. I do remember exploring alternatives to getPlainBody (https://developers.google.com/apps-script/reference/gmail/gmail-message#getPlainBody()).

      They resulted in a HTML-tag-filled and overly complicated document however that was unreadable/unfriendly. I never got anything workable at the time I'm afraid.

      Kind regards
      Phil

      Delete