The Problem: Spam in the Google Form
TL;DR: the solution? Recaptcha plus Apps Script.
The Overflow Project had a simple need: a contact form at the bottom of the homepage. I wanted to keep things as simple as possible though; so instead of hooking up a CGI form handler to stash responses into a database or to cast them off into our email, I looked at leveraging Google Forms. As it turns out, if you copy the form action URL and get the correct field names (they’re something like entry.432312
) you can gin up the form on your own page and send the results over with an XHR.
This works great. Each submission gets dropped into a spreadsheet, and you can set up notifications and use the spreadsheet for immediately tracking contacts with folks. It’s simple and lightweight.
Unfortunately it’s also a plain-old webform that any spambot worth it’s salt can leverage. 5000 rows of “#1 rolex nfL j3rsey” later, it became clear that we needed some kind of a solution.
Enter Recaptcha
Recaptcha is really simple. Embed a little blob of javascript and some html and the recaptcha form is presented. You can add a callback to get the “response”, a code generated by the recaptcha process that can be used on the server to verify that the captcha was passed successfully. Over on the contact form side, I added a new required field, and then inserted it into my page as an input type=”hidden”
. On successful captcha; stick the response value in there and then enable the submit button. As an advantage, we’ll never even see any response that doesn’t bother to fill out that hidden field.
Now, on to validation. The recaptcha response needs to be validated on the server side, so we need to take the response they gave us and pass it off to a validation URL. If it’s good, we get back a JSON blob with { “success”: true }
.
Oh, Hello Apps Script
So now we’ve got big blob of recaptcha coming down with each request and filling up rows in our spreadsheet. How to validate it?
The start Apps Script code for a spreadsheet gets you started - it contains a method that iterates over ever row of the spreadsheet. A quick look in the docs and we find external APIs are supported, complete with JSON parsing. From there, it’s pretty easy to write something that iterates over every row, looks at the recaptcha response, and either deletes the row or keeps it:
|
|
The first time this is executed it prompts for permissions to access the spreadsheet and to make the URL fetch on your behalf, but after that you can just execute it from the script manager and it’ll delete rows with invalid recaptcha responses in the response field and keep rows that have no response or a valid one. (One note about the validation - each recaptcha is only valid once, for obvious reasons.)
Making It Magic
Now we can go to our Script Manager and execute this script to clean up all our submissions that are invalid. But that’s hardly magical. Instead, Apps Script has the ability be executed on installable Google Apps triggers, and Google Sheets has one for form submission.
(If I’d noticed this earlier, I’d probably have written the script to validate one form submission at a time, but since it was already working I decided it was probably fine to re-execute across the whole form for every submission - we don’t ever expect to have that many entries in total.)
With the installable trigger set to run on every submission, this now works like a champ. New entries appear in the sheet and are swiftly validated or deleted.
The Last Word
Apps Script seems terrifyingly powerful. I’m frightened to see what twisted workflows folks have built with it - the StackOverflow questions about libraries and the memories of my first job building apps out of Access, Word, and VBA send shivers down my spine.