Trove bots for all!

21 Jan 2018

I like Twitter bots. Not the evil, spammy, election-rigging type of Twitter bots. I like the experimental, artistic, political bots. The bots that surprise you, that make you laugh, make you think, or start you on a new journey. Most of all, I like the fact that in this ever more monetised world of massive online platforms we can carve out new spaces for expression through the creative use of code.

And anyone can do it.

In some of my undergraduate classes and workshops I ask participants to make simple Twitter bots using the site Cheap Bots Done Quick. Have a look at these four bots that were brought to life in my Random Acts of Meaning workshop at NLS8 last year. Cheap Bots Done Quick is very easy to use, but there’s also plenty of scope for creativity. For an excellent introduction to the possibilities, you should read Shawn Graham’s Programming Historian tutorial, ‘An Introduction to Twitterbots with Tracery’.

But bots can also be used to mobilise cultural heritage collections. Instead of languishing in catalogues and databases, collections can be set loose into spaces where people congregate. They can be given a life of their own, outside of the confines of the corporate website.

3 Aug 1933: 'KANGAROO ISLAND'S ANNIVERSARY FIRST SHIP ARRIVED 97 YEARS AGO' https://t.co/ZqsL3ed3zB
— TroveNewsBot (@TroveNewsBot) January 15, 2018

I created @TroveNewsBot back in 2013 to share Trove’s digitised newspapers. As well as tweeting random articles, @TroveNewsBot responds to queries — tweet some keywords at it and the bot will reply with the most relevant search result. @TroveBot followed soon after to liberate the contents of other Trove zones.

I also shared a simplified version of the @TroveBot code in the Trove Build-a-Bot repository (my kids were a bit obsessed with Build-a-Bear Workshop at the time). I hoped that Trove contributors might use it to create their own collection bots, and some did – @CurtinLibBot and @Kasparbot (from the NMA) have been busily tweeting since July 2013.

But the problem was that while the code itself was easy to configure, you still needed a server somewhere to run it on. That’s a significant hurdle to anyone who just wants to experiment.

Enter Glitch.

Glitch is sort of like a combination of Heroku and GitHub with a bit of JSFiddle thrown in. It lets you collaborate on code like GitHub. But it also runs your applications in the cloud like Heroku. Most importantly, it encourages education and experimentation, making it easy to share your projects in a readily remixable form. (Oh yeah, and it’s free!)

You can create bots (and all sorts of other things!) on Glitch without having to worry about setting up a server. So over the last few weeks I’ve been creating a collection of Trove bot starter kits for everyone to play with.

Trove bot starter kits

Here’s the current line-up:

trove-list-bot: share items from a Trove list (or lists)
trove-tag-bot: share Trove items with a particular tag
trove-collection-bot: share items in a collection
trove-title-bot: share newspaper articles from selected newspapers

Each bot comes with detailed instructions, so I won’t repeat them here. Once you’ve got your authorisation keys from Twitter, the rest is easy-peasy. Just click on one of the links to start.

Jump in and have a go!

Another interesting article! 1 Apr 1939, 'Let's Talk Of Interesting People': https://t.co/hHsRQeHMgH
— AWWbot (@AustWWBot) January 20, 2018

New Trove bots using the starter kits have been popping up just about every day. There’s @AustWWBot, created by @bonniewildie, tweeting out articles from the Australian Women’s Weekly. Perhaps you’d enjoy @CatsofTrove, by @lib_idol, keeping up the internet’s quota of cats by sharing the contents of a Trove list. Then there’s @DoSonTrove and @suthlib sharing items from the Dictionary of Sydney and the Sutherland Library.

Here’s a list of all the Trove Twitter bots I know about – please tweet me any additions!

The folks from Glitch have even created a Trove page to share these bots and any other projects using the Trove API.

Hack my bots

There are lots of ways these simple bots could be improved and extended. While it’s great to see people making more bots, I’m also hoping that some will want to take things further and hack my code.

Once again Glitch makes this easy. You can remix projects multiple times (just choose ‘Remix this!’ from the dropdown menu on the project’s title). You can invite others to collaborate on a project. And of course you can share the results of your hacking for others to remix.

Screen capture of Glitch editor

Glitch comes with a browser-based editor that includes syntax highlighting and various other nifty features – try highlighting a word, then hit Cmd-D (or Ctrl-D) for multiple selections. Glitch also saves any edits automatically and relaunches your app so you can quickly pick up problems.

My bots are written in Python, which is a pretty friendly programming language for beginners. The inner workings of the bots are contained in a file called server.py. Just click on it in the Glitch sidebar to open it for editing.

Below you’ll find a few hints and suggestions for hacking your own Trove bots.

Looking behind the curtain

As you experiment, it’s a good idea to have Glitch’s Activity Log open. Just click on the ‘Logs’ button in the sidebar. If you happen to break something you should see an error message pop up in the Activity Log. And don’t worry — breaking things is an important part of learning to code!

You might need to scroll back a bit through the log to find the actual source of an error. Look at the example below. It might seem scary, but all it’s really saying is that there’s a problem with the indentation of your code on line 43 of server.py. Python uses indentation to group lines of code together into blocks – like functions or ‘if’ statements – so it will complain if you get it wrong.

Screen capture of Glitch activity log showing indentation error
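For comparison, here’s a small made-up function (it’s not from the bot code) showing what correctly indented blocks look like. Each level of indentation is four spaces, and the comments say which block each line belongs to.

```python
def describe(count):
    # Everything indented under 'def' belongs to the function
    if count > 1:
        # This line only runs when the 'if' test is true
        return '{} items'.format(count)
    # Back at the function's own level of indentation
    return '1 item'

print(describe(3))
```

If you removed the spaces in front of one of the return lines, Python would stop with an indentation error much like the one in the log.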

You can also send your own messages to the activity log. In Python, this is done using the print command. If you look in the server.py file of any of the bots you’ll see the line print message¹. This writes the tweet to the log before actually tweeting it.

The print command can be really useful when you’re trying to track down problems with your code. You might notice there’s also a line in server.py that reads print url. This writes the url that’s being used to retrieve data from the Trove API to the log. So if your code is failing and you suspect it’s got something to do with the data being delivered by Trove, you can grab the url, paste it into a new browser window, and inspect the result.

Here’s an example from the activity log for @TroveTribuneBot, showing the url and formatted tweet.

Screen capture of Glitch activity log showing message

Both url and message are variables — they’re containers that have been assigned values by the code. You can use print to view the value of any variable to make sure it has the value you expect.
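Here’s a rough sketch of that kind of checking. The values are invented for illustration, and I’ve added a label to each line so it’s easier to spot in the log. I’ve also used the print() form with brackets, which should work in both Python 2 and Python 3 when you’re printing a single value.

```python
# Invented values, just for illustration
url = 'http://api.trove.nla.gov.au/result/?q=+&zone=newspaper'
message = 'Another interesting article! 1 Apr 1939'

# Add a label so each line is easy to find in the log
url_line = 'url: {}'.format(url)
message_line = 'message: {}'.format(message)

print(url_line)
print(message_line)
```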

Shh, I’m experimenting…

While you’re playing around with your bot’s code you probably want to stop it from trying to tweet. All you need to do is find the line of code that calls the tweet() function and add a # sign at the beginning of the line. So you’ll end up with something like:

# tweet(message)

What does the # do? In Python, a # indicates a comment, so by adding it at the start of the line, we’ve told Python to ignore what follows. Obviously, just delete it (and any spaces you’ve inserted after it) when you’re ready to tweet again. As noted above, the content of the tweet is sent to the app’s activity log before tweeting, so with tweeting disabled, you can experiment as much as you like and still view the results in the log.
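If you find yourself adding and removing that # a lot, another option is an on/off switch. This is only a sketch of the idea: TWEETING and maybe_tweet() are names I’ve made up, and the tweet() function here is a stand-in for the real one in server.py.

```python
# Change this to True when you're ready to go live
TWEETING = False

def tweet(message):
    # Stand-in for the bot's real tweet() function
    return 'tweeted: {}'.format(message)

def maybe_tweet(message):
    # Always write the message to the log...
    print(message)
    # ...but only send it if the switch is on
    if TWEETING:
        return tweet(message)
    return None

result = maybe_tweet('Another interesting item!')
```

With the switch off, the message still appears in the activity log but nothing is sent.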

Inject some personality

The bots have a rather limited vocabulary. Tweets of random items say something like ‘Another interesting item!’, while new items are announced with ‘Another new item!’. Bor-ing. Why not teach your bot some new phrases?

Open up server.py and look for the prepare_message() function. In Python, functions are defined using the def keyword, so just find the line that starts with something like def prepare_message(item).

Now look for where the message variable is set. In trove-title-bot you’ll see something like:

message = 'Another interesting article! {}: {}'

This provides the basic template for the tweet – the pairs of curly brackets are just placeholders for the item details and url, which are inserted later on. Remember, variables are just containers, so we can change them whenever we want.

If you’re experimenting with trove-tag-bot or trove-list-bot you’ll see that message is set to one of two values, depending on whether it’s a ‘random’ or ‘new’ tweet:

if message_type == 'new':
    message = 'New item tagged \'{}\'! {}: {}'
elif message_type == 'random':
    message = 'Another Trove item tagged \'{}\'! {}: {}'

You can edit any of these templates to include phrases of your choosing. How about:

message = 'Wacko! More goodness from Trove! {}: {}'

message = 'OMG I didn\'t expect to find this in Trove! {}: {}'

It’s just a matter of clicking on the text in Glitch and changing it, but there are a few things to be wary of:

Keep the basic structure (including those curly brackets)
Be careful with apostrophes
Beware the Twitter character limit
Extra curly brackets for tags

Keep the basic structure

Don’t change the message = part of the line, and make sure the curly brackets are still there!

message = 'Message goes here in quotes! {}: {}'

Be careful with apostrophes

We’re using single quotes around our message, so if you include a single quote inside the message it will break the code. You can either ‘escape’ the apostrophe using a backslash, or use double quotes around the whole message:

message = 'OMG I didn\'t expect to find this in Trove! {}: {}'

message = "OMG I didn't expect to find this in Trove! {}: {}"
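If you’d like to convince yourself that the two approaches really do produce the same text, here’s a quick check you could run. The variable names are just ones I’ve chosen for the example.

```python
# The same message written two ways
escaped = 'OMG I didn\'t expect to find this in Trove! {}: {}'
double_quoted = "OMG I didn't expect to find this in Trove! {}: {}"

# Python treats them as identical strings
same = (escaped == double_quoted)
print(same)
```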

Beware the Twitter character limit

Keep your new phrase under about 50 characters. That should mean you’re always within the 280 character limit.

Although the code truncates item titles to 200 characters, it doesn’t do any other checking for length. One way of improving the code might be to create a new function that checks the length of a message before tweeting.
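As a starting point, a length check might look something like this. It’s a minimal sketch, not part of the starter kits: fits_in_tweet() and TWEET_LIMIT are names I’ve invented, and it simply counts the characters in the finished message. Twitter counts links differently from ordinary text, so treat the result as a rough guide rather than an exact answer.

```python
TWEET_LIMIT = 280

def fits_in_tweet(message, limit=TWEET_LIMIT):
    # Simple check: count the characters in the finished message
    return len(message) <= limit

short_ok = fits_in_tweet('Look what I found! 1 Apr 1939, A title: https://t.co/abc')
too_long = fits_in_tweet('x' * 300)

print(short_ok)
print(too_long)
```

You could then call the function just before tweet(), and skip (or shorten) any message that doesn’t fit.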

Extra curly brackets for tags

The trove-tag-bot template is slightly different — the first set of curly brackets is where the tag itself is inserted. You can change the text around the tag, just make sure you keep the three sets of curly brackets, for example:

message = 'More \'{}\' from Trove to you! {}: {}'

You should also count the tag in your 50 character limit.

Mixing things up

So we’ve seen how easy it is to change your bot’s default message, but it’s still going to be tweeting the same message every time. Let’s mix things up a bit.

This time, instead of manually adding a new phrase into the message variable, we’re going to make a random selection from a list of phrases and automatically add it to our tweet.

Let’s break down the tasks:

Think of some suitable phrases
Create a list containing the phrases
Select one of the phrases at random
Insert the selected phrase into our tweet

Think of some phrases

First of all you need to think of a few suitable phrases – remember to keep them under about 50 characters. For this example, let’s use:

‘Hey check this out!’
‘Trove never ceases to surprise!’
‘Look what I found!’
‘More Trove goodness!’

Create a list

The message variable we met above was a ‘string’ – it just contained text. Other variable types include integers, floats, lists, and dictionaries. In Python a ‘list’ is a variable that contains umm… a list of things. In other programming languages, lists can be called ‘arrays’.

Python lists can contain just about anything – strings, numbers, even other lists. In this case we’re going to create a list of strings. Here’s how to create a list called phrases that contains our examples.

phrases = ['Hey check this out!', 'Trove never ceases to surprise!', 'Look what I found!', 'More Trove goodness!']

As you can see, lists are created by using square brackets, with individual members separated by commas. Here are some other lists:

[1, 5, 73, 42]
['apple', 'pecan', 36, 'rhubarb', 3.14]
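Once you have a list there are a few handy things you can do with it. The sketch below uses our phrases list to show how to count the items, grab one by its position (Python starts counting at zero), and add a new one. The extra phrase is just something I made up.

```python
phrases = ['Hey check this out!', 'Trove never ceases to surprise!', 'Look what I found!', 'More Trove goodness!']

# How many phrases are in the list?
how_many = len(phrases)

# Get the first phrase (positions start at 0)
first = phrases[0]

# Add another phrase to the end of the list
phrases.append('Fresh from the archives!')

print(how_many)
print(first)
print(phrases[-1])
```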

Select a phrase at random

Python includes lots of different modules that extend its core functionality. When you need to use one of these modules, you import it into your code. Near the top of server.py you’ll see a number of import statements, including import random. This makes the random module available in our code to help us do useful randomish stuff.

The random module includes a function called choice() that selects a random item from a list. So choosing one of our phrases at random is as simple as:

selected_phrase = random.choice(phrases)

As you can see, the way we call a function within a module is to use a dot – random.choice() calls the choice() function inside the random module. The choice() function expects a list (or something similar) and so we give it our phrases list. The function gives back one of the phrases which we then store in the selected_phrase variable.
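Putting those pieces together, a self-contained version you could try on its own might look like this. Run it a few times and you should see different phrases turn up.

```python
import random

phrases = ['Hey check this out!', 'Trove never ceases to surprise!', 'Look what I found!', 'More Trove goodness!']

# Pick one phrase at random from the list
selected_phrase = random.choice(phrases)

print(selected_phrase)
```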

Insert the phrase in our tweet

Let’s go back to the default message for random tweets:

message = 'Another interesting article! {}: {}'

What we want to do is replace ‘Another interesting article!’ with our selected phrase. There are various ways we could do this, but let’s just put in another pair of curly brackets to be the placeholder for our text.

message = '{} {}: {}'

Python strings are not just bits of text, they have lots of cleverness built-in. This includes the format() method that we use to fill in our placeholders.²

In prepare_message() you should see a line something like:

message = message.format(details, item['troveUrl'].replace('ndp/del', 'newspaper'))

This takes the message string and uses the format() method to insert the values details and item['troveUrl'].replace('ndp/del', 'newspaper') in place of the curly brackets.³ Our default template had two placeholders, so we supply two values. The order of the values we give to format() matches the order of the placeholders they’re replacing – so the details variable replaces the first set of curly brackets.

But now we have three pairs of curly brackets, with the first one corresponding to our selected phrase. All we need to do is add the selected_phrase variable to format():

message = message.format(selected_phrase, details, item['troveUrl'].replace('ndp/del', 'newspaper'))

So now we’re supplying three values to insert in the three template slots.
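Here’s a small stand-alone sketch you can use to experiment with format(). The details and url values are invented (the real ones come from the Trove API), and I’ve called the template variable template so the before and after are easier to tell apart. It also shows what happens if you forget to supply enough values.

```python
template = '{} {}: {}'

# Invented values for illustration
selected_phrase = 'Look what I found!'
details = "1 Apr 1939, 'Let's Talk Of Interesting People'"
url = 'https://trove.nla.gov.au/newspaper/article/0000000'

# Three placeholders, three values, in the same order
message = template.format(selected_phrase, details, url)
print(message)

# Two values for three placeholders causes an error
try:
    template.format(details, url)
    mismatch = None
except IndexError as error:
    mismatch = 'Not enough values: {}'.format(error)
print(mismatch)
```

If you see an IndexError in the activity log after editing a template, counting the curly brackets is a good place to start.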

But what about tags?

As we saw above, the trove-tag-bot messages include the tag itself. Of course you could just hard code the tag name into your phrases, but for something more re-usable just include a placeholder in your phrases. For example:

phrases = ['Look what I found tagged {}!', 'More gold from the {} tag!']

Change the default message as above:

message = '{} {}: {}'

And then call the format() method on your selected phrase to add the tag, before you pass the whole thing on to the message:

message = message.format(selected_phrase.format(TAG), details, item['troveUrl'].replace('ndp/del', 'newspaper'))
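If the nested format() calls look a bit dense, you can do the same thing in two steps. This is a sketch with invented values for TAG, details and url, but the logic is the same as the line above.

```python
import random

# Invented values for illustration
TAG = 'queensland'
details = "9 Feb 1988, 'Qld's first female QC'"
url = 'https://trove.nla.gov.au/newspaper/article/0000000'

phrases = ['Look what I found tagged {}!', 'More gold from the {} tag!']

# Step 1: pick a phrase and insert the tag
selected_phrase = random.choice(phrases).format(TAG)

# Step 2: insert the finished phrase into the message
message = '{} {}: {}'.format(selected_phrase, details, url)
print(message)
```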

The new function

Here’s the complete prepare_message() function from trove-title-bot incorporating all the steps above:

def prepare_message(item):
    # Our list of phrases
    phrases = ['Hey check this out!', 'Trove never ceases to surprise!', 'Look what I found!', 'More Trove goodness!']
    # Select one
    selected_phrase = random.choice(phrases)
    # Placeholder for our phrase
    message = '{} {}: {}'
    details = None
    if item['zone'] == 'article':
        date = arrow.get(item['date'], 'YYYY-MM-DD')
        details = '{}, \'{}\''.format(date.format('D MMM YYYY'), truncate(item['heading'].encode('utf-8'), 200))
    if details:
        # Insert our selected phrase into the tweet
        message = message.format(selected_phrase, details, item['troveUrl'].replace('ndp/del', 'newspaper'))
    else:
        message = None
    return message

This will be a little different if your bot tweets new as well as random items, but the steps are basically the same.

Would you like keywords with that?

All four starter kits work with a particular slice of Trove – a list, a tag, a collection, or a newspaper. But you might want to have more fine-grained control over your bot’s selections. One way of doing that is to add a few keywords to the mix.

For example, @CaddieBrain used the trove-title-bot kit to create @NTTimesGazette, sharing articles from the Northern Territory Times and Gazette. The problem was, while the articles were published in the NT, they weren’t about the NT – @CaddieBrain asked if there was a way of making the results more local.

20 Jul 1878, 'LOCAL COURT, PALMERSTON. Wednesday, 17th July. (Before Mr E. W. Price, S.M.)': https://t.co/fJj8WHTmta #DarwinNT
— Northern Territory Times and Gazette (@NTTimesGazette) January 21, 2018

One way of doing this is to think of some local place names and add them into the query the bot uses to get data from the Trove API. So the steps would be:

Construct a search using the Trove web interface that returns the results you want
Insert our query into the bot’s API url

Construct a search

In this case I’d start with a search limited to the Northern Territory Times and Gazette. Then I’d start adding some place names to the search box – let’s try “Port Darwin” or “Pine Creek”.

Note the double quotes around the names to make Trove treat them as phrases, and the ‘OR’ to show we’d be happy with one or the other or both. You could add as many ‘OR’ clauses as you want.

That seems to work pretty well. So all we need to do is copy the contents of the search box – ie the whole "port darwin" OR "pine creek" bit.

Insert our query

Open up server.py and look for the tweet_random() function. You should see a line that sets the url variable:

url = 'http://api.trove.nla.gov.au/result/?q=+&zone=newspaper&l-category=Article&{}&encoding=json&n=1&s={}&key={}'.format(titles, start, API_KEY)

Look for the q=+ in the url. The q is the query parameter. At the moment we’re not using it, so there’s just a + to indicate a blank query. What we want to do is replace the + with the contents of the Trove search box. Something like:

url = 'http://api.trove.nla.gov.au/result/?q="port darwin" OR "pine creek"&zone=newspaper&l-category=Article&{}&encoding=json&n=1&s={}&key={}'.format(titles, start, API_KEY)

To be extra careful you might want to replace the spaces around OR with plus signs, though I think Python will do this automatically.

url = 'http://api.trove.nla.gov.au/result/?q="port darwin"+OR+"pine creek"&zone=newspaper&l-category=Article&{}&encoding=json&n=1&s={}&key={}'.format(titles, start, API_KEY)
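If you’d rather not rely on anything happening automatically, Python’s standard library can encode the query for you. This isn’t in the starter kits, it’s just a sketch of one way to do it using the quote_plus() function. The try/except is there because the function lives in a different module in Python 2 and Python 3.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2

# The contents of the Trove search box
query = '"port darwin" OR "pine creek"'

# Spaces become plus signs and double quotes become %22
encoded_query = quote_plus(query)
print(encoded_query)
```

The encoded version can then be dropped into the url in place of the + after q=.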

That’s it!

Bonus points for extra reusability

The approach above works perfectly well, but rather than directly editing the url when we want to change our query, it might be better to store it with the other configuration settings in .env.

Just open .env and add a new line:

QUERY="\"port darwin\"+OR+\"pine creek\""

Note that I’m using backslashes to escape the double quotes inside the query.

Settings in the .env file are added to the application’s ‘environment’. To grab the query from the environment we have to add this line to server.py:

QUERY = os.environ.get('QUERY')

This line saves the query to a variable named QUERY.

Now we can use the format() method on the url string to add in our query.

url = 'http://api.trove.nla.gov.au/result/?q={}&zone=newspaper&l-category=Article&{}&encoding=json&n=1&s={}&key={}'.format(QUERY, titles, start, API_KEY)

See how I’ve replaced the contents of the q parameter with curly brackets? The format() method will replace them with the contents of QUERY.
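One thing to watch out for is what happens if QUERY is missing from .env, as os.environ.get() will then give you back None. A possible safeguard, sketched below, is to supply a default value of + (a blank query). The titles, start and API_KEY values here are stand-ins I’ve made up so the example runs on its own.

```python
import os

# Use a blank query ('+') if QUERY hasn't been set in .env
QUERY = os.environ.get('QUERY', '+')

# Stand-in values for illustration
titles = 'l-title=11'
start = 0
API_KEY = 'your-api-key'

url = 'http://api.trove.nla.gov.au/result/?q={}&zone=newspaper&l-category=Article&{}&encoding=json&n=1&s={}&key={}'.format(QUERY, titles, start, API_KEY)
print(url)
```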

The rise of the hybrid bot

If you compare the code of the different bot starter kits you’ll see that they have a lot in common. With a bit of experimenting, you should be able to mix and match various approaches to create hybrid bots.

For example, @follysantidote used the trove-tag-bot kit to make a bot that shared items with the tag ‘queensland’. But @follysantidote wanted to do something a bit different – to only tweet tagged items that came from the Canberra Times. How? The answer was to create a hybrid tag/title bot. @BotCBR_QLD was born!

Another Trove item tagged 'Queensland'! 9 Feb 1988, 'Qld's first female QC': https://t.co/Qpjdvx5WRB
— Canberra on Queensland Bot (@BotCBR_QLD) January 20, 2018

Assuming that the tag bot is up and running, only a couple of minor tweaks are required.

First find the title identifier of the newspaper you’re interested in. See the trove-title-bot documentation for more information on this.

Open up server.py and find the tweet_random() and tweet_new() functions. Look for the lines that set the url variable.

url = 'http://api.trove.nla.gov.au/result/?q=+&l-publictag={}&zone=all&encoding=json&n=1&s={}&key={}'.format(TAG, start, API_KEY)

Change the zone parameter to newspaper, and add &l-title=?? to the end of the url, replacing ?? with the id of your newspaper. The id of the Canberra Times is 11, so in this case we’d change the url to:

url = 'http://api.trove.nla.gov.au/result/?q=+&l-publictag={}&zone=newspaper&encoding=json&n=1&s={}&key={}&l-title=11'.format(TAG, start, API_KEY)

Adding the l-title parameter is the equivalent of checking one of the title facets in the web interface – it limits results to that particular newspaper.

Bonus points

Using the keywords example above, you should be able to work out how to store the newspaper title id in the .env file.
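If you get stuck, here’s one possible approach. It assumes you’ve added a line like TITLE_ID=11 to .env. The setting name TITLE_ID is my own invention, and the TAG, start and API_KEY values are stand-ins so the sketch runs by itself.

```python
import os

# Hypothetical setting name; falls back to 11 (the Canberra Times)
TITLE_ID = os.environ.get('TITLE_ID', '11')

# Stand-in values for illustration
TAG = 'queensland'
start = 0
API_KEY = 'your-api-key'

url = 'http://api.trove.nla.gov.au/result/?q=+&l-publictag={}&zone=newspaper&encoding=json&n=1&s={}&key={}&l-title={}'.format(TAG, start, API_KEY, TITLE_ID)
print(url)
```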

More possibilities?

There are plenty of other bot recipes on Glitch to experiment with. If you already know some JavaScript and don’t want to play around in Python, have a look at Stefan Bohacek’s node.js bots on Glitch.

If you need help with your Trove bots, either tweet at @wragge or ask a question in the Trove Bots help forum.

Remember, you don’t have to be a coder to make your own Trove bot.

Try giving it a go. I had never done anything like this before today!
— Folly's Antidote (@follysantidote) January 19, 2018

Trove bots for all!

¹ In Python 3 you’d write this as print(message).
² Functions built-in to things like strings are generally called methods.
³ In case you’re wondering, replace() is another string method that I’m using to update the url supplied by the API to something that matches the current Trove url format.

Tim Sherratt

Historian and hacker
