So, as I mentioned in the last issue, I ended up hitting the quota for my Twitter API keys. 2 million tweets pulled from the API can apparently go by fast. So I set up two more developer accounts and attempted to switch the keys over, but I'm still waiting on the approval I need to raise their quotas to 2M and get access to the V1.1 API. I had to send them more info today in hopes of getting approved. And if not, at least my original account's quota will roll over soon. I just might need to get a bit creative with how I approach Twitter data pulls. (Update: In the middle of writing this, I got emails telling me that both accounts were approved! I'll be able to expand what I'm doing considerably now! Yay!)
So I started on the piece I could work on: migrating Feather over to use the Django backend and OAuth2 authentication. I began by looking at what functionality I needed and decided to build out the data models first.
Now, I've done this a few different ways in the past. Sometimes I'll store only the ID of a tweet or Twitter account, and sometimes I'll store a bit more info. Feather needs more than that, though, so I ended up creating a Twitter module in my code with a few different entity models:
TwitterAccount - holds the basic information for a given account: Twitter ID, name, username, bio, and profile picture URL.
Tweet - contains the created timestamp, the text of the tweet, a reference to its author, and a reference to the tweet it's replying to, if any. (That last bit is remarkably tricky to determine. I can tell whether a tweet is a reply to a given top-level tweet, but figuring out replies to other replies is a lot trickier. I'm still working out the best way to do that.)
Likes - a model linking a tweet to an account that liked it.
Retweets - a model linking a tweet to the account that retweeted it.
TweetCollection - a named collection of tweets.
Most of those probably seem pretty reasonable. The TweetCollection is something I plan on using in the future as a way to reuse groups of tweets.
The main thing missing for now: relationships between Twitter accounts. I've got a bit of that modeled in the unfollow module, but since I'm planning on adapting everything to use a shared database, I'll migrate those pieces over shortly.
This is enough of a data architecture to start building up some of the functionality. So I then moved on to figure out what functionality I needed to implement in order to populate the database appropriately.
For this, I went with Celery tasks. I have tasks I can use to pull tweet information, pull account information, and fetch some engagement data. These tasks can fire quickly, and I'll set them up to retry with exponential backoff if they hit rate limits.
That gives me pretty much everything I need to pull in data from Twitter.
The next module I added is a Feather module. Its purpose is to let me create endpoints (Django "views") specific to what I need for Feather. As I was working on this, though, I realized I have a potential issue to figure out, and it relates to the short-lived tokens the OAuth2 setup uses. If I'm storing the access token on the front-end, refreshing it when needed, and passing it to the backend, that works great in most cases.
Except for the one I still need to implement: the scheduled daily refresh. The way I've got Feather set up, each day I pull the data for every Twitter account that has authenticated. This lets me produce daily and weekly reports (TweetCollections) that get emailed to users based on their subscription preferences. If I stuck with the Blitz/Quirrel implementation, that wouldn't be a big deal. I could even set up a daily job that refreshes every token shortly before the data pull runs.
But I'm losing Quirrel at the end of July, so I need to make sure I'm not relying on it for anything. Which brings me back to the issue at hand: I need a better way to do this.
I'm using the Tweepy library to access the Twitter API from Python. It does support OAuth2 authentication flows, but I don't use them because I do the authentication on the front-end. The current solution I'm looking at is to store more data in the backend database for each client. Instead of loading values from environment variables to instantiate the Tweepy client, I'll try pulling them from the client record associated with the API key I've generated. If I can get that working, I can store the refresh token alongside the access token and handle refreshes whenever I need to. I'll schedule a refresh every 1.5 hours or so, so the tokens stay fresh and are ready whenever they're needed.
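The refresh itself is just the standard OAuth2 refresh-token grant against Twitter's token endpoint. Here's a sketch under my assumptions: the endpoint and grant type are Twitter's documented OAuth2 flow, but the function name and where the credentials live are hypothetical:

```python
# Hypothetical token-refresh helper. The endpoint and grant type follow
# Twitter's OAuth2 spec; everything else is an assumption.
import base64
import requests

TOKEN_URL = "https://api.twitter.com/2/oauth2/token"


def refresh_access_token(client_id: str, client_secret: str,
                         refresh_token: str) -> dict:
    """Exchange a refresh token for a new access/refresh token pair."""
    # Confidential clients authenticate with HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    resp = requests.post(
        TOKEN_URL,
        data={"grant_type": "refresh_token", "refresh_token": refresh_token},
        headers={"Authorization": f"Basic {basic}"},
    )
    resp.raise_for_status()
    # The response contains access_token, refresh_token, and expires_in;
    # both tokens would be written back to the client record.
    return resp.json()
```

Running this on a 90-minute cadence keeps tokens comfortably inside Twitter's two-hour access-token lifetime, and because the response includes a new refresh token, each run has to persist both values.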
That's the next technical challenge I'm planning on addressing. Once I can validate that this part is working, I can continue with the rest of the migration. With the Twitter functionality migrated, I can move on to the rest of the jobs I scheduled via Quirrel, such as sending emails. Those parts should be relatively trivial to migrate over, and then the Feather migration will be complete. I've just got to get it all finished by the end of July.
Hopefully, I'll be ready to migrate the whoshouldIunfollow Twitter bits over to the shared module and start testing with users at the beginning of next week. That's when I plan on working on the next bits of the marketing plan I mentioned in the last issue: getting the users who signed up to beta test involved with the process and testing in public. And at least as importantly, seeding the database with data, which should improve the experience for future users, assuming they'll tend to be in similar social networks.
For the next non-code issue of the SaaS Factory, is there anything you’d like me to dig into? I don’t have anything specific in mind as of this writing, so if there’s something you’re interested in learning about, now’s a great time to ask!
And on a completely unrelated note, I’ll drop the latest episode of How To Scale Yourself here. This was an incredible conversation with someone who has shipped products and is continuing to innovate. Had a blast talking with Ross about the future of work/education, the importance of failing, and project-based learning. Enjoy!