You can check out the source of all the code in this tutorial in its entirety on github. Also, if somehow you’re reading this and you’re an angry UCLA administrator, please email me privately if you have any issues with what I’m doing. Otherwise, enjoy the post.
One of the seasonal headaches I face as a UCLA student is enrolling for classes. I thought it would get easier as each quarter progressed. Yet here I am, senior standing and all, still worrying about whether or not I can get into the classes I need in order to graduate on time.
As a CS major, it is only fair that I use what skills I have in order to beat, or at the very least game this system. This post is the first in a series of posts documenting how I gamed the UCLA enrollment process. I’ll begin this tutorial with a guide on how to build a simple scraper -> notification system. In future posts, I’ll work my way up to building a fully automated course checker dashboard -> auto-enrollment system.
Before I begin my post, I just want to address a minor concern I have. People constantly ask me to run my scripts to check for their classes and in the past, I have. From now on I will not perform this service for the following reasons:
I’m running these processes on my own machine and can’t afford to scale up past a few users. I released a simple site in the past and I had a log of over 3000 courses to check within a week of me releasing the tool. That was no bueno so I shut it down right away. I will gladly reopen such a service if I (or you) can find a way that doesn’t hurt my wallet.
Courtesy to UCLA. This goes hand-in-hand with the last point. When it was just me and a few others using the tool, I could afford to poll the registrar for updates every 15 seconds. As the size grew, I had to increase the lag period from 15 seconds up to 2 minutes because of the mass traffic I was sending UCLA’s servers. I could have very well kept the 15 second lag and still made those requests, but that’s just wrong. I want to beat the system, but not to death.
There’s no use in having this tool if everyone is using it. Unfortunately, the class enrollment frenzy at UCLA is a zero sum game. There are winners and there are losers. However, I have documented my process for anyone to reproduce on their own. So without further ado…
You can very well follow along from your mac/windows/linux laptop, but you can’t turn off your laptop without killing the process. You can always create a free server (micro instance) on Amazon’s EC2, but I won’t cover how to do that since you can easily find tutorials online. I suggest prototyping on your laptop and deploying to a server once you get the hang of things.
Sending yourself an email/SMS notification:
What’s the use of knowing if a class is open without somehow being notified of it? The first thing we need to build is a good, simple way of sending out notifications.
First, you’re going to need an easy way to send email using python. I’m going to use smtplib.
We create our function send_mail with two parameters: a subject and message text. We also have to specify a sender and the recipients. Since this tutorial assumes you’re building this application for yourself, this is easy. You can just hard-code the values without having to manage any forms or databases.
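    import smtplib

    def send_mail(subject, text):
        # Hard-coded sender and recipients, since this is a personal tool
        sender = 'firstname.lastname@example.org'
        receivers = ['email@example.com', 'firstname.lastname@example.org']
        # Minimal message: a Subject header, a blank line, then the body
        message = 'Subject: %s\n\n%s' % (subject, text)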
We also need to instantiate an smtp client to send the mail and do some logging to verify that the email is actually being sent.
        try:
            # Assumes a mail server is listening on this machine
            smtpObj = smtplib.SMTP('localhost')
            smtpObj.sendmail(sender, receivers, message)
            print('Successfully sent email')
        except smtplib.SMTPException:
            print('Error: unable to send email')
And that’s it! smtplib comes bundled with python so all you have to do is just make sure you import it and you’re set. Pretty simple, right?
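For reference, here is a slightly more defensive sketch of the same function, written for Python 3. The addresses are placeholders, and the smtp_factory parameter is a hypothetical hook of my own (not part of smtplib) that lets you swap in a fake client to test the function without a mail server. It also catches OSError, which is what you would typically get if nothing is listening on localhost.

```python
import smtplib

SENDER = 'firstname.lastname@example.org'   # placeholder address
RECEIVERS = ['email@example.com']            # placeholder address


def build_message(subject, text):
    """Return a minimal message: a Subject header, a blank line, then the body."""
    return 'Subject: %s\n\n%s' % (subject, text)


def send_mail(subject, text, smtp_factory=smtplib.SMTP, host='localhost'):
    """Send the message and return True on success, False on failure.

    smtp_factory is a hypothetical hook so a fake client can be used in tests.
    """
    message = build_message(subject, text)
    try:
        client = smtp_factory(host)
        try:
            client.sendmail(SENDER, RECEIVERS, message)
        finally:
            # Always close the connection, even if sending failed
            client.quit()
    except (smtplib.SMTPException, OSError) as exc:
        print('Error: unable to send email (%s)' % exc)
        return False
    print('Successfully sent email')
    return True
```

Returning a boolean instead of only printing makes it easier for the calling code to decide whether to retry later.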
Scraping the registrar HTML:
So the idea behind a scraper is that you can request a page, get the source html, find the attribute you want to extract, and pass this information along to the notification system. In this tutorial I’m going to be checking for an enrollment restriction for STATS 100A to be lifted. Normally, you need to be a stats major or minor to enroll in the class, but after some arbitrary date (nobody knows when) the restriction gets lifted and is open as long as you meet the prereqs. You could follow these exact steps to check for a course or waitlist opening with just minor modifications.
So anyways we’re going to need three things: a way to make these requests for the html, a way to parse the html into its elements, and a way to query these elements for certain identifying attributes. We can use the python libraries requests, html5lib, and lxml respectively to achieve exactly these three things. Before you begin coding, make sure to pip install these libraries.
    import html5lib, lxml, lxml.cssselect
    import requests
Now we need to find the page with the information we’re looking for. Here’s the page for stats and just as a check, here’s the page for classics. We are looking for the enrollment restrictions data and after inspecting the element on both pages, I’m assuming this is the one:
<span id="ctl00_BodyContentPlaceHolder_subdet_lblEnrollRestrict">STATISTICS MAJORS OR MINORS</span>
The key bit of information here is the span’s id attribute. This seems to be the unique identifying factor for enrollment restrictions across the registrar. Sweet. Part of the awesomeness of lxml is the ability to query parsed html for certain css attributes. All we have to do is just take that id and throw on a ‘#’ in front of it. Boom, we now have a css selector. Now we just try to match the selector to the parsed html page. Putting this all together we have so far:
    def check_registrar():
        base_url = 'http://www.registrar.ucla.edu/schedule/subdet.aspx?'
        reg_url_stats = base_url + 'srs=263303210&term=13F&session='
        reg_url_classics = base_url + 'srs=148043200&term=13F&session='
        r = requests.get(reg_url_stats)
        raw_html = r.text
        # Parse into an lxml tree so CSS selectors can be run against it
        page = html5lib.parse(raw_html, treebuilder='lxml', namespaceHTMLElements=False)
        css_query = '#ctl00_BodyContentPlaceHolder_subdet_lblEnrollRestrict'
        selector = lxml.cssselect.CSSSelector(css_query)
        # The selector returns a list of matching elements; take the first one
        match = selector(page)
        status = match[0].text
        print(status)
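If you want to experiment with the "find the element by its id" step without installing anything, here is a rough standard-library sketch of the same idea in Python 3. To be clear, this swaps html5lib and lxml for Python's built-in html.parser, so it is not the approach used above, and it is far less forgiving of messy markup. The TextById class and text_by_id helper are names I made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0       # greater than 0 while inside the target element
        self.done = False    # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get('id') == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(raw_html, target_id):
    """Return the stripped text of the element with this id, or None if absent."""
    parser = TextById(target_id)
    parser.feed(raw_html)
    parser.close()
    if not parser.done and not parser.depth:
        return None
    return ''.join(parser.chunks).strip()
```

Calling text_by_id(raw_html, 'ctl00_BodyContentPlaceHolder_subdet_lblEnrollRestrict') on the page source should give back the restriction text, or None if the span is not on the page.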
Now we just have to take this status and do something meaningful with it. In this case, I want to know when the enrollment is not strictly limited to stats majors and minors.
        if status == 'STATISTICS MAJORS OR MINORS':
            print('Still blocked')
            return False
        else:
            print('Good to go bob!')
            send_mail('STATS 100A IS NOT RESTRICTED.', 'GET ON URSA!')
            return True
And that’s it! Now if you call check_registrar it will do a single check for an opening and send you an email/text if there’s an opening. Not bad for < 40 lines of code!
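One thing to watch out for: with a plain equality check, any status other than the exact restricted string counts as "open", including an empty or missing one from a failed scrape. Here is a small sketch of a stricter comparison. The restriction_lifted helper is my own hypothetical name, and treating a missing status as "unknown" rather than "open" is an assumption about what you would want, not something the registrar page dictates.

```python
RESTRICTED_STATUS = 'STATISTICS MAJORS OR MINORS'


def restriction_lifted(status, restricted=RESTRICTED_STATUS):
    """Return True only when a status was found and it differs from the restricted text.

    A missing status (None, empty, or whitespace) is treated as "unknown",
    so a failed scrape does not trigger a false alarm.
    """
    if not status or not status.strip():
        return False
    # Ignore surrounding whitespace and letter case when comparing
    return status.strip().upper() != restricted.upper()
```

You could then write `if restriction_lifted(status):` in place of the string comparison in check_registrar.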
Putting it ALL together:
So now we have a checker -> notifier system in place. We just need to automate it and have some way to monitor the progress. The first thing we need is a way to stop notifying once the class is open or the enrollment restriction is lifted. We don’t want a notification overload. The second thing we need is a way to create a lag between each check, as a courtesy to UCLA’s servers. I would say a 30 second delay is sufficient. The last thing we need is a sanity check mechanism. That is, we need a way to make sure the process is running without constantly checking the computer or error logs. I decided that 60 * 60 * 8 seconds (28,800 seconds, or every 8 hours) is a good interval to send a notification. This way, I’ll know periodically throughout the day that the process is running fine. If I stop getting notifications, then I know something’s up and I need to go back and change things.
    import time

    def run_checker(lag, status_update):
        class_is_open = False
        time_of_status_check = time.time()
        while not class_is_open:
            print('checking')
            class_is_open = check_registrar()
            # Wait between checks as a courtesy to the registrar's servers
            time.sleep(lag)
            current_time = time.time()
            # Send a heartbeat email once the status interval has elapsed
            if current_time - time_of_status_check > status_update:
                send_mail('Checker status update', 'Still running fine!')
                time_of_status_check = current_time
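If you want to convince yourself that the loop and the heartbeat timing behave as expected without waiting 8 hours, one option is a variant that takes its dependencies as arguments. This is only a sketch: poll_until_open and its check, notify, sleep, and clock parameters are hypothetical names of mine, and unlike the loop above it returns as soon as the check succeeds instead of sleeping one more time.

```python
import time


def poll_until_open(check, notify, lag, status_update,
                    sleep=time.sleep, clock=time.time):
    """Call check() every `lag` seconds until it returns True.

    Whenever more than `status_update` seconds have passed since the last
    heartbeat, call notify(subject, text). Returns the number of checks made.
    """
    checks = 0
    last_heartbeat = clock()
    while True:
        checks += 1
        if check():
            return checks
        sleep(lag)
        now = clock()
        if now - last_heartbeat > status_update:
            notify('Checker status update', 'Still running fine!')
            last_heartbeat = now
```

With the functions from this post, a real run would look like poll_until_open(check_registrar, send_mail, 30, 60 * 60 * 8). For a dry run, pass in a fake sleep and clock that just advance a counter.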
All we need to do now is call run_checker with some parameters, for example run_checker(30, 60 * 60 * 8) for a 30 second lag and a status update every 8 hours.
And we’re good to go. Now just fire up your terminal and cd to the directory with this python script in it and run this command
screen python scriptname.py
Hit Ctrl-A then Ctrl-D. This basically allows you to run your script behind a screen, detach yourself from that screen, and exit the shell without killing the process. You can always go back and kill the process using the kill
And we’re done with this segment! As I mentioned earlier, you can check out the source in its entirety on github. The entire code snippet is only about 50 lines long!