Video Games

Scraping Steam for Data using Python + BeautifulSoup

As promised in yesterday’s blog post about analyzing public Steam numbers, here are the juicy technical details behind scraping a website using Python, and a Python library called BeautifulSoup.


The Method

I chose to use Python because I’ve been using it for a little under two years for number crunching and a few automation scripts at work. It’s very lightweight, very easy to read, and quite a mature language.

That said, you can probably do this in whatever language you like. My approach consisted of the following steps:

  1. Poke the Steam & Game Stats page and get the HTML page that is served up to the browser
  2. Parse the HTML code and pull out specific numbers that would be useful for analysis
  3. Open a specified CSV file, and add lines to the file with all of the relevant data
  4. Close the file and stand by for the next script run

What Was Used

The script is a very small file (33 lines!) and uses Python’s built-in urllib, time, and datetime modules, plus the BeautifulSoup library.

You can take a look at the Gist itself to see the full script, but I am going to use this post to explain some of the methodology behind it, to help people who want to learn about writing Python and scraping web pages!

Alright, shut up, explain your code.

Of course!

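# Python 2: in Python 3, urlopen moved to urllib.request.urlopen
import urllib
from bs4 import BeautifulSoup

steampage = BeautifulSoup(urllib.urlopen('http://store.steampowered.com/stats/?l=english').read())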

This gets the ball rolling for the scraper. We use urllib to open a connection to the Steam & Game Stats page and read its HTML, then hand that HTML to the BeautifulSoup library to parse. If you’re unfamiliar with it (I know I was), BeautifulSoup is a very powerful Python library that makes it super easy to navigate, search, and modify the parsed markup you receive from websites.

In short: read the code of a webpage using BeautifulSoup, and you get all kinds of methods to chop and screw it to your liking.
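The snippet above is Python 2 code: in Python 3, urlopen lives in urllib.request instead. Here is a minimal Python 3 sketch of the same fetch-and-parse step, assuming the bs4 package is installed. The function name, the User-Agent string, and the 30-second timeout are my own illustrative choices rather than anything Steam requires, and the page’s markup may well have changed since this was written.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

STATS_URL = 'http://store.steampowered.com/stats/?l=english'


def fetch_stats_page(url=STATS_URL, timeout=30):
    """Download a page and return it parsed into a BeautifulSoup tree."""
    # Illustrative User-Agent; urllib's default identifies itself as
    # Python-urllib, which some sites block.
    request = Request(url, headers={'User-Agent': 'steam-stats-scraper/0.1'})
    with urlopen(request, timeout=timeout) as response:
        html = response.read()  # raw bytes; BeautifulSoup works out the encoding
    # Naming the parser pins it to the one that ships with Python.
    return BeautifulSoup(html, 'html.parser')
```

Calling fetch_stats_page() with no arguments would return the parsed stats page, ready for the same row-by-row searching used below. Naming 'html.parser' explicitly means the results don’t change depending on whether lxml happens to be installed, since BeautifulSoup otherwise picks the best parser it can find.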

import time
from datetime import datetime

timestamp = time.time()
currentTime = datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')

I wanted to use a consistent timestamp when recording data into the CSVs because it would allow me to group results in a sane manner: the time is captured once per run, so every row written in that run shares it. It takes the current time (when the script runs) and formats it as YYYY-MM-DD HH:MM:SS so that, when imported into Google Sheets, the actual date and time are preserved.
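To show why a single timestamp per run is handy, here is a small Python 3 sketch (the function names are hypothetical, not from the Gist) that stamps one run’s rows with a shared timestamp and then groups stamped rows back into runs.

```python
from collections import defaultdict
from datetime import datetime

TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'  # YYYY-MM-DD HH:MM:SS, as in the post


def stamp_rows(rows, now=None):
    """Prefix every row scraped in one run with the same timestamp string."""
    if now is None:
        now = datetime.now()  # local time, like datetime.fromtimestamp(time.time())
    # Formatted once, outside the loop, so every row from this run shares an
    # identical key even though the rows are produced at slightly different moments.
    current_time = now.strftime(TIMESTAMP_FORMAT)
    return [(current_time,) + tuple(row) for row in rows]


def group_by_run(stamped_rows):
    """Group stamped rows back into one list per scraper run."""
    runs = defaultdict(list)
    for row in stamped_rows:
        runs[row[0]].append(row[1:])
    return dict(runs)
```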

top100CSV = open('SteamTop100byTime.csv', 'a')

You’ll see two open(…) lines in my script, each pointing to its own CSV. This is where I dumped all of my data. The second argument (‘a’) opens the file in append mode, so each run adds new lines to the end of the CSV rather than overwriting it.
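Here is a minimal Python 3 sketch of that append behaviour (the function names are mine, not the script’s). It also uses a with block, which closes the file automatically, covering step 4 of the plan even if a write raises an error.

```python
import os
import tempfile


def append_line(path, line):
    """Append one line of text to a file, creating the file if it doesn't exist."""
    # 'a' (append) never truncates: every write lands at the end of the file.
    # Opening with 'w' instead would erase earlier runs' data each time.
    # The with-block closes the file automatically, even if the write fails.
    with open(path, 'a', encoding='utf-8') as f:
        f.write(line + '\n')


def simulate_runs(runs):
    """Append one line per simulated script run to a temporary file; return its lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'SteamTop100byTime.csv')
        for i in range(runs):
            append_line(path, 'run {}'.format(i))
        with open(path, encoding='utf-8') as f:
            return f.read().splitlines()
```

simulate_runs(3) returns all three lines, because no run wipes out the ones before it.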

for row in steampage('tr', {'class': 'player_count_row'}):
    steamAppID = row.a.get('href').split("/")[4]
    steamGameName = row.a.get_text().encode('utf-8')
    currentConcurrent = row.find_all('span')[0].get_text()
    maxConcurrent = row.find_all('span')[1].get_text()
 
    top100CSV.write('{0},{1},"{2}","{3}","{4}"\n'.format(currentTime, steamAppID, steamGameName, currentConcurrent, maxConcurrent))

This simple-looking for loop pulls out the Steam app ID for each game, the English game name (as listed on the Top 100 list), the number of concurrent players (as of the script reading the page), and the peak concurrent players seen throughout the day (I forget why I wanted this). It then writes that information as a new line in the CSV.
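The hand-built format string wraps the last three fields in double quotes, presumably because the player counts contain thousands separators (e.g. 673,018) that would otherwise split a field. A game name that itself contains a double quote would still break the row, though. Python’s csv module handles the quoting and escaping for you; here is a Python 3 sketch (write_top100_rows is a hypothetical name) that writes the same five columns.

```python
import csv
import io


def write_top100_rows(f, current_time, rows):
    """Write one CSV line per game: time, app ID, name, current and peak players.

    f should be a text file opened with newline='' (as the csv docs recommend),
    e.g. open('SteamTop100byTime.csv', 'a', newline='', encoding='utf-8').
    """
    writer = csv.writer(f)  # default QUOTE_MINIMAL: quotes only fields that need it
    for app_id, name, current, peak in rows:
        writer.writerow([current_time, app_id, name, current, peak])
```

A name like Game "A", Deluxe comes back intact when the file is read with csv.reader, because the writer doubles the embedded quotes and wraps the field.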

This loop also shows off how simple, yet powerful, BeautifulSoup is. Let me break it down into smaller pieces, because each one uses a different BeautifulSoup method.

for row in steampage('tr', {'class': 'player_count_row'}):

When I dug through the Steam & Game Stats source code, I realized that every game was listed inside a table row with the class player_count_row. Once I saw the pattern, I simply asked BeautifulSoup for every table row with that class (calling the soup object directly like this is shorthand for its find_all() method). Because those rows all share the same markup, the loop can consistently pull out the information we need.
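To make that shorthand concrete, here is a small sketch on a hypothetical, trimmed-down table (not Steam’s real markup). It shows that calling the soup object with a tag name and an attribute dictionary returns the same rows as the more explicit find_all() with the class_ keyword (the trailing underscore is needed because class is a Python keyword).

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: two matching rows and one that should be skipped.
SAMPLE_HTML = """
<table>
  <tr class="player_count_row"><td>first</td></tr>
  <tr class="other_row"><td>skip me</td></tr>
  <tr class="player_count_row"><td>second</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Calling the soup object is shorthand for find_all(), so these return the same rows.
rows_by_call = soup('tr', {'class': 'player_count_row'})
rows_by_find_all = soup.find_all('tr', class_='player_count_row')

cell_texts = [row.td.get_text() for row in rows_by_call]
```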

    steamAppID = row.a.get('href').split("/")[4]

With BeautifulSoup, you can reference tags directly (row.a is the first link inside the row) and then grab attributes within the markup itself (like ‘href’). I did this to grab the URL of each Steam game, split it apart at the forward slashes (‘/’), and pull out the app ID nestled inside the URL.
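Taking index 4 works for a full store URL such as http://store.steampowered.com/app/570/, which splits into 'http:', '', 'store.steampowered.com', 'app', '570' and ''. It would break if a link were relative or shaped differently, though. A slightly sturdier Python 3 sketch, assuming the ID always follows an app segment in the URL path, uses the standard library’s urlparse:

```python
from urllib.parse import urlparse


def app_id_from_url(href):
    """Return the app ID that follows the 'app' segment of a Steam store URL path."""
    segments = [s for s in urlparse(href).path.split('/') if s]
    # Assumes paths shaped like /app/<id>/...; list.index raises ValueError otherwise.
    return segments[segments.index('app') + 1]
```

Raising an error on an unexpected URL is arguably better than silently recording the wrong piece of it as an ID.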

    steamGameName = row.a.get_text().encode('utf-8')

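Like the attribute grabbing above, get_text() is a very neat BeautifulSoup method that returns the text inside a tag, which here is the link text. The Steam & Game Stats page uses the game name itself as the link text, so it was a breeze to add to our collection of data. The .encode('utf-8') call turns that text into UTF-8 bytes, so game names with non-ASCII characters can be written to the file under Python 2.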
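The .encode('utf-8') call is a Python 2 necessity. In Python 3, get_text() already returns a Unicode str, and the encoding moves to the file instead. A small sketch on a hypothetical row (the game name is made up and deliberately includes an accented character):

```python
from bs4 import BeautifulSoup

# Hypothetical row; the game name deliberately contains a non-ASCII character.
SAMPLE_ROW = '<tr class="player_count_row"><td><a href="/app/111/">\n  Café Racer Example \n</a></td></tr>'

row = BeautifulSoup(SAMPLE_ROW, 'html.parser').tr

# get_text() returns a Python 3 str (Unicode), so no .encode('utf-8') is needed
# here; strip=True trims the whitespace around the link text.
game_name = row.a.get_text(strip=True)

# When writing, choose the encoding on the file instead, e.g.:
# open('SteamTop100byTime.csv', 'a', encoding='utf-8', newline='')
```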

    currentConcurrent = row.find_all('span')[0].get_text()

Nabbing the current and peak concurrent users is the same procedure, so I only need to explain it once. BeautifulSoup’s find_all() finds every instance of the specified markup inside the row and returns them as a list, so [0] is the row’s first span (current players) and [1] is its second (peak players).
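The counts come back as text. If you want numbers to do math on later, here is a Python 3 sketch that reads both spans from a hypothetical row and converts them to integers, assuming the counts use commas as thousands separators (as in figures like 673,018 elsewhere on this blog).

```python
from bs4 import BeautifulSoup

# Hypothetical row: current players in the first span, peak in the second.
SAMPLE_ROW = (
    '<tr class="player_count_row">'
    '<td><span>1,234</span></td><td><span>5,678</span></td>'
    '<td><a href="/app/111/">Example Game</a></td>'
    '</tr>'
)


def to_int(count_text):
    """Convert a count like '1,234' to 1234 (assumes comma thousands separators)."""
    return int(count_text.strip().replace(',', ''))


row = BeautifulSoup(SAMPLE_ROW, 'html.parser').tr
spans = row.find_all('span')  # a ResultSet, which behaves like a list
current_players = to_int(spans[0].get_text())
peak_players = to_int(spans[1].get_text())
```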

Using the same methods, I also managed to easily pull the current and peak concurrent users for Steam as a whole.

Simple, right?

What else?

There’s nothing else, really! Those 33 lines were more than enough to collect lines upon lines of data in two CSV files.

There’s plenty of work that went into the analysis of that data, but that’s for another day.

Thanks for reading! As always, happy to answer questions or take feedback, so leave comments or yell at me on Twitter!

An Analysis of Activity on Steam

I have wanted to flex my analysis muscles for quite some time and thought I’d get some practice on publicly available data: the Steam & Game Stats page.


For the unaware: Steam is one of the largest gaming platforms available for PCs, and it has a wonderful stats page that lists the current number of concurrent players for each of its top 100 games, as well as the current number of concurrent users overall. I’ve wondered what actual activity levels looked like ever since seeing this page, so I decided to scrape it for a week or two and see what sort of data I could get.

NOTE: If you’re interested in the technical details behind the scraping, I will provide that in a separate post in the near future.

The data I have managed to mine from this page is trivial at best, but I had a ton of fun learning how to build the scraper (uses Python, BeautifulSoup, and a handy dandy cronjob) as well as figuring out the required Google Sheets equations to put it all together.

Some Fun Facts About Steam Activity

On average, 19.95% of concurrent users on Steam are playing games.

Here is a visualization of what Steam activity levels have looked like from March 7, 2014 to March 19, 2014:

In other words, roughly one in five of Steam’s “concurrent” users is actually playing a game. I will concede that this is not dead-on accurate, as I only have numbers for the top 100 games at any given moment. It is still reliable as an estimate, though: games ranked 90 to 100 account for 0.02% of the user base, so any additional users playing games outside the top 100 should have a relatively insignificant effect.
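The share boils down to dividing the players across a snapshot’s top 100 games by Steam’s overall concurrent users at the same timestamp, then averaging over snapshots. A Python 3 sketch of that calculation, assuming the two numbers are matched up by their shared timestamp; any figures fed to it below are made up, not the real scraped data.

```python
def playing_share(top100_counts, steam_concurrent):
    """Share of concurrent Steam users playing a top-100 game in one snapshot."""
    return sum(top100_counts) / steam_concurrent


def average_share(snapshots):
    """Mean of playing_share over (top100_counts, steam_concurrent) snapshots."""
    shares = [playing_share(counts, total) for counts, total in snapshots]
    return sum(shares) / len(shares)
```

Averaging per-snapshot ratios weights every hour equally; dividing the grand totals instead would weight busy hours more heavily. Which one fits depends on the question, and the two can differ noticeably.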

There was a big spike on March 9, 2014 at 4:00pm EDT (16:00), when the number of Counter-Strike: Global Offensive players ballooned to a staggering 111,893, presumably for the start of a tournament. (I haven’t dug into this one too much.)

Dota 2 dominates games played on Steam, accounting for an average of 5.58% of the concurrent player base.

Not a real surprise, given the popularity of Dota 2, but the second most played game, Counter-Strike: Global Offensive, sits at a distant 1.66% average, less than a third of Dota 2’s share. Dota 2’s share translates to an average of about 400,000 concurrent players, with the highest I’ve recorded at 673,018 concurrent players on March 15, 2014 at 10:00am EDT.

In fact, most of Dota 2’s highest concurrent player counts happen in the mornings, between 9:00am and 11:00am.
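A pattern like this can be pulled out by finding, for each day, the hour at which a game’s count peaked and then tallying those hours. A Python 3 sketch, assuming rows of (timestamp, player count) for a single game in the timestamp format the scraper writes:

```python
from collections import Counter
from datetime import datetime


def peak_hours(rows):
    """Count how often each hour of the day held a game's daily peak.

    rows: iterable of (timestamp 'YYYY-MM-DD HH:MM:SS', player_count) for one game.
    Returns a Counter mapping hour (0-23) to the number of days that peaked then.
    """
    best = {}  # date -> (highest count so far, hour it was seen)
    for stamp, count in rows:
        seen = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
        day = seen.date()
        if day not in best or count > best[day][0]:
            best[day] = (count, seen.hour)
    return Counter(hour for _, hour in best.values())
```

Ties go to the earliest hour seen, because only a strictly larger count replaces a day’s current best.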

However, this might be indicative of the real struggle MOBAs face against the titan amongst gods, League of Legends, which boasted an impressive peak of 7.5 million concurrent users in January 2014.

Hard to gain ground on such an entrenched competitor, but they’re definitely doing their best.

160 games were part of the Steam Top 100 between March 7, 2014 and March 19, 2014
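Counting those games boils down to counting the distinct app IDs across every row of the Top 100 CSV. Using the ID rather than the name guards against a name being written slightly differently from run to run. A Python 3 sketch, assuming the column order written by the scraper’s loop (timestamp, app ID, name, current, peak), with made-up sample rows:

```python
import csv
import io


def distinct_app_ids(csv_file):
    """Return the set of app IDs (second column) found in a Top 100 CSV."""
    return {row[1] for row in csv.reader(csv_file) if row}


# Made-up sample in the column order the loop writes: time, app ID, name, current, peak.
SAMPLE = io.StringIO(
    '2014-03-07 10:00:00,111,"Game A","1,000","2,000"\n'
    '2014-03-07 10:00:00,222,"Game B","900","950"\n'
    '2014-03-07 11:00:00,111,"Game A","1,100","2,000"\n'
)
```

Against the real file, len(distinct_app_ids(f)) for a file f opened with newline='' gives the number of games.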

That sounds like a lot, but the Steam catalogue currently sits at over 3,000 titles and growing, so it’s pretty safe to say that breaking into the top 100 is no easy feat.

As time goes on, I could also break down the genres represented in the Top 100 list, which would give a decent idea of what is and isn’t popular on Steam. Not to mention that this is an extremely small sample size: it would be more worthwhile to collect this data over a year to make it really meaningful.

What’s Next?

Well, this is a big pile of data, and it’s growing by the hour. This is great, but what can I really do with this data?

For starters, the original goal was to figure out whether there was a link between publishers’ digital marketing behaviours and the number of concurrent players on Steam, as well as any growth or decline in player base that followed from that activity (or lack thereof). I will have to explore whether that is still possible to figure out, as a lot of marketing activities are hard to track down, let alone attribute to a game’s success.

Secondly, I will have to step up the data storage game a bit to make it much more accessible. Currently, a Python script scrapes the Steam & Game Stats page and adds lines to two separate CSV files with all the relevant data. I’d like to transition this into an actual database (probably MySQL) and maybe open it up for the public to poke at and do their own analysis.
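Before setting up MySQL, the table could be prototyped with Python’s built-in sqlite3 module, which is a stand-in here rather than MySQL itself. The table and column names are hypothetical, and the column types would likely need adjusting for MySQL. The composite primary key means a run can’t accidentally insert the same game twice for one timestamp.

```python
import sqlite3

# Hypothetical schema for the Top 100 data.
SCHEMA = """
CREATE TABLE IF NOT EXISTS top100_snapshot (
    captured_at   TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS', as the scraper writes it
    app_id        INTEGER NOT NULL,
    name          TEXT    NOT NULL,
    current_users INTEGER NOT NULL,
    peak_today    INTEGER NOT NULL,
    PRIMARY KEY (captured_at, app_id)
)
"""


def open_db(path=':memory:'):
    """Connect to an SQLite database (in memory by default) and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_snapshot(conn, captured_at, rows):
    """Insert one run's rows: an iterable of (app_id, name, current_users, peak_today)."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany(
            'INSERT INTO top100_snapshot VALUES (?, ?, ?, ?, ?)',
            [(captured_at, *row) for row in rows],
        )
```

Questions like “how many distinct games appeared?” then become one-line queries such as SELECT COUNT(DISTINCT app_id) FROM top100_snapshot.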

Lastly, I’m really not sure. It was a fun side project in the first place, and I feel like it was a great learning experience and a fantastic way to brush up on my analytical skills.

Have any ideas or want access to the data? Give me a shout, I’m happy to share!

Gaming Links of the Week: May 19 to May 26

Frozen Synapse

Oh hello, didn’t see you come in there.

Lame jokes aside, I’m doing my best to get back into regular writing. It’s long overdue. To get the ball rolling, I’ve decided to commit to collecting interesting reads focused on game design and development, because it kills two birds with one stone: it helps me grow in my day job, and it provides reading fodder for people interested in joining the games industry.

Why Frozen Synapse Costs Money – Paul Taylor of Mode 7 Games, the developers behind Frozen Synapse, walks through the logic behind making the iOS release of Frozen Synapse a paid app, rather than following the free-to-play trend in the mobile space.

OPINION: Paul makes an incredibly good argument, but as the free-to-play market continues to evolve, I am sure we’ll see plenty of examples of well-designed, AND well-monetized games. Especially in the mobile arena.

Nintendo grabs money, control from fans promoting its games on Youtube – The Penny Arcade Report summarizes the debacle surrounding Nintendo using YouTube’s copyrighted content system to claim all the advertising dollars from popular YouTubers publishing Let’s Play videos for Nintendo games.

OPINION: Nintendo have their heads up their asses. Perhaps they don’t need any promotional help with their games, but I am a firm believer that Let’s Play videos are one of the best ways to get your game some exposure and love from potential fans.

Why every developer should play Aliens: Colonial Marines – A writer from Novy PR discusses why playing through Aliens: Colonial Marines is a set of lessons on what NOT to do in game development.

OPINION: Aliens: Colonial Marines is a complete disaster, and these lessons are a must-read. However, any reasonable game designer or developer who wants to create a quality product wouldn’t have done any of these in the first place.

Hands on with Runescape 3: a brave new world – Nick Wilson from PCGamesN describes his experience with the up and coming Runescape 3, built in HTML5, coming this summer.

OPINION: This was a post I almost didn’t include, but Runescape holds a soft spot for me because of all the time I’ve spent playing it in the past, and I really want to see how far they can push the in-browser MMORPG with the new Runescape.

Letting the Player Find the Fun – Ben Serviss discusses the power of ‘Discoverability’ in gaming and provides a few ideas on doing discoverability better in games today.

OPINION: It seems almost cliché to make fun of today’s games for their hand-holding nature, but it’s sadly true. I miss the delight of finding a not-so-obvious secret and learning to play a game just by… playing the game.

Unlock your creative potential: 7 steps to becoming a game designer – In a Slideshare presentation (it’s rather long!), Ethan Levy lays out what is involved in being a game designer, and 7 concrete steps that aspiring game designers can take to actually become game designers.

OPINION: A rather long watch, and not entirely perfect, but it’s always good to take in the opinions of other, more experienced game designers and learn from what they’ve learned.

Welp, that was fun! I’m constantly on the lookout for more interesting reads in the gaming world, so this is going to be a fun post to continue.

Thanks for reading!

Octocat Attacks: Our entry for the GitHub Game Off

Near the end of October, a blog post from GitHub caught my eye, entitled: GitHub Game Off.

In short, GitHub was running a competition for game developers to build their games, host their code on GitHub, and loosely base them on a git concept (forking, branching, etc.). We were free to build it however we wanted, as long as it could be open source. As a lifelong gamer, I’ve always dreamed of building my own game, and that’s a dream also shared by awesome guy Wayne Sang.

We had been toying around with building out a game idea Wayne had come up with several months earlier, and before Game Off, we had decided to build something smaller first to get acclimated to each other’s style and capabilities. GitHub Game Off presented itself as an opportunity to finally make this happen, with real deadlines and actual work needing to be produced.

That game? Octocat Attacks.

Octocat Attacks Title Screen

You can view the source code here, and the playable version of the game here.

Most of the rest of this post is going to talk about the development side of things, as there were quite a few things I learned along the way.

Creating the Concept

When Wayne agreed to build a game for the competition, we sat down and hammered out a concept pretty quickly. I suggested that we use Flash, as it was probably the fastest way to get up and running with a game, especially with established libraries already available, and that we make a puzzle game because “it’s far easier to build a silly puzzle game than a full-blown action game!”

Just for the record, I was going to eat those words.

We sat down for several hours to hammer out the concept: it would be a match-3 style puzzle game about a giant alien attacking Earth, with various countries coming together to build separate parts of a robot to defend against it. The loose association with git was that each country was essentially working on its own “branch” of the master robot repo, and completing a level meant that country “pushing” its part toward the final product. Each round was timed, and your score determined the quality of the piece created (one of three tiers), which in turn affected your final battle with the alien.

I also did some research around the best Flash library to use to build games, and I landed upon what seemed like the most developed and easiest to get started with: Flixel.

There were a handful of other engines available, but Flixel was really far along in development and actually powered games I had heard of (like Canabalt!), so I ran with it. As a side note, once you start using it, Flixel really feels like it was built for twitch-based games rather than puzzle games. I was lucky to find the Flixel Power Tools set, which extends Flixel’s capabilities even further and let me take care of some of the issues I was having with sprites.

Starting to Code

Once my environment was set up, I began to write a few test games just to get a feel for Flixel and ActionScript.

Have I mentioned that I haven’t really touched code in a serious way since 2009? Have I also mentioned that I haven’t touched ActionScript since 2005?

Granted, I was very familiar with programming in the first place, so the learning curve wasn’t very steep for me, but it was one thing to figure out what I could and couldn’t do with Flixel, and a completely different beast to do it while learning ActionScript 3.

However, I got a prototype up and running relatively quickly. According to my records, we started brainstorming on October 27th, and I had a prototype with a 5×7 board full of temporary game pieces that could switch places on October 31st. I was rapidly iterating on the first prototype, creating 90% of the game mechanics by November 12th: puzzle piece generation, piece movement (swapping places), match checking and clearing, and empty space refilling. Nothing was 100% as it should be for a completed game, but it was a very quick start.
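The match-checking step is easy to sketch independently of Flash. Here is a Python version (not the game’s actual ActionScript/Flixel code), assuming the board is a list of rows with None marking an empty cell and pieces compared by simple equality. It returns every cell that is part of a horizontal or vertical run of three or more.

```python
def find_matches(board, min_run=3):
    """Return the set of (row, col) cells that belong to a horizontal or vertical run."""
    matched = set()
    rows, cols = len(board), len(board[0])

    def scan(cells):
        # cells: the (row, col) positions along one line of the board, in order.
        run = [cells[0]]
        for cell in cells[1:] + [None]:  # None is a sentinel that flushes the last run
            value = board[run[0][0]][run[0][1]]
            if cell is not None and value is not None and board[cell[0]][cell[1]] == value:
                run.append(cell)
            else:
                if value is not None and len(run) >= min_run:
                    matched.update(run)
                if cell is not None:
                    run = [cell]

    for r in range(rows):
        scan([(r, c) for c in range(cols)])
    for c in range(cols):
        scan([(r, c) for r in range(rows)])
    return matched
```

Collecting matches into a set before clearing anything means a piece that sits in both a horizontal and a vertical run is only cleared once.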

Around this time, Wayne chipped in with his awesome pixel art, and the game was finally starting to come together.

Refactoring Mania

Stable or not, I was rather unhappy with how game pieces were being moved around and checked for matches, so I spent a good week refactoring everything. And I mean everything.

Suddenly the game became less jittery and less resource-intensive, and I had created a queue for checking and clearing pieces, but I was still being plagued by my code for animating the refilling of pieces on the board. It was a problem that I am still having trouble with to this day, and I feel like I’ve smashed my head against it enough times that I may need to refactor the entire thing with a different approach.

Oh well, that’s what branches are for, right?

The important lesson I learned (and I didn’t know this because I am a complete newbie) is that Flash and ActionScript 3 run code synchronously (I think). This was a problem with my initial approach to refilling pieces: I essentially had a for loop that checked every spot on the board, and if a spot had no sprite, it started the animation to move all pieces above the empty square downward and create a new piece. However, if you had two (or more) empty squares stacked on top of each other, the new pieces were created at the same time and ended up stacked in the same square.
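One common way around that kind of collision (a suggestion, not what the game currently does) is to settle the board’s logical state one column at a time before starting any animation: slide the surviving pieces down, then create exactly one new piece per remaining gap. The returned list of moves could then drive the tweens. A Python sketch; new_piece is a hypothetical factory function standing in for sprite creation.

```python
def refill(board, new_piece):
    """Collapse each column downward, refill its top, and return the moves made.

    board: list of rows (row 0 is the top); None marks an empty cell.
    new_piece: hypothetical zero-argument function returning a fresh piece.
    Each move is (col, from_row, to_row); from_row is None for a newly created piece.
    """
    rows, cols = len(board), len(board[0])
    moves = []
    for c in range(cols):
        # Surviving pieces in this column, read bottom to top before anything moves.
        survivors = [(r, board[r][c]) for r in range(rows - 1, -1, -1) if board[r][c] is not None]
        target = rows - 1
        for r, piece in survivors:
            board[target][c] = piece
            if r != target:
                moves.append((c, r, target))
            target -= 1
        # Every cell above the settled survivors is a gap: exactly one new piece each.
        for r in range(target, -1, -1):
            board[r][c] = new_piece()
            moves.append((c, None, r))
    return moves
```

Because each gap is counted once, after the column has settled, two empty squares on top of each other can never spawn two pieces into the same cell.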

As you can tell, I am not very experienced with puzzle game animations!

Octocat Emerges

On November 19th, Wayne sent me an email where he sketched out the alien: he had taken the Octocat, of GitHub fame, and turned it into the alien monster attacking the Earth. It shot lasers from its eyes, it was adorable, and I think it gave me a bit of extra motivation to see this project go all the way.

Alien Octocat

Aw, aren’t you a horrible little creature?

Final Stretch

It feels like I’ve left out a lot of details, but that’s because the entire month felt like a blur. I was constantly trying to fix our animation problem while building out the HUD (score, timer, etc) and the functionality to power the HUD. Wayne was churning out all of the necessary art assets for the game, and it was starting to come together.

Eventually, we decided that we weren’t going to complete the game in time, and we were alright with that. We were both willing to continue working on the game at a more leisurely pace after the deadline had passed, and that’s one of the side projects I am really looking forward to.

At the end of the day, the v0.1 build of Octocat Attacks, as in the one we’re submitting to GitHub Game Off, is a very incomplete game. We have an incomplete puzzle engine, no audio, and our content is far from complete.

However, we got it out there. We took the effort to start our project and bring it this far, and we’re continuing to push on. I’m very happy with how the game looks right now thanks to the hard (and amazing) work that Wayne has put into his pixel art, and our game is functional, which is a lot more than I expected when we started!

It’s been a really fun and educational experience to build a game from scratch, and you better believe that Wayne and I are going to continue developing games.

Octocat Attacks Stage Select
Octocat Attacks Gameplay

“We do enforce this policy.”

I run a very small, very casual video games channel on YouTube called The Blundercast – I just record, edit, and post silly little moments that happen while I play games. It’s very much a labour of love, but I do happen to monetize a few videos just for a bit of coffee money here and there.

Most recently, I posted a video where I played Scribblenauts Unlimited and had fun on a mission.

I did attempt to monetize this video but was abruptly stopped by YouTube.

YouTube:
We may consider your video(s) for further review provided you verify that you are authorized to commercially use all of the elements of your content. This includes all video, images, music, video game footage, and any other audio or visual elements.

Fair enough, I’ve run into this before. I explained:

Me:
This video is a video where I have fun with a small portion of the Scribblenauts Unlimited game. It was created solely for the purposes of entertainment and education and is all done in fair use.

Makes sense to me, you learn about the game and you can enjoy watching me make an ass of myself on the internet. However, it got rebuffed with a request for information regarding formal permission and/or terms that would allow me to post the video.

I reached out to WB Games, the publisher of Scribblenauts Unlimited, to get this permission, and got this response in a few hours:

WB Games Support:
WB does not provide formal permission to post videos on YouTube or similar sites. Generally we don’t mind fan videos so long as you’re using legal copies of the game, are not being posted to make a profit (through advertising or other means), and are in good taste.

Hmm… not being posted to make a profit? What about the hundreds of videos that do just that on YouTube? Do they all have a standing agreement with WB Games that allows them to post and profit off their videos? Or are they in danger of having WB enforce their policies on them?

So I asked to clarify, especially with regards to YouTube partners, and got this response:

WB Support:
WB does not give out any formal permission. We also do enforce this policy.

And now we’re back at square one.

I understand you want to protect your game, but we’re giving you free marketing. I’m not entirely sure why you would be against that.

It is a silly place.

The Humble Indie Bundle #5: 5 Awesome Reasons To Buy

I put together a quick and dirty overview of the lineup for The Humble Indie Bundle #5: Amnesia, Psychonauts, Limbo, Superbrothers: Sword & Sorcery EP, and Bastion.

For those unfamiliar with The Humble Indie Bundle: it’s a really great initiative that brings together an assortment of high-quality, cross-platform, independently developed games and lets the consumer set the price they pay for the bundle. Did I mention it’s DRM-free and Steam unlockable as well?

As a consumer, you set the price level of the bundle and then decide how to allocate the money between the three entities: the developers involved, charity, and the Humble Bundle team themselves.

Overall, it’s an amazing initiative for games that you may not have otherwise played, so definitely check it out and pick it up – you’re getting five amazing games, you help support charities, and you have the ultimate power as a consumer.

Happy gaming!

Don’t Mess with Gabe

Gabe from Penny Arcade about Ocean Marketing:

I have a real problem with bullies. I spent my childhood moving from school to school and I got made fun of everyplace I landed. I feel like Paul is a bully and maybe that’s why I have no sympathy here. [...] I will personally burn everything I’ve made to the fucking ground if I think I can catch them in the flames.

Gabe just became one of my favourite people. More so. Seriously, don’t mess with Gabe, or any other Gabe for that matter.

Games I’m Playing!

One of those things that I get asked semi-often: What games are you playing?

I don’t have all that much time for video games, but I do set aside some time for gaming because it’s a fantastic outlet for me during really stressful times.

That said, check out my new Games I’m Playing page to take a look at what I am playing!

Touch Screen Gaming is Different

I play a lot of games. It should come as no surprise that a lot of the games I play are on iOS, especially considering that I used to run an iOS Game Review site.

Lately, I have been playing a lot of FIFA 11. I’ve been playing the English Premier League as Arsenal and have been completely obliterating everyone in my way. The game itself is really fun, save for a few headache-inducing moments, and I genuinely enjoy playing 2-3 matches during my commute.

However, the game causes me pain. Literally.

The game is controlled through a virtual joystick and buttons. This causes my hand to contort into a weird angle and gives me a lot of wrist pain, which is amplified by the fact that I am pretty sure I have carpal tunnel in these bad boys.

This post isn’t a complaint about my pain, rather a request: iOS game designers, or even touch screen game designers in general, please find more fitting ways to control your games on a touch screen device. A non-tactile joystick can become extremely aggravating, mentally and physically, with prolonged usage. It can be non-responsive, it can go the wrong direction, and I often find myself just letting go to let it re-orient itself. It’s difficult to make a sports game without a joystick, I know, but I am sure there is a way.

Touch screen gaming is different from handheld consoles, so let’s try to break convention here and build a more exciting control scheme, shall we?

(If someone wants to recommend sports games to me that use a great way to control the players, please do so in the comments!)