DJOAuth2 — An OAuth 2.0 Server Implementation for Django
We’re happy to announce the release of DJOAuth2, Locu’s OAuth 2.0 server implementation! Unlike current Django alternatives, we deliver a complete, working, drop-in Django application that is fully compliant with the final OAuth specification. Inspired by and built on top of existing implementations (shout out to HiiDef’s oauth2app), DJOAuth2 features:
- A robust test suite with >90% code coverage. The test suite runs on every code change thanks to Travis CI.
- Sane default choices: the OAuth specification allows for many different types of client-server interactions. DJOAuth2 supports a secure, pragmatic subset of these interactions in order to provide a great user experience and a familiar code flow for developers.
- Tools for easily adding OAuth protection to existing API endpoints without having to change internal logic.
- Great documentation, both on a high-level and as part of the code! A lot of implementation decisions are rooted in rules of the specification; we’ve done our best to write highly-commented code for maintainability and clarity.
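To give a flavor of what protecting an endpoint by scope involves, here is a minimal, framework-free sketch of the check such a decorator performs. The names (`require_scope`, `TOKEN_STORE`, `OAuthError`) are illustrative, not DJOAuth2's actual API; consult the project docs for the real decorator.

```python
# A sketch of scope-based endpoint protection. The internal view logic
# stays unchanged; only the decorator is added around it.
from functools import wraps

# Hypothetical token store: bearer token string -> set of granted scopes.
TOKEN_STORE = {
    "abc123": {"menus:read"},
}

class OAuthError(Exception):
    """Raised when a request lacks a valid, sufficiently scoped token."""

def require_scope(scope):
    """Reject calls whose bearer token doesn't carry `scope`."""
    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            auth = request.get("Authorization", "")
            if not auth.startswith("Bearer "):
                raise OAuthError("missing bearer token")
            token = auth[len("Bearer "):]
            granted = TOKEN_STORE.get(token, set())
            if scope not in granted:
                raise OAuthError("insufficient scope")
            return view(request, *args, **kwargs)
        return wrapper
    return decorator

@require_scope("menus:read")
def list_menus(request):
    # Pre-existing API logic; untouched by the OAuth layer.
    return ["lunch", "dinner"]
```

In a real Django deployment the decorator would read the `Authorization` header from an `HttpRequest` and look the token up in the database, but the shape of the check is the same.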
We wrote the first version of DJOAuth2 in the fall of 2012 to bring familiar OAuth-based delegation of resources to Locu’s customers and certain partners through our developer API. Since then, we’ve spent months testing and running the code in production and writing tests to ensure that the implementation is as safe and spec-compliant as possible.
Why use OAuth?
First, it’s helpful to understand what OAuth 2 is designed to be:
An open protocol to allow secure authorization in a simple and standard method from web, mobile and desktop applications.
OAuth gives you and your users much greater control over how third party apps can access your users’ private data and/or act on their behalf. Rather than using password-based authentication (which would require the user to share their account information), OAuth describes different types of token exchange. These tokens can be limited in “scope” such that your users only give third parties access to relevant data.
OAuth isn’t just great for your customers to delegate their account safely: it’s also very easy for developers to implement. It helps that there is an official specification and that its adoption is widespread across companies like Facebook, Google, Twitter, and others. Although many companies have locked in their implementations at different draft versions of the spec, the interactions are all nearly the same and there are many existing resources to help explain the protocol. This means that developers can more easily use your API, leading to greater exposure and more reasons for users to sign up with your service.
Why use DJOAuth2?
You should use DJOAuth2 if you’re running Django 1.4 – 1.6.2 on Python 2.7 and you want to implement the server side of the OAuth 2.0 protocol. DJOAuth2 benefits from being spec compliant, fully tested, and used in production. You’ll save time by not having to re-implement the specification and all of its intricacies. DJOAuth2 is safe and sane by default, supporting the standard OAuth flows with which other developers are most familiar. Moreover, we have dedicated a large amount of effort to avoid known security issues found with other OAuth server implementations.
P.S. Want to work on exciting systems and products that change the way that small businesses interact with their customers and grow their businesses? We’re hiring! Check out our jobs page to learn more and apply today.
Burgernomics 101: Burgers and Obesity
The foods a person eats say a lot about them. What about the foods a community eats: can we learn something about a group of people by studying the foods that are available to them? To answer this question, we analyzed Locu’s collection of millions of local merchants and restaurants to get a picture of communities based on the food they put on the table. To start off, we focused on America’s staple food, the burger, in order to see what it can teach us about obesity. What we found surprised us:
- The skinniest counties in America have the largest per-capita burger offerings.
- An abundance of cheap burgers is a signal of an obese county.
- Counties with more diverse burger offerings (ostrich, anyone?) tend to see less obesity.
The Dataset, and A Burger Popularity Contest
We looked at 176,356 burgers offered in the 15 most populous states. Our analysis spanned more than 600 types of burgers (grouped by common terms), including pizza burgers and ostrich burgers. The median price of a burger was $8.49. Here are some of the most popular ones:
If Burgers Could Talk, What Would They Tell Us?
You may have heard of the Big Mac Index published by The Economist, which uses McDonald’s Big Mac prices to measure purchasing power parity between nations. We were curious how far we could extend burgernomics, and analyzed how burger availability, access, and diversity relate to adult obesity trends.
Adult obesity and burger availability: not what you’d expect. We placed the counties in our dataset into four groups (equal-sized quartiles for the descriptive statistics nerds out there) by their adult obesity rates. In each group, we measured the number of restaurants and burgers available for each 10,000 people. The bars in the chart below represent the median county’s burgers per ten thousand people, and the error bars represent the 25th and 75th percentiles.
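The grouping described above can be sketched roughly like this. The county figures below are made up purely for illustration; only the method (rank by obesity rate, split into equal-sized groups, take the median per-capita availability in each) mirrors the analysis.

```python
# Sketch of the quartile grouping described above, on synthetic data.
from statistics import median

# (county, adult_obesity_rate, burger_count, population) -- made up.
counties = [
    ("A", 0.21, 900, 50_000),
    ("B", 0.24, 700, 40_000),
    ("C", 0.29, 500, 60_000),
    ("D", 0.33, 300, 45_000),
]

# Burgers per 10,000 people for each county.
per_capita = {
    name: burgers / pop * 10_000
    for name, _, burgers, pop in counties
}

# Rank counties by obesity rate and split into equal-sized groups.
ranked = sorted(counties, key=lambda c: c[1])
group_size = max(1, len(ranked) // 4)
quartiles = [
    ranked[i:i + group_size]
    for i in range(0, len(ranked), group_size)
]

# Median per-capita burger availability within each quartile.
medians = [
    median(per_capita[name] for name, *_ in group)
    for group in quartiles
]
```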
It turns out that counties with higher obesity rates have lower per-capita burger offerings! (We saw an even stronger version of this trend in overall per-capita restaurant availability.) One reason for this potentially counterintuitive finding is that low overall food availability might co-occur with high obesity.1 We do not yet know which way, if any, the causality between food availability and obesity runs, but it does suggest we can improve our understanding of the world through the lens of food.
Obesity and cost. Below, we look at the least obese (green) and most obese (blue) counties and compare the price distributions of their burgers. The chart makes it clear that cheap burgers are more popular in high-obesity counties. While we can’t say that one caused the other, it certainly feels like the wrong incentive mechanism for a community to stay healthy.
Let them eat brie. Not all burgers were created equal. We’ve seen burgers ranging from a typical McSnackieSnack to a novel black bean vegan patty with mint-infused mango chutney. We wondered if the diversity of burger options available in an area has any relationship to the obesity rates in that location. Using Locu’s detailed menu data, we categorized up to 600 different types of burgers (e.g., turkey, pepper jack, Hawaiian, truffle, etc.) offered in each county based on the same name and keyword classification we used to rank the most popular burgers above.
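The name-and-keyword classification we describe can be sketched as a simple substring match over menu item names. The keyword list here is a tiny, made-up sample, not our real taxonomy of hundreds of burger types:

```python
# Toy version of keyword-based burger classification.
# Keyword -> category; checked in order, first match wins.
KEYWORDS = {
    "turkey": "turkey burger",
    "ostrich": "ostrich burger",
    "black bean": "veggie burger",
    "truffle": "truffle burger",
}

def classify(menu_item_name):
    """Return the first burger category whose keyword appears in the name."""
    lowered = menu_item_name.lower()
    for keyword, category in KEYWORDS.items():
        if keyword in lowered:
            return category
    return "other burger"
```

Counting the distinct categories returned per county gives a rough burger-diversity score for that county.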
The results below are arguably the most striking in just how large a difference we see between the least obese and most obese counties. The least obese counties offer the largest variety of burgers, whereas the most obese counties see little variety. This suggests that food diversity might be an interesting signal for other socioeconomic factors.
With Locu’s data, we were able to knock down the idea that high burger availability is a signal for high obesity rates: in fact, the opposite seems to be true. We also found that burger price and diversity are good markers of obesity in a population. Greater and more diverse burger options are characteristic of less obese communities, which lends itself to a more general hypothesis that greater and more diverse food options are correlated with healthier communities.
In the future, we want to explore what these findings mean for food deserts, where healthy food options are scarce. Given how much effort has gone into eradicating such deserts, in the face of research suggesting such efforts might not be effective, we hope that the nuance our massive collection of price lists offers can shed more light on the topic.
We’re at the start of a longer journey to measure the impact of food access and how it relates to other socioeconomic factors. What we really want to answer is how and why. For example, how and why do greater percentages of cheap burgers and high obesity occur together? Is one directly driving the other or are they both an artifact of a third variable such as income or education? As Locu collects more data and observes changes in prices and availability over time, we’re excited to see how our data can help us understand social challenges such as poverty and obesity.
We’d love to hear your thoughts and takes on some of the trends we’ve noted. What other trends and issues would you like to see investigated?
1 The food availability we are evaluating pertains to dining options offered at restaurants and does not take into account other food options provided by supermarkets, convenience stores, etc.
Five Suggestions for Data Teams
This post was first published on Wired; read it here.
When we usually talk about our work, we focus on how we build dead-simple products for local businesses to help them attract more customers online. All of the product ‘magic’ for our local merchant customers and big publishing partners leads back to what we are at heart: a data company. We’ve got a pretty heavyweight pipeline for extracting structured data from the web, some serious machine learning powering it, and an API that shares our cleaned structured data and real-time insights from it. There are a few things we have learned along the way, and in this post, we’d like to share some of them with you.
1. Invest in Infrastructure
Data engineering for one-off projects is often a dirty practice. Our scripts directory contains many one-off scripts we wrote initially to deliver data to our early partners. The scripts were designed to work exactly once, outputting the proprietary format a partner wanted, with all of the special-casing required to clean things up to meet a partner’s specific needs.
As we grew, we started to see the same use cases over and over. We now employ a data quality pipeline that makes it easier to avoid special cases and helps find ‘data bugs’. We’ve built a robust API that partners can use to access the data. And we now have internal tools that allow us to quickly iterate on insights for merchants, classifiers for data extraction, or regressions to predict crowd worker quality.
Some of these investments involve taking a step back and figuring out which open source tools have already solved your problem. We did quite a bit of interprocess communication using databases before we started using Celery for asynchronous task processing. We hand-rolled a few maps and reduces before setting up SLURM and Spark as substrates for our distributed computation needs. After we did things in an ad-hoc manner a few times, we learned what sort of infrastructure we’d need to speed up development in the future.
Other use cases require you to build out your own infrastructure. Hopefully you open source it, as we’ll discuss next.
2. Most of Your Tooling Isn’t Proprietary: Open Source It
We are a young company, and so our open source contributions to date have involved small upstream commits to other organizations’ projects. But in building up our tooling, we’ve generated quite a few internal systems and libraries ranging from data serving infrastructure to layout engines that we couldn’t find in the ecosystem.
It’s easy to think of internal tools as a competitive advantage, but keeping your tools secret is rarely worth it. The engineers who built the tools are the real advantage, as is the open source ecosystem that helped get you this far. Allowing engineers to share their contributions with the world and giving back to the community generally makes everyone happy, and it’s rarely at the expense of some competitive edge.
You don’t have to be a complete open book: We have tons of proprietary intellectual property in our algorithms, machine learning models, and design. It’s unlikely you will see us release too many trained machine learning classifiers, but the infrastructure that helped us train those classifiers will soon be open.
We’re excited about a number of open source releases on the horizon: Kronos, our logging and timeseries storage engine, Jia, our metrics visualization tool, and Xenia, our library for hosting classifiers as a service, are just some of the infrastructure we’re planning on releasing.
3. It’s Never a 100% Game
Data bugs have a very human aesthetic. They have many sources: typos, accidental duplication, or simple fatigue. You can fix existing issues, but you’ll never be done.
As we crawl more data, our classifiers become more accurate. As we train, monitor, and incentivize our crowd workers, they make fewer mistakes. As we get more experience, our data quality pipeline catches more errors.
But it’s not a 100% game. Label 1 million messages as spam using a classifier with a respectable 99.99% precision, and you’ll mislabel 100 of them. Crowd workers who have a 100% quality track record will share their accounts with someone who is just learning the ropes, and you’ll see a few unexpected mistakes.
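The spam example works out like this: precision is measured over the items *labeled* spam, so even a tiny error rate multiplied by a large volume leaves a visible number of mistakes.

```python
# 99.99% precision on 1,000,000 messages labeled spam still leaves
# 0.01% of those labels wrong.
labeled_spam = 1_000_000
precision = 0.9999
false_positives = round(labeled_spam * (1 - precision))
# false_positives == 100
```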
Despite these odds, it’s important to strive for perfect data, even if you know it is not completely attainable. As we encounter new special cases, we fix them and work to make sure that they don’t happen again. Part of this involves having empathy when people are frustrated by existing mistakes. Mistakes, though usually explainable, will understandably cause people unexpected frustration. Make peace with the fact that people will point out potential issues, fix them quickly and thank them for helping you improve your data or algorithms.
4. Product Launches are Data Launches
As a data company, you have to recognize the data people. It’s often not obvious when to do this, since data team contributions are often less visible and heroic than those of the folks building your product. Good data infrastructure, design, and algorithms are rarely seen, and it’s the mistakes that get you noticed.
Look deep into your product launches, and you will likely find a new internal API, a new data cleanup process, or an entirely new piece of serving infrastructure that exposes something a data person made. Their effort likely started and ended before the finishing touches you put on the interface that displays the data. Like design and information architecture, a scalable storage system can’t be thrown together overnight, and a high-precision classifier isn’t built at the last minute.
When you applaud the folks who built the product at your end-of-week meeting, make sure to recognize the data team that helped out. Some amount of statistics and magic went into the launch, and your pride in the people that helped get you there should show that.
5. Some Data Should be Secret, but Much of it Should be Shared
It’s often easy to guard your data like a treasure, protecting it from the prying eyes of curious minds. We’ve fought this instinct at every turn in exchange for some pretty wonderful rewards. Sharing data empowers others to build on and, more importantly, link data to enable new applications that were previously not possible.
While sharing your core asset sounds counterintuitive for a startup from a business point of view, it doesn’t have to be. Here are two examples of the benefits from our experience:
Anyone can sign up for our API. Keeping our API open to the world has resulted in lots of interesting relationships. Other startups come to us for data, giving us insight into the forefront of what consumers and merchants want. Academics and artists have also approached us about collaborations, and asking them to sign up and start hacking on our data has made these interactions ridiculously easy to establish.
Don’t expect that putting together an API will be hassle-free, and do think carefully about how to get the most out of your data launch. Once you’ve taken these precautions, jump into data sharing in various forms: you’ll likely be pleasantly surprised.
Here’s a free sixth suggestion, since counting is hard: Everyone’s a data person.
The data scientists and engineers in your company pride themselves on ROC curves and 99th percentile latencies. But they aren’t the only ones making your data what it is.
Here’s a concrete example from one of our amazing full-stack engineers, Peter. When we sponsored the 2012 TechCrunch Disrupt hackathon, Peter helped put together an awesome example visualization to show off our Merchant Insights API that allowed users to explore cities by the menu items served in their restaurants and bars. That visualization, on the surface, helped us look at our data in new ways, and resulted in a fun blog post on the prevalence of PBR and hipsters.
But as we dug deeper into Peter’s implementation, we found that he was doing some data cleaning before displaying the data. Our item prices had some interesting changes in formatting which he was normalizing to better visualize pricing trends. These changes informed our data quality pipeline, and we’ve now got automatic normalization and alerts in place that benefit everyone consuming the data.
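The kind of price normalization Peter was doing might look something like this. This is a sketch on illustrative formats, not the actual code from our pipeline:

```python
import re

def normalize_price(raw):
    """Parse variously formatted menu price strings into a float, or None."""
    # Strip currency symbols/codes and whitespace; keep digits and separators.
    cleaned = re.sub(r"[^\d.,]", "", raw)
    if not cleaned:
        return None
    if "," in cleaned and "." not in cleaned:
        # Comma used as a decimal separator, e.g. "8,49".
        cleaned = cleaned.replace(",", ".")
    else:
        # Comma used as a thousands separator, e.g. "1,249.00".
        cleaned = cleaned.replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

For example, `normalize_price("$8.49")`, `normalize_price("8,49 EUR")`, and `normalize_price("USD 8.49")` would all land on the same numeric value, while unpriced items like "market price" fall through to `None` for a human to review.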
Start viewing everyone in your company as a data person, and you’ll start learning what you need to do to make their lives better.
We started writing a thank-you list to recognize everyone at Locu who has in some way improved our data or data infrastructure, but we had to stop ourselves because the list encompassed almost everyone on our Team page. Want to hack on some fun data, engineering, and design challenges with Locu? Apply to join the team!
Go Locu: Locu joins GoDaddy
At Locu, our mission is to help local businesses thrive by better connecting merchants with customers. We’re happy to announce our next step in furthering this mission: Locu will be joining the GoDaddy family.
We’ve been close partners with GoDaddy since earlier this year, and we’re excited to join them to continue building the best experience for local businesses to be found online. As the world’s top platform for small businesses, GoDaddy brings unparalleled leadership and expertise for us to jointly transform the way small businesses use technology and to help them grow.
Current Locu customers will not be impacted by this change; we’ll be continuing to provide more of the ease, simplicity and “magic” that our customers have come to know from us. We are excited to share more details of what’s to come in the near future.
Announcing the Locu Publisher Platform
We often hear from the online publishing community about how difficult it can be to keep local data current and accurate, and we have created a solution.
We are thrilled to announce the launch of our new Locu Publisher Platform (LPP), which enables publishers large and small to easily integrate data like menus, service lists, and insights that Locu has already amassed for more than a million businesses worldwide.
Locu currently powers local data for many well known partner sites including Yelp, OpenTable, and TripAdvisor, and we want any publisher to be able to access and utilize our data to engage their audience. The best part is that it’s completely free to use.
What you need to know about the Locu Publisher Platform:
- Value-Added Data: enrich your audience’s experience with up-to-date local business data like menus, services lists, and insights.
- 100% Free: no data licensing or setup fees.
- Keep All Revenue: keep 100% of revenue from ads that you serve around Locu content.
- Maintain Branding: customizable widget lets you apply the look and feel of your site to Locu content.
- Easy Implementation: no software to install; data hosted and maintained by Locu.
So what’s the catch, you ask? Well, there actually is no catch. Our customers are merchants who want their data shown on sites across the Web, and your audience wants to know what products and services local businesses are offering. So get those menus and service lists up on your site today!
Get started for free at http://locu.com/publisher-platform.
Locu Digs Deep into PBR Data: Hipsters Unfazed
Hipsters and PBR, two peas in a pod. That’s what everyone says, anyway, but is it true? At Locu, we’ve done a bit of research about these things.
More than 15,000 merchants have signed up for Locu since we launched over the holidays, and we’ve indexed nearly one million venues since Locu started. We do this by crawling thousands of merchant websites a day through a technology pipeline that we’ve previously described. As we near a million menus, we’ve learned our fair share of menu-related factoids. Our aggregate price list data allows us to answer lots of important questions for merchants, but also a few fun ones along the way.
In this post, we’ll see what our data can tell us about the fascinating world of PBR. Keep an eye on the Locu Blog for an upcoming series of deeper data analyses.
2013: The Year of Local Business Software
Happy New Year! We at Locu are excited to continue our quest to help local merchants run more successful businesses, leveraging technology and data.
It’s been great to see so many restaurants and local businesses use Locu to create and share beautiful menus and price lists on their websites, on mobile and on Facebook. Just as 2012 was coming to an end (and less than three months after launching Locu), the 10,000th business joined: Pig & Finch in Leawood, Kansas.
In the last couple of weeks we’ve been hard at work to make Locu even better for local businesses, like Pig & Finch. We will be rolling out several exciting new features in the coming weeks - leveraging the latest technology developed by Locu’s team.
Jeremy Levine puts it well in his recent interview in the WSJ:
The innovations in software UI, online distribution and application development will drive massive new software offerings for small businesses. Historically, only the world’s largest corporations have been able to afford most productivity-enhancing software, but now even the smallest businesses will have a chance to get in on the action.
We couldn’t agree more. Have a great 2013 everyone!
7 Reasons to Nominate Locu for Crunchies “Best Technology Achievement” Award 2012
Few people are aware of the huge technology achievement that has happened here at Locu this year – our engineering team has been killing it in 2012, building the world’s largest real-time repository of local business offerings data and I think they deserve the nomination for the 2012 Crunchies “Best Technology Achievement”. Here’s why!
The Human Side of Crowdsourcing
We often hear about the wonderful benefits that crowdsourcing brings to businesses, and Locu certainly benefits from the flexibility and power of human computation. What’s often not discussed is the human element of crowd work, and the effects crowdsourcing has on crowd workers, both good and bad. This weekend, we received a wonderful story from two of our hundreds of contractors that we wanted to share with you (with Ryan and Desiree’s permission of course).
Crowd Task Decomposition: How Locu Breaks it Down for the Crowd
by Matt Greenstein and Adam Marcus
A lot of what we do at Locu is automated by machines, but a lot of it also requires a human touch. When we require a human touch at scale, we crowdsource the work to a set of trusty crowd workers. In this post, we’ll shed some light on how Locu designs tasks for our crowd workers. Through these tasks, we work with crowd workers all over the world to perform structured data extraction on price lists like menus.