Derrick Harris

Why big data has some big problems when it comes to public policy

[Commentary] For all the talk about using big data and data science to solve the world’s problems -- and even all the talk about big data as one of the world’s problems -- it seems like we still have a long way to go.

Most obstacles standing in the way of using data science to solve society’s problems have little to do with the data itself.

Data is easier to gather and easier to analyze than ever before. Rather, the problem is that data scientists and researchers -- even those who really care about tackling important issues -- often have a difficult time overcoming the much more powerful forces working against them: fear, politics and the law.

And although they’re all distinct in some ways, they’re also very closely connected.

Netflix uses data for a lot more than just recommendations

Netflix is famous for the way it uses algorithms to determine what programs or movies its members might want to watch, but data plays a much broader role inside the company’s streaming service than just informing recommendations.

In a blog post, the company explained how it analyzes data to do everything from optimizing playback quality to identifying poorly translated subtitles. The post, written by Netflix director of streaming science and algorithms Nirmal Govind, highlights several areas in which better algorithms could improve the Netflix experience, focusing largely on how to ensure the best-possible playback in any given situation -- or, at least, how to ensure users are getting the playback quality they expect.

However, the most interesting use of data Govind discussed might be how Netflix is using natural-language processing and text analysis to improve the actual quality of the movies and shows it streams.

New Microsoft privacy framework lets lawyers, developers and their code speak the same language

Microsoft Research has developed a new framework for automatically figuring out which lines of code inside massive systems might conflict with corporate privacy policies.

It’s an important goal in today’s technology world where ever-present threats of data breaches and lawsuits, as well as the specter of looming government regulation, have smart companies preparing for whatever might come their way. The really novel thing about Microsoft’s framework is that it was designed to bring together teams of personnel that might never interact directly otherwise, so that the compliance process is faster and less prone to errors.

The system involves a high-level language called Legalease, which lets lawyers and policy employees encode corporate privacy policies into a machine-readable format, and a tool called Grok that inventories big data systems and checks them against those policies. “Ultimately, the truth about what’s happening with this data is in the code,” researcher Saikat Guha explained.

But with millions of lines of code (a fair amount of which changes daily) in a product such as Bing -- on which the Microsoft Research project was prototyped -- it can be difficult to figure out what data is being stored where, how it’s being used as part of any given job and whether that usage complies with privacy rules.

Guha and his team hope, though, that the new framework will speed up the process and make it more accurate by letting all of these steps occur in parallel.
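The division of labor described above, with policy staff writing machine-readable rules and a tool checking an inventory of data uses against them, can be sketched roughly as follows. This is a toy illustration in Python, not Legalease syntax and not the Grok tool: the names `Rule`, `DataUse` and `check`, along with the example policy and jobs, are all hypothetical and chosen only to show the general idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy clause: deny using a data type for a purpose."""
    data_type: str
    purpose: str


@dataclass(frozen=True)
class DataUse:
    """One inventoried fact: which data type a job touches, and why."""
    job: str
    data_type: str
    purpose: str


def check(inventory, deny_rules):
    """Return every inventoried use that matches a deny rule."""
    denied = {(r.data_type, r.purpose) for r in deny_rules}
    return [u for u in inventory if (u.data_type, u.purpose) in denied]


# Assumed example policy: IP addresses may not be used for advertising.
policy = [Rule("ip_address", "advertising")]

# Assumed example inventory, as a code-scanning tool might produce it.
inventory = [
    DataUse("ads_ranker", "ip_address", "advertising"),
    DataUse("abuse_detector", "ip_address", "security"),
]

violations = check(inventory, policy)
for v in violations:
    print(f"{v.job}: {v.data_type} used for {v.purpose}")
```

Because the policy and the inventory are separate inputs, the people who maintain each can work independently, which is the kind of parallelism the researchers describe.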

Meet the Minnesota company pulling petabytes of data from the field

If you know what to look for, those endless rows of corn that paint the Midwest in summer are full of a lot more than just cattle feed and future Doritos.

They’re full of data. Now, it’s not news that data science can and should be applied to agriculture.

The field of precision agriculture has received a lot of attention over the past few years thanks to advances in sensors and computer vision technologies, and Silicon Valley venture capitalists are now lining up to fund startups that can apply the power of predictive models to all that data. But Farm Intelligence, the Minnesota company in question, is notable on two counts. For one, it proves that you don’t need a Silicon Valley connection to make it big in the data business. It also highlights just how much data we’re talking about when it comes to quantifying the farm. Hint: It’s a lot.

If the numbers are any indication, the product, which is delivered as a cloud service and is only a few years old (it’s actually an affiliate of an older company called Superior Edge), seems to work. Farm Intelligence is managing about a million acres of land right now (most of its users have at least 1,000 acres), but Kickert expects that number will be well into the eight-figure range soon enough. And as the technology advances, it’s figuring out ways to capture even more data about each one of those acres.