A (Constructive) Critique of Data as Labor

In March I attended the RadicalxChange conference in Detroit, where “data as labor” was celebrated as a core tenet of a nascent social movement. This idea gained traction following Jaron Lanier’s 2013 book Who Owns The Future?, which proposed a Ted Nelson-inspired digital infrastructure for micropayments on the internet. In such a system, you could be directly compensated for the value created by your data. The idea has been explored further in Lanier’s collaboration with Glen Weyl, which led to a chapter in Radical Markets on the subject.

Contrary to the norms of the internet for the last couple of decades, the new radical liberals claim, information should not be free. Some have even envisioned “data strikes” as a tactic towards this end.

I sympathize with this perspective. Surely if our data is generating wealth, those who produce the data should see commensurate compensation. As Lanier points out, there is no real technological innovation in a platform like Facebook, and without our data it would be worthless.

What most impressed me about the community that convened at RadicalxChange, aside from the wide-ranging brilliance and lucid visions of its attendees, was its openness to self-critique and its rejection of ideological dogma.

As a contribution to that emerging cultural norm and in the hopes that we can get closer to a viable proposal, I’d like to share some concerns I have with the data-as-labor argument as I have seen it laid out. In the spirit of constructive critique I’ll outline them here.

The Quality Problem

One core argument made by the data-as-labor movement is that paying for information will incentivize higher quality information. That is true, but only if the compensation is directly linked to the quality of that information. Unless we adequately quantify the quality of data, compensation for data creates incentives to flood the network with high quantities of low-quality data. People would game the system. And it is not obvious how quality could be measured. How does one measure the quality of a restaurant review in a way that can’t be gamed? Even if markets were set up so that different platforms could bid for restaurant review data, they would need to be able to determine its quality in a manner that scales without introducing new biases.

(It's worth noting that a similar disconnect occurs today in news media – sensationalism pulls at least as much revenue as deep investigative journalism – with similarly detrimental results to the information ecosystem.)
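To make the incentive concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the per-review rate, the quality scores, and the very idea that a reliable quality score exists (which is exactly the open question above). It only shows how a flat per-item payment rewards flooding, while a quality-linked payment would not.

```python
# Toy model of the Quality Problem. All numbers are hypothetical.

def payout_flat(num_items: int, rate_per_item: float) -> float:
    """Pay a fixed rate for every item, regardless of quality."""
    return num_items * rate_per_item


def payout_quality_linked(qualities: list[float], rate_per_item: float) -> float:
    """Pay each item in proportion to an assumed quality score in [0, 1]."""
    return sum(q * rate_per_item for q in qualities)


RATE = 0.10  # hypothetical dollars per restaurant review

# A careful contributor writes 5 thoughtful reviews.
careful = [0.9] * 5
# A spammer floods the platform with 100 throwaway reviews.
spam = [0.02] * 100

flat_careful = payout_flat(len(careful), RATE)
flat_spam = payout_flat(len(spam), RATE)
linked_careful = payout_quality_linked(careful, RATE)
linked_spam = payout_quality_linked(spam, RATE)

print(f"flat payment:           careful ${flat_careful:.2f}, spam ${flat_spam:.2f}")
print(f"quality-linked payment: careful ${linked_careful:.2f}, spam ${linked_spam:.2f}")
```

Under the flat scheme the spammer comes out ahead. Under the quality-linked scheme the careful contributor does, but only because the sketch assumes the quality scores are accurate and can't be gamed.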

The Inequality Problem

The data-as-labor movement tends to use the value created by training data for algorithms as a preferred example. Perhaps the most common (and value-generating) kind of AI today is a recommendation algorithm, like those that recommend products on Amazon. They are trained on exhibited preference data, e.g. the person who bought running shoes also looked at sunscreen. In an egalitarian world this would be fine, but in a world of multi-dimensionally quantified value and wealth inequality, isn’t the preference data of wealthy people worth more than the data of low-income people simply by virtue of their spending power? And if that’s the case, would wealthy people not be paid more for what is otherwise the same data? How would such a system avoid perpetuating or even exacerbating existing inequalities?
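A rough way to see the worry is the following sketch, again in Python and again built entirely on invented assumptions. It supposes a platform values a shopper's preference data in proportion to that shopper's spending, using a hypothetical `lift` (extra purchases attributed to better recommendations) and `margin`. No real platform's pricing is being described here.

```python
# Toy model of the Inequality Problem. All parameters are hypothetical.

def preference_data_value(annual_spend: float, lift: float, margin: float) -> float:
    """Extra profit attributed to one shopper's preference data, assuming
    value scales linearly with how much that shopper spends."""
    return annual_spend * lift * margin


LIFT = 0.05    # assumed share of extra spending driven by recommendations
MARGIN = 0.10  # assumed profit margin on that extra spending

# Two shoppers generate the same browsing data: both bought running shoes
# and looked at sunscreen. Only their spending power differs.
value_low_income = preference_data_value(2_000, LIFT, MARGIN)
value_high_income = preference_data_value(20_000, LIFT, MARGIN)

print(f"low-income shopper's data:  ${value_low_income:.2f}")
print(f"high-income shopper's data: ${value_high_income:.2f}")
print(f"ratio: {value_high_income / value_low_income:.1f}x")
```

If anything like this linear assumption held, the payout gap would simply mirror the spending gap, even though the two shoppers did the same "labor."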

The Meaning Problem

This issue is more philosophical, but I think it has major implications for the human psyche. In a world where every action has an associated data artifact for sale, are we not reducing human existence to a never-ending transactional nightmare? Does the data of who and how I love have a price tag on it? Does that change the way we think about love? Community? Meaning? What if a world of data-as-labor looks like hyper-neoliberalism, in which anything and everything is commoditized? Maybe we don’t want to economize metaphysical values.

The Disaggregate Value Problem

What if all of these problems are the cost of a program that ultimately isn’t worth much in the disaggregate? No one seems to know how much disaggregate data would be worth in a market-based data ecosystem. What if my data is only worth $20 a month? That nominal amount simply isn’t enough to rebuild a middle class, as Lanier suggests it might, nor is it enough to warrant the major infrastructural changes required to create such a system. While some project that the returns may increase over time as the value of data is realized, this remains one of the biggest unknowns of data-as-labor and perhaps the clearest gap in the underlying argument.


I've pointed out the problems I see with the data-as-labor solution: it requires quantifying data quality, it might exacerbate inequality, it might make us go crazy by putting a price tag on literally everything, and for the average person the payout may be little more than pocket cash. On top of all that, it's just a really complicated idea to implement.

For example, not all data is the same. Nor is it always clear that the data itself is the valuable asset this model assumes. In some cases, such as the training data for algorithms that go on to automate some value-generating process, there is clearly value in the data. In other instances, such as for digital advertisers, the data is merely a conduit for targeting the right ad at the right person – ultimately the targeted individual's attention is the valuable asset. Data brokers could have all of our data, but if we never use the internet to see an ad it's not worth anything to advertisers.

I appreciate the appeal of a world in which every individual data point has an associated value that modulates according to context and makes its way back to its originator every time that value is realized. As I think Lanier admits, however, this level of granularity and traceability basically requires rebuilding the internet. Even if that infrastructure were built, the system would demand a massive amount of overhead for a payout that, again, may not amount to much.

The big upside to the granularity and traceability of the data-as-labor solution (which includes paying for data as well as being paid for it) is that it could become more widely feasible for creators like authors, artists, and musicians to make a living off the intellectual property they produce. Lanier makes the case that building a system for compensation of the activities that are uniquely human will buffer us against the tide of automation. Whether or not that's true, I'm content to accept that a world that incentivizes more creative and intellectual output is a more beautiful and interesting world to live in.

In a later post I'll explore some ideas that attempt to ameliorate the problems listed here while preserving this upside. For now I’d love to hear how others interpret these problems (if they are indeed problems) and whether there are any dynamics my thinking may have missed.


Thanks to Nick Vincent for helping me refine some of these points!

Kasey Klimes