Sent to Prison by a Software Program’s Secret Algorithms (nytimes.com)
139 points by gscott on May 1, 2017 | 57 comments


Haha, I just looked up the product sheet for COMPAS Core, the software at issue:

http://www.equivant.com/assets/img/content/Risk-Needs-Assess...

It has a section titled "Make Defensible Decisions":

> Fully web-based and Windows compliant, COMPAS is applicable to offenders at all levels from non-violent misdemeanors to repeat violent felons. COMPAS offers separate norms for males, females, community and incarcerated populations.

That has ... nothing to do with defensibility. Defensibility is being able to say, "Here's the evidence and the reasoning I used to arrive at this conclusion."

What I think is especially frustrating is that in all three risk scales the product offers -- risk of new violent crime, risk of general recidivism, and pretrial risk -- they're dealing in probabilities that don't work on an individual level. That's why they're called "norms." So, if you have 1,000 offenders who match a certain profile, you might be able to fairly accurately predict that 40% of that population are going to reoffend. But being able to predict which offenders constitute the 40% and which constitute the 60% is a totally different thing. There is simply no way to take these probabilities that apply to groups of people and apply them in any fair way to individuals.
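
To make that concrete, here's a toy simulation in Python (invented numbers, not COMPAS's actual model): a thousand offenders who all match one profile, and who therefore all get the same score.

    import random

    random.seed(0)

    # 1,000 hypothetical offenders who all match the same profile,
    # so the "norm" assigns every one of them the same 40% risk.
    group_norm = 0.40
    outcomes = [random.random() < group_norm for _ in range(1000)]

    print(sum(outcomes))  # ~400 reoffend, just as the group norm predicts

The norm predicts the group total quite well, but every offender carried an identical score, so the score says nothing about whether this particular defendant is in the 40% or the 60%.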


Wait, does that mean they use entirely separate models for males and females? Does that in turn imply that they explicitly factor in your gender when deciding whether to send you to prison or not? Isn't that on its face not okay?


Would it be ok if we swapped gender with race, religion or any other protected class?

There's your answer.


Note that gender is not, in Constitutional case law, in the same tier of protected classes as race and religion (race and religion are subject to strict scrutiny; gender only to intermediate scrutiny).

Even where the same justification applies, a distinction that survives intermediate scrutiny may not survive strict scrutiny.


My bad. Is sex included though?


Not saying it's right, but isn't that exactly what we do with insurance for driving, health, etc.?


Many types of data are prohibited for various types of insurance decisions. This varies greatly, from general rules such as banning price discrimination due to race or sex, to specific topics like the ACA[1]'s prohibition against denying coverage due to "pre-existing conditions".

With modern data analysis methods, it is (sometimes) possible to infer prohibited data from other collections of data. For example, various machine learning techniques used on economic values such as income, home value, and proximity to good schools might reveal something about recidivism rates, but it's probably finding a confounding variable such as redlining[2] or other types of institutional bias[3].

This is a problem in both legal decisions and insurance, and transparency in how decisions are made is vital in both. That said, the impact on insurance is probably[4] a relatively minor change in price, which is very different from a Boolean decision about someone's freedom. Even in the unlikely case where the statistical biases are known and accounted for, it still isn't appropriate to over-interpret results about populations as if they apply to any particular individual.
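
To make the proxy effect concrete, here's a toy sketch (the 90/10 split and the variable names are invented): a model is never shown the prohibited attribute, but a segregated ZIP code reconstructs it anyway.

    import random

    random.seed(1)

    population = []
    for _ in range(10_000):
        protected = random.random() < 0.5  # the attribute the law forbids using
        # Segregated housing (e.g. redlining) ties ZIP code to the attribute:
        # 90% of one group lives in ZIP "A", 90% of the other in ZIP "B".
        zip_code = "A" if protected ^ (random.random() < 0.1) else "B"
        population.append((protected, zip_code))

    # "Only" using ZIP code still recovers the prohibited attribute ~90% of the time:
    hits = sum((z == "A") == p for p, z in population)
    print(hits / len(population))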

[1] Affordable Care Act ("Obamacare")

[2] https://en.wikipedia.org/wiki/Redlining

[3] https://www.youtube.com/watch?v=qXQA6D4JC0A

[4] for definitions of "probable" somewhere between "an educated guess" and "meh; whatever"


Good answer. Btw in the EU it has only recently been made illegal to discriminate on sex for car insurance rates.


Yes, it is what we do with insurance. The crux of the matter is that driving is a privilege, while incarceration is a restriction on, or revocation of, a person's rights.

Healthcare is a good demonstration of where lines are blurred (at least in the United States). Some view it as a right, others don't.


driving is a privilege? Is that really the core of your argument? How do you distinguish between rights and privileges?


Not really. It's true that insurance companies charge more to certain people because they are statistically more likely to, for example, be in a car accident, or because a home in a particular location is more at risk from wildfires.

But that's where the similarities end.

Insurance companies have much more, and much higher quality, data than these companies do. More important, the way insurance companies price their product is public and subject to competition. If an insurance company realizes that a specific segment of the market is less risky than previously thought, it can make a lot of money by dropping the price and underwriting policies for that group.

These courtroom applications use secret algorithms, and frankly, have little incentive to improve them.


Good point. I think competition is the crux of the matter here. Alas I don't think the shop-around model would work for sentencing.


Consider:

- Write the algorithm out.

- Extract all the decision points.

- Pay legal interns to collect all the data, apply the data to the decision points.

- Produce a report.

- Produce a recommendation.

- The judge sentences based on that process.

And now tell the defense that they have no right to any knowledge about the above process: not the decision points, not what data was fed to which decision, not even what the data was.

If you can't release that information to the defense because of trade secrets, then, it seems to me, it's not appropriate as a sentencing tool.

Of course, IANAL.


This same stuff is used to decide things like probation length, whether someone gets parole, and what severity level a sex offender is assigned. All of it should be freely available online if it is used in any legal decision-making process. As it stands, even the police who use it don't know how it decides anything.


You could do a sensitivity analysis: take each input -- race, sex, age, and so on -- and see how the output changes when you vary it. If one field produces a huge swing, that might be a good basis for claiming discrimination based on that variable.
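
A minimal sketch of that idea; the scoring function here is an invented stand-in, since the real one is exactly what's kept secret:

    def risk_score(defendant: dict) -> float:
        # Toy stand-in for the proprietary model.
        score = 0.2
        score += 0.3 if defendant["sex"] == "M" else 0.0
        score += 0.1 if defendant["age"] < 25 else 0.0
        return score

    def sensitivity(defendant: dict, field: str, alternatives: list) -> float:
        """Largest swing in the score when only `field` is varied."""
        baseline = risk_score(defendant)
        return max(abs(risk_score({**defendant, field: v}) - baseline)
                   for v in alternatives)

    d = {"sex": "M", "age": 30}
    print(sensitivity(d, "sex", ["M", "F"]))    # ~0.3: a large, suspect swing
    print(sensitivity(d, "age", [20, 30, 50]))  # ~0.1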


Sex would have a major impact. At least in the US, our justice system is extremely sexist, far more than it is racist. I once saw sex offender guidelines that automatically set the default sex offender level at medium for males and low for females (the default was used before an official evaluation was completed, and I would bet some money that the official evaluations discriminated based on sex).

While I don't disagree our legal system is also racist, the racism pales in comparison to the current level of sexism.


That's not a bug, it's a feature! User jawns posted some of their marketing copy elsewhere in this thread:

"COMPAS offers separate norms for males, females, community and incarcerated populations"

And they call that "defensible decisions"! In other words, they want to make sure the software doesn't rock the boat. If there's a bias, they want to preserve it so it doesn't look like their algorithms are "weird".


> In other words, they want to make sure the software doesn't rock the boat. If there's a bias, they want to preserve it so it doesn't look like their algorithms are "weird".

Isn't that what judges do when they follow precedent?


No, and judges don't always follow precedent (they theoretically have to when it is binding, but no precedent is binding on the highest court, not even its own).

When they follow precedent, they are not matching outcomes to outcomes to avoid looking weird; they are applying what was previously established as the formal decision rule.

That's a very different thing than attempting to preserve biases that may not even reflect the formally announced decision rules in the cases that produced them, but may instead represent silent deviations from the formal rules.


Legal precedents tend not to explicitly single out one gender, race or religion.


And if there's no single large perturbation, then it's the proprietary combination of a large number of small, individual perturbations. Either way, since it's proprietary, it's not practically refutable.

Not to mention you'd never get access to run such an analysis, as it would rely on access to the proprietary process; the company would have to be compelled to cooperate.


It's worth remembering that there are already algorithms used in determining how long a prison sentence should be, such as how much time off an inmate gets for good behavior, and whether there are mitigating factors about the conviction that would improve the inmate's chances of early parole. Keeping track of these calculations for all the inmates is no trivial task: the Bureau of Prisons has a separate complex in Texas to handle it, the Designation and Sentence Computation Center: https://www.bop.gov/inmates/custody_and_care/sentence_comput...

California's rule for `good behavior == half the prison sentence` [0], like any algorithm implemented in the real world, is ostensibly based on data analysis (studies of recidivism) and cost trade-offs. So is the 3-strikes-law [1], which vastly simplifies the cost and complexity of sentencing judgments, while making moral tradeoffs.
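
As a toy illustration of such a rule (an invented function, not the BOP's or California's actual computation, which has many more statutory inputs):

    from datetime import date, timedelta

    def projected_release(start: date, sentence_days: int,
                          good_behavior_credit: float = 0.5) -> date:
        # Mirrors a California-style rule: good behavior earns
        # credit against half the nominal sentence.
        served = int(sentence_days * (1 - good_behavior_credit))
        return start + timedelta(days=served)

    print(projected_release(date(2017, 5, 1), 730))  # two-year term, out in one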

And as in software systems, sometimes judicial algorithms have severe bugs. The one that comes to mind recently is how sex offenders face penalties beyond prison because of alleged recidivism worries. Apparently, the belief that sex offenders are at greater risk of recidivism was based on an unsupported statistic from a magazine that ended up being cited in a pivotal Supreme Court decision [2].

[0] http://www.latimes.com/local/crime/la-me-ff-early-release-20...

[1] https://en.wikipedia.org/wiki/Three-strikes_law

[2] https://www.nytimes.com/2017/03/06/us/politics/supreme-court...


* Closed source

* Private, for-profit company

Either one of those determining a person's sentence is dangerous on its own. The combination is a nightmare.


> It's worth remembering that there are already algorithms used in determining how long a prison sentence should be

They aren't secret algorithms, and both the algorithms themselves and the data being fed into them are subject to review and challenge by the people (and their counsel) subjected to them.


I worry that articles like this are driven by an unwillingness to grapple with how screwed up the whole system is.

Sure, opaque decisions made by software are creepy. But mostly, they faithfully reproduce the decisions humans (judges and legislators) were already making, and are in fact trained on those decisions. Algorithms are an easy no-guilt target to write about, but I worry that it obscures deeper sentencing issues.


Say an expert witness is called in a trial: based on the data, was this fire set intentionally or was it an accident? That expert might be a poor one, but the other side has the chance to cross-examine and expose the expert as unreliable.

In the case of this software, though, there is no cross-examination. Say the secret sauce is simply a data set plus Bayesian reasoning. How do we know whether the data set is representative? If it is just an opaque box, that is unacceptable.
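
For instance, suppose the secret sauce really were just Bayes' rule over historical data (the numbers below are invented). The base rate baked into the data set dominates the output, which is exactly why representativeness matters:

    def p_reoffend_given_flag(base_rate: float, hit_rate: float,
                              false_alarm_rate: float) -> float:
        # Bayes' rule: P(reoffend | flagged by the tool)
        flagged = hit_rate * base_rate + false_alarm_rate * (1 - base_rate)
        return hit_rate * base_rate / flagged

    # Identical "model", trained on two different populations:
    print(p_reoffend_given_flag(0.40, 0.8, 0.2))  # ~0.73
    print(p_reoffend_given_flag(0.10, 0.8, 0.2))  # ~0.31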


The arson example is an interesting one.

As far as I can tell, pretty much all of arson science was fundamentally broken for at least 30 years. The basic principles of the field - that arson fires burn hotter than others, that puddle marks indicate accelerants, that multiple-origin burns indicate arson - are all wrong.

Multiple people were executed for arson homicides on this evidence. Some were convicted, or even executed, after these principles were rejected. Nor was this a "science marches on" thing: the assumptions had simply been made from observation and intuition, with no formal testing ever.

So this is the sort of thing I mean. It's not a poor expert who can be cross-examined. It's an entire field producing entirely broken results for decades using unquestioned best practices. You can't cross-examine "are even textbooks in your field a complete pack of lies?" and win.

I understand the concern, I'm not comfortable with this, but I'm also not convinced that black box software is actually much different from what we have now. The grounds for conviction and sentencing are often completely opaque to the defense already.

http://www.newyorker.com/magazine/2009/09/07/trial-by-fire

https://books.google.com/books?id=GSJ7Ja95oegC&pg=PA382&lpg=...


Also the United States Sentencing Guidelines, which set out a (pretty vast) number of factors to be considered by Federal judges at the time the sentence is first imposed:

http://www.ussc.gov/


Even without the computer-aided decision-making program, they say he would have been sentenced anyway, but that's not the point. The point is that every program has bugs, and if a closed-source program wrongly decides the fate of, say, 0.01% of people, we need to be able to know that. While judges, doctors, and so on are supposed to rely on their own judgment in addition to these programs, there is potential for over-reliance. I would not mind the closed-source aspect so long as they paid third-party testers and auditors and released information about error rates, but I have no idea whether they actually do this. Interesting court case.


How do we know that the algorithm is at all appropriate? How do we even know it's an algorithm at all and not a mechanical turk? The algorithm could be to hand off the case files to some minimum wage college student and have them come up with some nice charts and graphs. After Theranos, this isn't even a hypothetical. It is clearly an injustice to use a secret algorithm regardless of outcome. But once again, corporate profit must come before justice, it seems.


I think that the problem is very simple:

a) The person involved has to obey the law.

b) The computer program dictates the law and the punishment.

c) In a free country, a person should be able to follow the law.

So: The person must be able to know the law, and thus must be able to know the program.

If the program is secret in any way, the person can never obey it.

So: if the punishment is based on this program alone, the person should be let free.


The program dictates the punishment, not the law. The program can be secret and still have no impact on the person's ability to follow the law, which remains public.


It seems to me that we are at a critical tipping point in becoming a dystopian society. If we are not careful we are going to fall right off the edge. Crazy times we live in.


The obvious flipside: Sent to Prison at The Discretion of a Human Being.


The point (as made clear in the article) is that a Human Being's decisions can at least, in principle, be challenged: "So you think the defendant poses a high risk of violence, recidivism or pretrial risk -- why exactly?"

Not so with the "decisions" that were made by the proprietary algorithm described in the article.


Reminds me of this 2011 study of how time relative to lunch break seemed to have an impact on judges' leniency: https://economix.blogs.nytimes.com/2011/04/14/time-and-judgm...


This has become pretty proverbial for questioning human judges' objectivity, but another interpretation has been suggested: possibly different kinds of matters were commonly scheduled for morning and afternoon sessions (for example, maybe defendants represented by counsel, or those expected to enter a plea bargain, were commonly seen before lunch).

I forget where I read this point, but it may imply that the judges' hunger is not necessarily the only factor behind the different outcomes.


This got passed around via SlateStarCodex; not sure where else it got traction.

But yes: a follow-up study found that judges schedule open-and-shut cases near lunch so they won't get held up, and more ambiguous cases for open periods. Since "open and shut" usually means "guilty," the original study found higher conviction/sentencing rates for near-lunch cases. Remove the cases where judges can set their own schedule and you see far less of this effect.


I thought it was probably on SSC and I even searched there, but didn't manage to find the article. Thanks for the summary!


Thanks for this. I think I remember at the time that there were still questions about the study. And generally when a study matches my sentiment of "Of course that's the case" I'm inclined to treat it with skepticism.

Priceonomics had more detail in a 2014 post, which mentions that the Israeli Courts Research Division did indeed reject the findings on the grounds that the study didn't fully consider the mundane details of how cases were scheduled: https://priceonomics.com/justice-isnt-blind-its-cranky-by-5p...

The study's authors, according to Priceonomics, argued against this explanation, but they also conceded that they were unable to account for other factors, such as lawyers with multiple clients choosing the order in which to present them, and the fact that prisoners without lawyers would typically come up right before a break.


It would still be very interesting to study judges' consistency and inter-rater reliability.

There was someone's blog post that suggested doing empirical studies of trials by jury to see how often juries tended to agree with each other (with the suggestion that if there's a very high random component, it's harder to think of the juries' decisions as all reflecting a notion of justice). I've now also forgotten who proposed that.


The other flipside: a non-secret, auditable algorithm.


I found this paper[1] to be a particularly good source of information on the legal landscape of using algorithms in the judicial system.

[1] Barocas, Solon, and Andrew D. Selbst. "Big Data's Disparate Impact." California Law Review 104 (2016). https://www.accmeetings.com/AM16/faculty/files/Article_461_D...


The issue raised by this article is discussed in a broader context in Cathy O'Neil's excellent book "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy". The dangers of proprietary, secret algorithms making judgments at critical junctures (e.g. whether a person is sentenced or not) are raised in the introduction of the book.

The book also gives copious concrete examples of these dangers. In particular, it describes the LSI-R (Level of Service Inventory-Revised) questionnaire and how it effectively pinpoints race even though it never asks for the person's race directly (asking would be illegal).


I think I've seen this Black Mirror episode.


We don't have access to the sentencing algorithms in the minds of judges, either.


Judges don't keep their reasoning secret, and they don't sell their sentencing recommendations, either.


There's reasoning and there's justification. It's entirely possible that the reason a judge does something is not revealed by the justification they present.


But we either elect them or elect the reps that appoint them, and we have (some) recourse to make them explain their reasoning, which they usually offer openly.


And judges' decisions are checked by appeals courts. If a judge is found to consistently base their decisions on some illegal bias, they can be impeached and removed from their duties. It's unfortunate for their victims, but it's small in scale.

When you move biases from one person into a whole system, by encoding them in some poorly considered model and executing it at scale, the process of correcting them can be far more difficult (in theory, problems should also surface much faster and so shouldn't persist as long, but that's theory, not practice).


As a technologist, I feel there are problems we can solve with computers, and problems we cannot.

I am a big fan of smarter computation, but when it comes to legal judgment, I defer to a human. We have all heard of bizarre rulings before (I don't have to remind everyone of the Stanford rape case last year), but human involvement in making the judgment call is what makes the justice system precious.

I am a big fan of kicking Donald Trump out of office, and I would describe this secret algorithm as a Donald Trump. Some data were collected; we don't know how much, how authentic, or how much bias was introduced. We just know that some answer is produced. The algorithm might be as simple as tossing a coin. If I can't trust the leader of our government, how can I trust a machine, secret or not, to make judgment calls, when the humans behind it are prone to poor and irrational judgment?

So why a human, if humans are prone to mistakes and unfair judgments? Because there should be humanity in justice. Yes, Lady Justice is blindfolded, but that doesn't mean we can't show compassion or anger.

Is there a real correlation between crime rates and years in prison? I have heard many say that criminals are likely to commit crimes again either because they have no other skills to depend on, or because they have a mental illness that prevents them from obeying the law. So if the data set says there is a 90% recurrence rate, are we going to sentence people for longer? Then why not lock the person up forever, or go straight to execution, if all we want is peace?

You see, the purpose we want a jail sentence to serve is correction. This is not idle talk: many convicts do turn out fine if they are given the chance to redeem themselves. We shouldn't have to beg for a safe prison, a prison with staff ready to help, because those should be a baseline requirement of a jail.

I can't help but be reminded of Futurama, with its robot judges (one of the cops is a robot, too). We should fear people trying to robotize our humanity. If judging can be reduced to data, then raising a kid from infancy to adulthood could be done by algorithm too; we would just need lots of data and lots of simulations.


Hypothetically speaking: what if the algorithm had hardwired names and probabilities, like {name:"Loomis", prob_recidivism:0.99}? Then you're throwing him in jail just because a lot of other Loomises turned out to be jerks.

He must be allowed to verify the validity of the algorithm. The current situation is just one step above secret laws.
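
That hypothetical is only a few lines of code, and nothing about the outputs alone would give it away:

    # Hypothetical: a "risk model" that is secretly just a lookup table.
    HARDWIRED = {"Loomis": 0.99}

    def secret_risk_score(name: str) -> float:
        return HARDWIRED.get(name, 0.5)  # plausible-looking default for everyone else

    print(secret_risk_score("Loomis"))  # 0.99, indistinguishable from a "real" score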


If the prison system actually focused on reform, perhaps history and this algorithm would be largely trailing indicators of repeat offense.


[flagged]


Please stop posting unsubstantively like this.

https://news.ycombinator.com/newsguidelines.html


My main issue with this algorithm is that it purports to predict THE DEFENDANT'S actual risk to the community. That's impossible to do.

Instead, it seems like it should provide data-centric trends based on objective data and metrics, such as age, sex, race, socio-economic metrics, housing, charges, peers, gang affiliations, state, city, ZIP, block, historical recidivism, charge severity, marital status, child status, and any other obtainable piece of data. From this, you would be able to generate concrete statistical evidence which could be used to supplement, not replace, the standard factors considered by the Court. This data could be used by prosecution or defense to counteract any intentional bias introduced by those running the numbers.

Even then, however, this data would be generalized, treated as a predictor, and applied as prophecy to the particular defendant in question. It seems like it should be inadmissible.


Here's the problem with treating data like "age, sex, race, socio-economic..." as /objective/: it literally repeats existing stereotypes and biases.

There's plenty of evidence of a huge amount of bias due to race, of being poor meaning you can't afford good lawyers, and so on. So the information that feeds into the algorithm amounts to something like "<X> currently receives longer sentences," which leads it to simply repeat the existing behaviour; but now it claims to be a machine, and therefore unbiased, so judges defer to it.

I feel that, as a defendant, I should be able to ask them to re-run the algorithm with a changed race and a changed ZIP code, so we can get a comparison and a measure of the effective algorithmic bias.
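
In code, the requested test is trivial, assuming (contrary to fact today) that the vendor could be compelled to re-run the model; the field names here are hypothetical:

    # `model` is whatever black-box scorer the court used.
    def bias_delta(model, defendant: dict, changes: dict) -> float:
        return model({**defendant, **changes}) - model(defendant)

    # e.g. bias_delta(court_model, me, {"race": "white", "zip": "53703"})
    # A large delta is a direct, per-defendant measure of algorithmic bias.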


My suggestion was to 1) not use any data or algorithms, or 2) use an objective algorithm to figure out if poor white males with criminal records but with a new child are likely to be repeat violent offenders.

I did not suggest to use a single variable, nor did I suggest that it be used to determine guilt, explicitly dictate sentencing, etc. If there is going to be any algorithmic "fact" finding, it should involve plugging in all available data for a defendant and trending it against all available data. Not cherry picking.



