
January 20, 2018

Facebook copes, categorizes content

But with the company's vast reach has come another kind of problem: Facebook is becoming too big for its computer algorithms and relatively small team of employees and contractors to manage the trillions of posts on its social network.

Earlier Wednesday, Mark Zuckerberg, the company's chief executive, acknowledged the problem. In a Facebook post, he said that over the next year, the company would add 3,000 people to the team that polices the site for inappropriate or offensive content, especially in the live videos the company is encouraging users to broadcast.

"If we're going to build a safe community, we need to respond quickly," he wrote. "We're working to make these videos easier to report so we can take the right action sooner -- whether that's responding quickly when someone needs help or taking a post down." He offered no details on what would change.

Continue reading "Facebook copes, categorizes content" »

January 19, 2018

Geoffrey Hinton and capsule networks

Geoffrey Hinton and Sara Sabour, holding a two-piece pyramid puzzle, are researching a system that could let computers see more like humans at a Google laboratory in Toronto.

But as Mr. Hinton himself points out, his idea has had its limits. If a neural network is trained on images that show a coffee cup only from a side, for example, it is unlikely to recognize a coffee cup turned upside down.

Now Mr. Hinton and Sara Sabour, a young Google researcher, are exploring an alternative mathematical technique that he calls a capsule network. The idea is to build a system that sees more like a human. If a neural network sees the world in two dimensions, a capsule network can see it in three.

Mr. Hinton, a 69-year-old British expatriate, opened Google's artificial intelligence lab in Toronto this year. The new lab is emblematic of what some believe to be the future of cutting-edge tech research: Much of it is expected to happen outside the United States in Europe, China and longtime A.I. research centers, like Toronto, that are more welcoming to immigrant researchers.

January 17, 2018

Conference on Fairness, Accountability, and Transparency (FAT*) 2018

The program for the 2018 Conference on Fairness, Accountability, and Transparency (FAT*) is out. FAT* grew out of the workshop on Fairness, Accountability, and Transparency in Machine Learning (FATML).

January 15, 2018

Auditing algorithms for bias

So much for the idea that bots will be taking over human jobs. Once we have AIs doing work for us, we'll need to invent new jobs for humans who are testing the AIs' results for accuracy and prejudice. Even when chatbots get incredibly sophisticated, they are still going to be trained on human language. And since bias is built into language, humans will still be necessary as decision-makers.

In a recent paper for Science about their work, the researchers say the implications are far-reaching. "Our findings are also sure to contribute to the debate concerning the Sapir-Whorf hypothesis," they write. "Our work suggests that behavior can be driven by cultural history embedded in a term's historic use. Such histories can evidently vary between languages." If you watched the movie Arrival, you've probably heard of Sapir-Whorf: it's the hypothesis that language shapes consciousness. Now we have an algorithm that suggests this may be true, at least when it comes to stereotypes.

Aylin Caliskan said her team wants to branch out and try to find as-yet-unknown biases in human language. Perhaps they could look for patterns created by fake news or look into biases that exist in specific subcultures or geographical locations. They would also like to look at other languages, where bias is encoded very differently than it is in English.

"Let's say in the future, someone suspects there's a bias or stereotype in a certain culture or location," Caliskan mused. "Instead of testing with human subjects first, which takes time, money, and effort, they can get text from that group of people and test to see if they have this bias. It would save so much time."


See also Princeton researchers discover AI bias, and Science, 2017, DOI: 10.1126/science.aal4230.

Continue reading "Auditing algorithms for bias" »

January 5, 2018

Princeton researchers discover why AI become racist and sexist

Study of language bias has implications for AI as well as human cognition.

The result: an algorithm that can predict human prejudices based on an intensive analysis of how people use English online.

The Common Crawl is the result of a large-scale crawl of the Internet in 2014 that contains 840 billion tokens, or words. Princeton Center for Information Technology Policy researcher Aylin Caliskan and her colleagues wondered whether that corpus--created by millions of people typing away online--might contain biases that could be discovered by algorithm. To figure it out, they turned to an unusual source: the Implicit Association Test (IAT), which is used to measure often unconscious social attitudes.

Using the IAT as a model, Caliskan and her colleagues created the Word-Embedding Association Test (WEAT), which analyzes chunks of text to see which concepts are more closely associated than others. The "word-embedding" part of the test comes from a project at Stanford called GloVe, which packages words together into "vector representations," basically lists of associated terms. So the word "dog," if represented as a word-embedded vector, would be composed of words like puppy, doggie, hound, canine, and all the various dog breeds.

The idea is to get at the concept of dog, not the specific word. This is especially important if you are working with social stereotypes, where somebody might be expressing ideas about women by using words like "girl" or "mother." To keep things simple, the researchers represented each concept as a 300-dimensional vector.
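The association measure at the heart of a test like WEAT is simple: a word leans toward one attribute set over another if its mean cosine similarity to the first set is higher. A minimal sketch, using tiny made-up 4-dimensional vectors in place of real 300-dimensional GloVe embeddings (the vectors and word choices here are illustrative assumptions, not data from the paper):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two word vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(w, A, B):
    # WEAT-style association s(w, A, B): mean similarity of word w to
    # attribute set A minus its mean similarity to attribute set B.
    return (np.mean([cosine(w, a) for a in A])
            - np.mean([cosine(w, b) for b in B]))

# Toy 4-dimensional vectors (real GloVe embeddings are 300-dimensional).
pleasant   = [np.array([1.0, 0.2, 0.0, 0.0]), np.array([0.9, -0.1, 0.1, 0.0])]
unpleasant = [np.array([-1.0, 0.1, 0.0, 0.0]), np.array([-0.8, 0.0, 0.2, 0.0])]
flower = np.array([0.95, 0.0, 0.05, 0.0])   # sits near the "pleasant" cluster

print(association(flower, pleasant, unpleasant) > 0)  # True: flower leans pleasant
```

With real embeddings, running this over many target words (flowers vs. insects, or names associated with different groups) is how the researchers surfaced the biases described above.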

-- ANNALEE NEWITZ

Continue reading "Princeton researchers discover why AI become racist and sexist Study of language bias has implications for AI as well as human cognition" »

January 1, 2018

A6: Auditing Algorithms: Adding Accountability to Automated Authority

"A6" is shorthand for the six A's in Auditing Algorithms: Adding Accountability to Automated Authority.

December 19, 2017

Bicycle lanes are crowdsourced big data

1. TWG / Coruscation home-brew
2. NYC DOT official
3. NYC Bikemaps
4. Medium NYU data
5. Lane
6. OuvosTech Ouvos

Data:


Others of interest:


  1. Spiderbikemaps' new_york_with_background [PDF]

  2. cyclinguk.org/guide/make-tube-map-cycle-network

September 18, 2017

Grammarly helps in three areas

Grammarly helps in three areas: first, basic mechanics, such as spelling, grammar and sentence structure; second, clarity, readability and ambiguity; and third, an area the company is still developing, effectiveness, meaning context-specific suggestions such as flagging gendered or aggressive language. In the future, the app could do things like ask if a joke in your writing is appropriate.


Behind the scenes the service is processing loads of data--in April, it suggested 14 billion improvements across its service.

The tool parses text, breaking it up into phrases and sentences. It applies various algorithms to analyze the text using technology such as natural language processing and machine learning.
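The parse-then-check pattern described above can be sketched in a few lines. This is a toy illustration of the general approach, not Grammarly's actual pipeline; the sentence splitter and the single "repeated word" rule are invented for the example:

```python
import re

def split_sentences(text):
    # Naive sentence splitter: break on ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def check_double_words(sentence):
    # Toy mechanics rule: flag immediately repeated words ("the the").
    words = sentence.lower().split()
    return [w for prev, w in zip(words, words[1:]) if prev == w]

text = "This is is a test. It works fine."
for s in split_sentences(text):
    issues = check_double_words(s)
    if issues:
        print(f"repeated word(s) {issues} in: {s}")  # flags the first sentence
```

A production system replaces the regex with a real parser and the hand-written rule with statistical models, but the shape — segment the text, then run many independent checkers over each segment — is the same.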

August 27, 2017

Algorithms that Facebook's censors use to differentiate between hate speech and legitimate political expression.


Julia Angwin (ProPublica) and Hannes Grassegger (special to ProPublica), June 28, 2017, deconstruct their own clickbait:


A trove of internal documents reviewed by ProPublica sheds new light on the secret guidelines that Facebook's censors use to distinguish between hate speech and legitimate political expression. The documents reveal the rationale behind seemingly inconsistent decisions. For instance, Higgins' incitement to violence passed muster because it targeted a specific sub-group of Muslims -- those that are "radicalized" -- while Delgado's post was deleted for attacking whites in general.


Hoffman said the team also relied on the principle of harm articulated by John Stuart Mill, a 19th-century English political philosopher. It states "that the only purpose for which power can be rightfully exercised over any member of a civilized community, against his will, is to prevent harm to others." That led to the development of Facebook's "credible threat" standard, which bans posts that describe specific actions that could threaten others, but allows threats that are not likely to be carried out.

Eventually, however, Hoffman said "we found that limiting it to physical harm wasn't sufficient, so we started exploring how free expression societies deal with this."

Continue reading "Algorithms that Facebook's censors use to differentiate between hate speech and legitimate political expression." »

July 18, 2017

Facebook doesn't tell users everything it really knows about them

Facebook's page explaining "what influences the ads you see" says the company gets the information about its users "from a few different sources."

What the page doesn't say is that those sources include detailed dossiers obtained from commercial data brokers about users' offline lives. Nor does Facebook show users any of the often remarkably detailed information it gets from those brokers.

-- Julia Angwin, ProPublica.

July 15, 2017

Cheat Sheets for AI, Deep Learning, BigStats

Becoming Human, by Stefan Kojouharov.

July 7, 2017

AirBnB personalises, tunes search results

Airbnb learned over time that machine learning could be used to personalize search results, Mike Curtis said. Airbnb introduced its machine-learned search ranking model toward the end of 2014 and has been developing it continuously since. Today Airbnb personalizes all search results.

Airbnb factors in signals about the guests themselves, as well as guests similar to them, when offering up results.

For example, guests provide explicit signals in their search -- the length of stay, the number of bedrooms they need. But as they examine their search results, they may show interest in similar, desirable attributes that the guests themselves might not even notice.

"There's a bunch of other signals that you're giving us based on just which listings you click on," Curtis says. "For example, what kind of setting is it in? What kind of decor is in the house? These are things Airbnb can use to feed into the model to come up with a better prediction of which listings to show you first."

The company pulls well over a hundred signals into the search rank model, Curtis says, and then the machine learning algorithm figures out how all the signals interact to produce personalized search rankings.
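The idea of combining explicit signals (stay length, bedrooms) with implicit ones (which listings a guest clicks) can be sketched as a weighted scoring function. This is an illustrative assumption of how such a ranker might look, with invented feature names and hand-set weights; Airbnb's real model learns these interactions from data:

```python
# Illustrative sketch (not Airbnb's actual model): a linear scorer over
# a few hand-picked guest/listing signals of the kind a learned
# ranking model would combine automatically.
def score(listing, guest, weights):
    features = {
        # Explicit signal: how close the price is to the guest's target.
        "price_fit": 1.0 - abs(listing["price"] - guest["target_price"]) / guest["target_price"],
        # Explicit signal: does the listing meet the bedroom requirement?
        "bedrooms_ok": 1.0 if listing["bedrooms"] >= guest["bedrooms_needed"] else 0.0,
        # Implicit signal: past clicks on listings with similar decor.
        "clicked_similar_decor": guest["decor_clicks"].get(listing["decor"], 0) / 10.0,
    }
    return sum(weights[k] * v for k, v in features.items())

weights = {"price_fit": 0.5, "bedrooms_ok": 0.3, "clicked_similar_decor": 0.2}
guest = {"target_price": 100, "bedrooms_needed": 2, "decor_clicks": {"rustic": 5}}
listings = [
    {"id": "a", "price": 100, "bedrooms": 2, "decor": "rustic"},
    {"id": "b", "price": 150, "bedrooms": 1, "decor": "modern"},
]
ranked = sorted(listings, key=lambda l: score(l, guest, weights), reverse=True)
print([l["id"] for l in ranked])  # ['a', 'b']: listing "a" ranks first
```

In the real system the weights, and how the hundred-plus signals interact, are learned rather than hand-tuned; that interaction learning is what the machine learning algorithm contributes.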

March 29, 2017

Adsense for the masses

Major publishers admit to 'advertiser-friendly' skew:

If you want to take something good and make it less good, there's no more reliable method than to chop it up into tiny bits and then recombine them. A door made of particleboard isn't as strong as one made of solid pine. An MP3 of a song lacks the sonic richness of a high-fidelity record. A hamburger may or may not be as delicious as a rib-eye, depending on your personal taste, but it's definitely likelier to contain fecal bacteria and pink slime.

The global advertising industry is currently experiencing its own version of food poisoning from tainted ground beef. Johnson & Johnson, Verizon, and AT&T are among the giant marketers that have stopped buying ad space on Google's ad network and on YouTube in response to reports of ads appearing alongside hate speech, ISIS recruiting propaganda, and other objectionable content. Racing to contain the boycott, Google issued an apology on Tuesday and said it is taking steps to ensure greater "brand safety" in the future. Those steps include "taking a tougher stance on hateful, offensive and derogatory content," changing the default settings for ad campaigns, and giving marketers new controls allowing them to exclude specific websites or types of content from their campaigns.

Continue reading "Adsense for the masses" »

February 1, 2017

Flatiron Institute

Computers have been a fixture for decades in astrophysics and many other fields of science. But typically, the computer programs are written by graduate students, often abandoned after they finish their programs. "Those people aren't great coders, for the most part," Mr. Simons said.

At the Flatiron Institute, a good fraction of the staff will be professional computer programmers, producing software not only for the in-house scientists but also available for anyone else who needs it.

"These are really interesting questions, and we can think longer than the three-year grant cycle. They can tackle tough questions and put the time in that's necessary."


-- Marilyn Simons.

January 30, 2017

ProPublica: Breaking the Black Box: what Facebook knows about you

ProPublica's Breaking the Black Box series examines what Facebook knows about you.

January 26, 2017

Cambridge Analytica's psychographic profiling for behavioral microtargeting for election processes

Understand personality, not just demographics. OCEAN model: Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism.

In a 10 minute presentation at the 2016 Concordia Summit, Mr. Alexander Nix discusses the power of big data in global elections. Cambridge Analytica's revolutionary approach to audience targeting, data modeling, and psychographic profiling has made them a leader in behavioral micro-targeting for election processes around the world.

Cambridge's voter data innovations are built from a traditional five-factor model for gauging personality traits. The company uses ongoing nationwide survey data to evaluate voters in specific regions according to the OCEAN or CANOE factors of openness, conscientiousness, extroversion, agreeableness and neuroticism. The ultimate political application of the modeling system is to craft specific ad messages tailored to voter segments based on how they fall on the five-factor spectrum.
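Five-factor survey scoring of the kind described above works by averaging a respondent's answers to the items keyed to each trait, flipping reverse-keyed items. A hypothetical sketch; the item-to-trait mapping and reverse keying here are invented for illustration (real inventories such as the IPIP define their own):

```python
# Hypothetical sketch: scoring survey answers (1-5 Likert scale) into
# the five OCEAN traits.
TRAITS = ["openness", "conscientiousness", "extroversion",
          "agreeableness", "neuroticism"]

# Each survey item maps to a trait; reverse-keyed items are worded so
# that agreement indicates LESS of the trait.  (Invented mapping.)
ITEMS = [
    ("openness", False), ("openness", True),
    ("conscientiousness", False), ("conscientiousness", True),
    ("extroversion", False), ("extroversion", True),
    ("agreeableness", False), ("agreeableness", True),
    ("neuroticism", False), ("neuroticism", True),
]

def ocean_scores(answers):
    # answers[i] in 1..5; reverse-keyed items flip to 6 - answer.
    totals = {t: [] for t in TRAITS}
    for (trait, reverse), a in zip(ITEMS, answers):
        totals[trait].append(6 - a if reverse else a)
    return {t: sum(v) / len(v) for t, v in totals.items()}

print(ocean_scores([5, 1, 3, 3, 4, 2, 2, 4, 1, 5]))
# {'openness': 5.0, 'conscientiousness': 3.0, 'extroversion': 4.0,
#  'agreeableness': 2.0, 'neuroticism': 1.0}
```

The political application then segments voters by where they fall on each of the five averaged scores and tailors ad messages to each segment.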

The number-crunching and analytics for Mr. Trump felt more like a "data experiment," said Matthew Oczkowski, head of product at Cambridge Analytica, who led the team for nearly six months.

Continue reading "Cambridge Analytica's psychographic profiling for behavioral microtargeting for election processes" »

August 3, 2016

Apple acquired Turi, a machine learning software startup

Apple has acquired Turi, a machine learning software startup that formerly went by the names GraphLab and Dato.

Carlos Guestrin, cofounder and chief executive of the startup, serves as Amazon professor of machine learning at the University of Washington. Amazon has brought on machine learning talent in the years since 2012, when Guestrin got the position. The startup employed some former Microsoft employees, so it's fascinating to see Apple acquire it.

Apple has acquired several other machine learning startups recently, including Emotient, Perceptio, and VocalIQ.

Turi's competitors included Databricks, H2O, and Neo4j, among others. But because Turi has added artificial intelligence into its technology, it faced competition in that area as well.

July 31, 2016

Future is mining cloud data

The next big competition in cloud computing also involves artificial intelligence, fed by loads of data. Soon, Mr. Kurian said, Oracle will offer applications that draw from what it knows about the people whose actions are recorded in Oracle databases. The company has anonymized data from 1,500 companies, including three billion consumers and 400 million business profiles, representing $3 trillion in consumer purchases.

"Most of the world's data is already inside Oracle databases," said Thomas Kurian, Oracle's president of product development.

That's the kind of hold on people's information that perhaps only Facebook can match. But Mark Zuckerberg doesn't sell business software. At least, not yet.

July 8, 2016

Culture Digitally: Facebook Trending: it's made of people (but we should have already known that)

Culture Digitally on Facebook Trending: it's made of people, but we should have already known that.

July 2, 2016

Facebook makes the news

According to a statement from Tom Stocky, who is in charge of the trending topics list, Facebook has policies "for the review team to ensure consistency and neutrality" of the items that appear in the trending list.

But Facebook declined to discuss whether any editorial guidelines governed its algorithms, including the system that determines what people see in News Feed. Those algorithms could have profound implications for society. For instance, one persistent worry about algorithmically selected news is that it might reinforce people's previously held points of view. If News Feed shows news that we're each likely to Like, it could trap us into echo chambers and contribute to rising political polarization. In a study last year, Facebook's scientists asserted the echo chamber effect was muted.

But when Facebook changes its algorithm -- which it does routinely -- does it have guidelines to make sure the changes aren't furthering an echo chamber? Or that the changes aren't inadvertently favoring one candidate or ideology over another? In other words, are Facebook's engineering decisions subject to ethical review? Nobody knows.

The other reason to be wary of Facebook's bias has to do with sheer size. Ms. Caplan notes that when studying bias in traditional media, scholars try to make comparisons across different news outlets. To determine if The Times is ignoring a certain story unfairly, look at competitors like The Washington Post and The Wall Street Journal. If those outlets are covering a story and The Times isn't, there could be something amiss about The Times's news judgment.

Such comparative studies are nearly impossible for Facebook. Facebook is personalized, in that what you see on your News Feed is different from what I see on mine, so the only entity in a position to look for systemic bias across all of Facebook is Facebook itself. Even if you could determine the spread of stories across all of Facebook's readers, what would you compare it to?

"Facebook has achieved saturation," Ms. Caplan said. No other social network is as large, popular, or used in the same way, so there's really no good rival for comparing Facebook's algorithmic output in order to look for bias.

What we're left with is a very powerful black box. In a 2010 study, Facebook's data scientists proved that simply by showing some users that their friends had voted, Facebook could encourage people to go to the polls. That study was randomized -- Facebook wasn't selectively showing messages to supporters of a particular candidate.

Facebook tinkered with users' emotions in 2014 News Feed experiment

NY Times Technology on Facebook tinkering with users' emotions in a 2014 News Feed experiment, stirring outcry:

http://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html.