There is something else I would like to point out as well; IBM folks on this thread can answer the question. This is regarding the "User Modeling" service of Watson. I spoke to a couple of IBM folks and asked what some of the coolest apps they had seen built using Watson were, and someone mentioned the following MSNBC article. It's Watson's perception of the State of the Union speech. What the User Modeling service does is take text as input and sentiment-analyze it (and give outputs around it).
Sorry for the slow response, it's been internet years 8-)
We have an update coming for User Modeling (to be announced soon). After that update, such a gibberish post will return an error.
User Modeling is based on word counting. Users should ensure that their input is actually from a human and intelligible. The service looks for certain words in the input, and will reject input that doesn't have enough of those words for the service to estimate characteristics. In the upcoming release, the documentation will explain how this works and what the relevant words are.
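To make the word-counting idea concrete, here is a minimal sketch of that kind of check. It is not the service's actual implementation: the vocabulary `KNOWN_WORDS`, the threshold `MIN_KNOWN_WORDS`, and both function names are made up for illustration, since the real word list and cutoff are not given in this thread.

```python
import re

# Hypothetical vocabulary; the real service's word list is not given here.
KNOWN_WORDS = {"i", "we", "you", "the", "and", "love", "work", "family", "think", "feel"}
MIN_KNOWN_WORDS = 3  # assumed threshold, chosen only for illustration


def count_known_words(text, vocabulary=KNOWN_WORDS):
    """Count tokens that appear in the vocabulary; unknown tokens are ignored."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in vocabulary)


def check_input(text, min_known=MIN_KNOWN_WORDS):
    """Raise ValueError when too few known words are present to estimate anything."""
    known = count_known_words(text)
    if known < min_known:
        raise ValueError(f"only {known} known words; need at least {min_known}")
    return known
```

With this sketch, `check_input("I think we love the work")` passes, while a keyboard-mash string raises `ValueError` because none of its tokens are in the vocabulary.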
Also, we will provide a measurement of how accurate our results are based on the number of words that are in the input. This should allow users to understand the reliability of the results in the context of their application (e.g. a casual movie recommender app might be ok with very low confidence, while an application that makes more critical recommendations might require higher confidence).
Yeah, you are right, we should be filtering this sort of stuff out. The algorithms are robust in that they ignore words not in the system's vocabulary (rather than, say, crashing), but we did not trap the case in which none of the "words" are familiar.
http://www.msnbc.com/msnbc/how-supercomputer-sees-the-state-...
What the folks @ MSNBC did was to pass the last 10 SOTU speeches to Watson and collate the results over a graph.
But here is why I have trouble believing Watson's perception. Try passing the following input to Watson (or any other gibberish):
"jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfgk;gjldfg dgkjldfgdhfgkjdfhjg fkldskf;ksdlf;ksdlfks jkdhfkhsdjkfhksdhj ljfsdjfhdsjkfjskdhfkjsdhf sdfkls;dkfl; dkfl;sd;fsk roweruoweuroiweuroiwe uweoruweoruweo ruweuro kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdslgjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg dfsgj df gfgd flg;dfg g;fkglj sdfg kjgs fgjk ldfsj gs klfgjfds lgjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg fsgj df gf gdflg;dfg g;fkglj sdfg kjgsfgjkldfsjgs klfgjfd slgjfdlkg fd jkldsjglk fdsjgdfls kg;jsf g dsfg fdg jsdfjgdf skg fsgj df gfgdflg;dfg g;fkgl jsdfg kjgsfgjk ldfsjgs klfgjfdslgjfdlkg fd jkldsjg lkfdsj dfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj f gfgd flg;dfg g;fkgljsdfg kjgsfgjkldfsjgs
fkldskf;ksdlf;ksdlfks jkdhfkhsdjkfhksdhj ljfsdjfhdsjkfjskdhfkjsdhf sdfkls;dkfl; dkfl;sd;fsk roweruoweuroiweuroiwe uweoruweoruweo ruweuro kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdslgjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg dfsgj df gfgd flg;dfg g;fkglj sdfg kjgs fgjk ldfsj gs klfgjfds lgjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg fsgj df gf gdflg;dfg g;fkglj sdfg kjgsfgjkldfsjgs klfgjfd slgjfdlkg fd jkldsjglk fdsjgdfls kg;jsf g dsfg fdg jsdfjgdf skg fsgj df gfgdflg;dfg g;fkgl jsdfg kjgsfgjk ....
=====
And Watson rates it as follows.
Big 5
Openness 100%: Adventurousness 100%, Artistic interests 2%, Emotionality 1%, Imagination 100%, Intellect 100%, Authority-challenging 100%
Conscientiousness 93%: Achievement striving 94%, Cautiousness 57%, Dutifulness 1%, Orderliness 1%, Self-discipline 81%, Self-efficacy 3%
Extraversion 1%: Activity level 1%, Assertiveness 1%, Cheerfulness 1%, Excitement-seeking 2%, Outgoing 1%, Gregariousness 1%
Agreeableness 1%: Altruism 1%, Cooperation 1%, Modesty 1%, Uncompromising 1%, Sympathy 1%, Trust 1%
Emotional range 11%: Fiery 1%, Prone to worry 10%, Melancholy 34%, Immoderation 24%, Self-consciousness 6%, Susceptible to stress 9%
Needs
Challenge 61%, Closeness 84%, Curiosity 51%, Excitement 66%, Harmony 65%, Ideal 54%, Liberty 75%, Love 23%, Practicality 86%, Self-expression 25%, Stability 60%, Structure 57%
Values
Conservation 78%, Openness to change 5%, Hedonism 15%, Self-enhancement 76%, Self-transcendence 11%
========
It's plain gibberish, and you still get some results. I tried passing other text transliterated to English, and Watson still gives results like this. I would expect it to at least call it out as gibberish text.