I Heart CYC

Eliezer wrote Tuesday:

EURISKO may still be the most sophisticated self-improving AI ever built – in the 1980s, by Douglas Lenat before he started wasting his life on Cyc.  … EURISKO lacked what I called "insight" – that is, the type of abstract knowledge that lets humans fly through the search space. 

I commented:

You ignore that Lenat has his own theory which he gives as the reason he’s been pursuing CYC. You should at least explain why you think his theory wrong; I find his theory quite plausible.

Eliezer replied only:

Artificial Addition, The Nature of Logic, Truly Part of You, Words as Mental Paintbrush Handles, Detached Lever Fallacy

The main relevant points from these Eliezer posts seem to be: that AI researchers wasted time on messy ad-hoc non-monotonic logics while elegant mathy Bayes-net approaches work much better; that it is much better to know how to generate specific knowledge from general principles than to just be told lots of specific knowledge; and that our minds have lots of hidden machinery behind the words we use, so words as "detached levers" won't work.  But I doubt Lenat or CYC folks disagree with any of these points.

The lesson Lenat took from EURISKO is that architecture is overrated; AIs learn slowly now mainly because they know so little.  So we need to explicitly code knowledge by hand until we have enough to build systems effective at asking questions, reading, and learning for themselves.  Prior AI researchers were too comfortable starting every project over from scratch; they needed to join together to create larger integrated knowledge bases.  This still seems to me a reasonable view, and anyone who thinks Lenat created the best AI system ever should consider seriously the lesson he thinks he learned.

Of course the CYC project is open to criticism on its many particular choices.  People have complained about its logic-like and language-like representations, about its selection of prototypical cases to build from (e.g., encyclopedia articles), about its focus on answering over acting,  about how often it rebuilds vs. maintaining legacy systems, and about being private vs. publishing everything.

But any large project like this would produce such disputes, and it is not obvious any of its choices have been seriously wrong.  They had to start somewhere, and in my opinion they have now collected a knowledge base with a truly spectacular size, scope, and integration.

Other architectures may well work better, but if knowing lots is anywhere near as important as Lenat thinks, I’d expect serious AI attempts to import CYC’s knowledge, translating it into a new representation.  No other source has anywhere near CYC’s size, scope, and integration.  But if so, how could CYC be such a waste?

Architecture being overrated would make architecture-based fooms less plausible.  Given how small a fraction of our commonsense knowledge it seems to have so far, CYC gives little cause for optimism for human level AI anytime soon.  And as long as a system like CYC is limited to taking no actions other than drawing conclusions and asking questions, it is hard to see how it could be that dangerous, even if it knew a whole awful lot.  (Influenced by an email conversation with Stephen Reed.)

Added:  Guha and Lenat in ’93:

The Cyc project … is not an experiment whose sole purpose is to test a hypothesis, rather it is an engineering effort, aimed at constructing an artifact. … The artifact we are building is a shared information resource, which many programs can usefully draw upon.  Ultimately, it may suffice to be the shared resource…. If there is a central assumption behind Cyc, it has to do with Content being the bottleneck or chokepoint to achieving AI. I.e., you can get just so far twiddling with … empty AIR (Architecture, Implementation, Representation.) Sooner or later, someone has to bite the Content bullet. … The Implementation is just scaffolding to facilitate the accretion of that Content. … Our project has been driven continuously and exclusively by Content. I.e., we built and refined code only when we had to. I.e., as various assertions or behaviors weren’t readily handled by the then-current implementation, those needs for additional representational expressiveness or efficiency led to changes or new features in the Cyc representation language or architecture.

At the bottom of this page is a little box showing random OpenCYC statements "in its best English"; click on any concept to see more.  OpenCYC is a public subset of CYC.

  • Stefano Bertolo

    Actually, Cyc is quite capable of taking actions that can be performed by the operating system of the machine on which it is running. For example, Cyc can send e-mail messages, can log its own bugs on a bug-tracking system such as Bugzilla (I personally wrote that module), etc. It should be reasonably straightforward to program it to do other things as well.

  • http://profile.typekey.com/sentience/ Eliezer Yudkowsky

    So my genuine, actual reaction to seeing this post title was “You heart WHAT?”

    Knowledge isn’t being able to repeat back English statements. This is true even of humans. It’s a hundred times more true of AIs, even if you turn the words into tokens and put the tokens in tree structures.

    A basic exercise to perform with any supposed AI is to replace all the English names with random gensyms and see what the AI can still do, if anything. Deep Blue remains invariant under this exercise. Cyc, maybe, could count – it may have a genuine understanding of the word “four” – and could check certain uncomplicatedly-structured axiom sets for logical consistency, although not, of course, anything on the order of say Peano Arithmetic. The rest of Cyc is bogus. If it knows about anything, it only knows about certain relatively small and simple mathematical objects, certainly nothing about the real world.
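
    A minimal sketch of this gensym exercise, assuming a toy knowledge base of (subject, relation, object) triples; the facts and helper names here are invented for illustration only:

    ```python
    import itertools

    # Toy knowledge base expressed as (subject, relation, object) triples.
    # The specific facts are invented for illustration only.
    kb = [
        ("Paris", "isCapitalOf", "France"),
        ("France", "isA", "Country"),
        ("Paris", "isA", "City"),
    ]

    def gensymify(triples):
        """Replace every distinct name with an opaque token (a 'gensym'),
        keeping only the relational structure."""
        counter = itertools.count(1)
        mapping = {}
        def rename(name):
            if name not in mapping:
                mapping[name] = f"G{next(counter):04d}"
            return mapping[name]
        return [tuple(rename(x) for x in t) for t in triples]

    print(gensymify(kb))
    # e.g. [('G0001', 'G0002', 'G0003'), ('G0003', 'G0004', 'G0005'), ...]
    # Whatever inference a system can still perform over this output is what it
    # "knew" independently of the suggestive English names.
    ```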

    You can’t get knowledge into a computer that way. At all. Cyc is composed almost entirely of fake knowledge (barring anything it knows about certain simply-structured mathematical objects).

    As a search engine or something, Cyc might be an interesting startup, though I certainly wouldn’t invest in it. As an Artificial General Intelligence Cyc is just plain awful. It’s not just that most of it is composed of suggestively named LISP tokens, there are also the other hundred aspects of cognition that are simply entirely missing. Like, say, probabilistic reasoning, or decision theory, or sensing or acting or –

    – for the love of Belldandy! How can you even call this sad little thing an AGI project?

    So long as they maintained their current architecture, I would have no fear of Cyc even if there were a million programmers working on it and they had access to a computer the size of a moon, any more than I would live in fear of a dictionary program containing lots of words.

    Cyc is so unreservedly hopeless, especially by comparison to EURISKO that came before it, that it makes me seriously wonder if Lenat is doing something that I’m not supposed to postulate because it can always be more simply explained by foolishness rather than conspiracy.

    Of course there are even sillier projects. Hugo de Garis and Mentifex both come to mind.

  • Anonymous Coward

    And as long as a system like CYC is limited to taking no actions other than drawing conclusions and asking questions, it is hard to see how it could be that dangerous, even if it knew a whole awful lot.

    I too have been thinking along these lines for a while.

    If friendly AI is too tough to design, how about completely passive AI?

    Anonymous.

  • bbb

    I want to criticize Eliezer's hypothesis that an AI will, after rapidly developing itself, be able to take over the world. I want to do this using Hayek's distinction between two kinds of knowledge: knowledge as theories, and knowledge about specific circumstances of time and space. It seems to me that Eliezer's hypothesis is based entirely on the first kind of knowledge, and wholly neglects the second kind.

    To show why I think Eliezer's hypothesis is wrong, let me first try to state a missing theoretical link in the hypothesis as I have stated it. Why would a superintelligent AI in fact be able to take over the world? Where is the link between intelligence and world domination?

    As I see it, Eliezer seems to suppose (correct me if I am wrong) that the AI will use its higher intelligence to simulate various possible futures and try to influence the course of events in the world according to its own interests. It might also be able to expand its range of possible actions by engaging in market activity, in which it would have an enormous advantage over its human competitors, thus increasing its resources and scope for action.
    The key point is that the AI would use its higher intelligence to look further into the future and calculate more ramified consequences of its actions than humans would be able to. With its enormous intelligence it would also be able to calculate how humans would behave in response to each other and in response to its own actions, and just "pick" the preferred aggregated outcome.

    The way in which the intelligence would calculate the future and pick the preferred outcome would be to "simply" simulate all relevant possible futures, given information about the conditions at the starting point and its own actions. That is the same mechanism it would use to improve itself: first it would construct different "better" versions of itself, using theoretical insights, but then it would have to "test" their performance in reaching its goals in a simulated version of the world. There is no other way of assessing the "betterness" of an improved version of itself; intelligence is to be measured by the efficiency of goal-attainment in very general circumstances, as Eliezer has correctly explained. An indication that this stylized mechanism of self-improvement is Eliezer's view is his frequent pointing to computing power.

    If this is Eliezer's argument, I think it is flawed, because it fails to take into account the impossibility of acquiring the relevant dispersed specific knowledge of space and time, which would be an absolute necessity for the accurate simulations of the future needed both for self-improvement and for world domination. However, as Hayek stated, it is impossible to acquire all the relevant dispersed knowledge which would be needed to effectively plan the future. Neither increases in computing power and the growth of the internet, nor better statistical modeling techniques, will change this fact.

    I think that Eliezer completely neglects this fact, because he focuses only on "theoretical" knowledge, knowledge which can be stated as "if-then" hypotheses and mathematical formulas, and thus on a very "abstract" notion of intelligence. However, the "effectiveness" of an intelligence in action, the extent to which its actions will be successful according to its own goals, depends only to a small extent on the body of abstract hypotheses it has accumulated, and to a much larger extent on how much "information" about the world it is able to incorporate into its predictions. The same is true for the process of self-improvement of an AI.

    Note that abstract hypotheses can themselves be a store for a lot of specific knowledge. Humans store a lot of information about the world in their (unconsciously followed) "if-then" rules of conduct. However, it is impossible to use these abstract rules to further improve the efficiency of human conduct or of intelligence itself. An abstract simulation of the world with the goal of finding better rules of conduct for the AI cannot gain any new knowledge about the world by using this rule-stored knowledge. The efficiency of new rules of conduct, or of cognitive or behavioral algorithms, IN THE REAL WORLD cannot be tested inside a SIMULATION of the real world. Such a simulation will only be able to ascertain their comparative efficiency in the simulated world, not in the real world. This is so because of the crucial importance of dispersed, specific knowledge.

    I think that an abstract discussion of the evolution of intelligence that fails to take into account the role of this information and focuses on abstract knowledge only misses the point.

  • http://cabalamat.wordpress.com/ Philip Hunt

    AIs learn slowly now mainly because they know so little

    No, they learn slowly (if at all) now because they are stupid. Intelligence is about understanding, not knowledge — Wikipedia and Google have lots of knowledge but no understanding.

    If an AGI has the ability to learn, not just to learn facts but to learn to understand things, then it's like a human child, capable in time of understanding the world it lives in.

    Cyc is more like Wikipedia or Freebase than it is like an AI.

  • http://supermodelling.net derekz

    Not that this is a poll or anything but I have to agree with Eliezer about CYC. I don't see how CYC can simultaneously be “a knowledge base with a truly spectacular size, scope, and integration” and also “how small a fraction of our commonsense knowledge it seems to have so far”… it is *something* spectacular due to the amount of work put into it, but I don't think it has much “knowledge” at all because that isn't what people are putting into it.

    Everybody has their own reasons for *why* so much effort yielded so little (mine is that the knowledge representation formalism they chose is really poor at hosting effective models of most aspects of the actual world we live in).

    I don’t think it’s quite right to spread the pessimism resulting from analysis of CYC to all of AI, but it does illustrate that until an idea demonstrates its effectiveness in the real world (instead of in toy worlds or theoretically) it does not deserve much credence as a “solution” for AI no matter how smart its proponent or how reasonable the idea sounds.

  • http://hanson.gmu.edu Robin Hanson

    Philip, when you understand better, you know more. So a system that understands little knows little, and that can be its main problem.

    Stefano, I’m sure CYC can take some actions, but it is hard to see a desire to log bug reports as threatening the safety of the world.

    Eliezer, conversation is action. Replacing every word you spoke or heard with a new random gensym would destroy your ability to converse with others. So that would be a terrible way to test your true knowledge that enables your conversation. I’ll grant that an ability to converse is a limited ability, and the ability to otherwise act effectively greatly expands one’s capability and knowledge.

  • http://hanson.gmu.edu Robin Hanson

    Derek, if AI is a truly hard problem, even a spectacular contribution could still be only a drop in its enormous bucket.

  • luzr

    Eliezer:

    With all respect, I think you are completely missing the point.

    I guess that the short synopsis of the article is: “we need weak AI to develop strong AI”.

    CYC algorithms and structure will certainly not lead to strong AI. But we can use its database to form one.

    E.g., we might try to create strong AI as some sort of genetic algorithm. But in that case, you need FEEDBACK. You can of course provide “manual” feedback to it, carefully inventing inputs and examining outputs of each ‘organism’.

    Or you can use weak AI to do this for you a million times faster.

    (Note that I am not saying that GA is the way to go – it might or might not be. That is not the point. You can train Bayesian filters instead.)
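
    A minimal sketch of the feedback loop described above, assuming a hypothetical `weak_ai_score` oracle standing in for a weak AI (e.g., a knowledge base plus a grader scoring a candidate's answers); the objective here is a stub, not a claim about how such a system would really be built:

    ```python
    import random

    def weak_ai_score(candidate):
        """Hypothetical oracle: a weak AI scores how well a candidate performs
        on a fixed battery of questions. Stubbed with a toy objective here."""
        return -sum((g - 0.5) ** 2 for g in candidate)

    def mutate(candidate, rate=0.1):
        """Produce a slightly perturbed copy of a candidate."""
        return [g + random.gauss(0, rate) for g in candidate]

    # Simple evolutionary loop: the weak AI supplies the feedback that would
    # otherwise have to be provided by hand for every candidate 'organism'.
    population = [[random.random() for _ in range(8)] for _ in range(20)]
    for generation in range(100):
        parents = sorted(population, key=weak_ai_score, reverse=True)[:5]
        population = parents + [mutate(random.choice(parents)) for _ in range(15)]

    print("best score:", weak_ai_score(max(population, key=weak_ai_score)))
    ```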

  • http://cabalamat.wordpress.com/ Philip Hunt

    Robin: Philip, when you understand better, you know more.

    When I understand more I have more procedural knowledge. I don’t necessarily have much more declarative knowledge. And if I did have more declarative knowledge, it mightn’t lead to understanding.

    So a system that understands little knows little

    If I understand you correctly, this is factually inaccurate, because there are systems that know a lot, i.e. have lots of declarative knowledge (e.g. Wikipedia), but don’t have much or any procedural knowledge.

    Eliezer, conversation is action. Replacing every word you spoke or heard with a new random gensym would destroy your ability to converse with others. So that would be a terrible way to test your true knowledge that enables your conversation.

    I think what Eliezer was trying to get at is the difference between using words like a human does, knowing what they mean, and using words like a simplistic computer program does, without any deep underlying knowledge of what the words mean.

    When a human uses the word "cat" we normally assume that the human would be able to draw a picture of a cat, would be able to recognise a cat on seeing one or a picture of one, and would be able to act as a catsitter, effectively keeping a cat supplied with food and water. Could Cyc do any of these (the last if Cyc had a robot body)? I think not.

    When humans use a word we take it for granted that they have some deep knowledge of the sort I mention behind that word. But when a computer program uses a word, we can't take it for granted — the computer could be just faking it. That's why Eliezer's test makes sense.

    My cat can recognise another cat and respond to it. It can also recognise humans, mice, and birds, and respond to them in appropriate ways (for a cat). I think my cat has more deep knowledge about the world, more understanding, more intelligence, than Cyc does.

  • luzr

    Phil Hunt:

    “When a human uses the word “cat” we normally assume that the human would be able to draw a picture of a cat, would be able to recognise a cat on seeing one or a picture of one, and would be able to act as a catsitter, effectively keeping a cat supplied with food and water. Could Cyc do any of these (the last if Cyc had a robot body)? I think not.”

    I do not buy this argument. Do you think that blind people do not know what a cat is?

    If you want to argue that they can still touch it or smell it… What about a whale? (Or substitute any animal they are hardly ever likely to come into direct contact with.)

  • http://cabalamat.wordpress.com/ Philip Hunt

    luzr: I do not buy this argument. Do you think that blind people do not know what a cat is?

    Their ability to process data and make decisions intelligently regarding a cat is reduced. I’d be reluctant to employ a blind person as a catsitter, for example.

    If you want to argue that they can still touch it or smell it… What about whale?

    Their ability to process data intelligently is reduced to a level not much above Cyc’s. So a blind person’s level of intelligence with regard to the domain of whales is rather low.

  • luzr

    Phil Hunt:

    My cat can recognise another cat and respond to it. It can also recognise humans, mice, and birds, and respond to them in appropriate ways (for a cat). I think my cat has more deep knowledge about the world, more understanding, more intelligence, than Cyc does.

    OK, following your logic: a cat has more deep knowledge about the world, more understanding, more intelligence, than a blind person does. Correct?

  • http://cabalamat.wordpress.com/ Philip Hunt

    luzr: OK, following your logic: a cat has more deep knowledge about the world, more understanding, more intelligence, than a blind person does. Correct?

    No. Although a cat quite possibly does have better domain-specific data processing ability in certain domains than a blind human. That's because (1) the blind human doesn't have access to all the relevant data, and (2) if the human was blind from birth, it's possible that the parts of their brain dealing with spatial relations haven't developed very well, and therefore the cat may well be superior in those domains.

  • Tyrrell McAllister

    OK, following your logic: a cat has more deep knowledge about the world, more understanding, more intelligence, than a blind person does. Correct?

    luzr, that shouldn't be a controversial statement, at least with regard to some domains of knowledge. A typical cat probably has more knowledge about some things than almost any human alive. Consider, for example, the problem of how the cat can best interact with the local cat dominance hierarchy. A human cat specialist might know that, but I doubt that any other human could do as well. One assumes that it requires judging all sorts of subtleties of smell and posture and voice.

  • Marcello

    “The lesson Lenat took from EURISKO is that architecture is overrated; AIs learn slowly now mainly because they know so little. So we need to explicitly code knowledge by hand until we have enough to build systems effective at asking questions, reading, and learning for themselves.”

    Robin:

    The human genome fits on a single CD-ROM. Yet a human baby can learn fast. If you do not attribute this feat to the baby’s brain having a good architecture, then what on earth *do* you attribute it to?

    A baby doesn’t know that, say, Paris is the capital of France or that Bill Clinton is a president. Therefore, Cyc theoretically shouldn’t need that information. Yet here they are:
    http://sw.opencyc.org/concept/Mx4rvVj5jJwpEbGdrcN5Y29ycA
    http://sw.opencyc.org/concept/Mx4rwQBp5JwpEbGdrcN5Y29ycA

    Even if one characterizes what a baby is doing as reasoning from an initial base of "common sense statements" present at birth or appearing by maturation (which is incomplete at best, because babies learn from their environment), we already know that this supposed bank of statements must:
    – Not contain things that look like "Paris is the capital of France" or "Bill Clinton is a President"
    – Be highly compressible: the whole thing had to fit on a CD-ROM (and that's if the entire human genome got enlisted into building the database).
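
    As a rough sanity check on the CD-ROM figure mentioned above, with the genome size and encoding as assumed round numbers:

    ```python
    # Back-of-the-envelope size of the human genome as raw data
    # (round numbers, for illustration).
    base_pairs = 3.1e9        # approximate length of the human genome
    bits_per_base = 2         # A/C/G/T encoded in 2 bits each, uncompressed

    megabytes = base_pairs * bits_per_base / 8 / 1e6
    print(f"~{megabytes:.0f} MB uncompressed")  # ~775 MB, roughly one CD-ROM
    # The genome is highly repetitive, so compressed it is smaller still; the
    # point is that the "seed" specifying the baby's learning machinery is tiny.
    ```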

  • luzr

    Tyrrell:

    I do not have a problem with the word “knowledge”, but “intelligence”.

    It seems to me that this argument implies that in order to “be intelligent” and understand the meaning of words, you need human-like (or animal-like) senses.

    IMHO, that is not correct. What you need is information on input, and ASCII text is as good as anything else. Intelligence is the ability to process information and react accordingly. Whether the information is only text and the reaction is another text is irrelevant, as long as the responses are, well, uhm, intelligent…

    Note that this is not quite related to Cyc – I agree that Cyc itself will never develop into strong AI. It is just a tool.

    Marcello:

    That is right. But it will be very handy to have Cyc around to make the baby learn VERY fast – because humans are oh so slow….

    Especially if you are about to test a billion babies to find out which one works.

  • Benja Fallenstein

    Marcello, I more or less share your opinion, but the genome fitting on a CD-ROM doesn’t seem impressive evidence to me; Matt Mahoney argues in his rationale for the large compression benchmark (related to the Hutter Prize) that “Assuming you spend several hours a day reading, writing, talking, or listening, you process about a gigabyte of language in your lifetime.” (I do remember reading that for the first time and thinking, that little?!?) I think that’s after (lossless!) compression to 1 bit per character, but if you multiply by five, there’s still a sizable fraction of it that fits on a CD. (Again, that’s before extracting the useful concepts from the noise.)

    Of course, that’s words, and brains have to do full audio/video/etc processing, but still.
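
    Mahoney's figure is easy to sanity-check; the rates below are assumed round numbers chosen for illustration, not his exact inputs:

    ```python
    # Rough lifetime language-exposure estimate (all inputs are assumptions).
    hours_per_day  = 4      # reading, writing, talking, listening
    words_per_hour = 9000   # ~150 words per minute
    chars_per_word = 6      # including the trailing space
    years          = 70

    chars = hours_per_day * words_per_hour * chars_per_word * 365 * years
    print(f"raw text: ~{chars / 1e9:.1f} GB")               # ~5.5 GB uncompressed
    print(f"at 1 bit per char: ~{chars / 8 / 1e9:.2f} GB")  # under a gigabyte
    ```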

  • http://hanson.gmu.edu Robin Hanson

    Marcello, the human genome clearly contains knowledge, some of which may be thought of as embodied in its architecture, and yes, that demonstrates that an AI should be possible without being seeded with specific knowledge. But since we don't know exactly how to build a baby, we have to try to collect what knowledge we can, not knowing exactly which knowledge will be needed. CYC only knows that Paris is the capital of France as an example, to help its builders describe more general concepts, like "capital".

  • Chad

    Eliezer said:

    “- for the love of Belldandy! How can you even call this sad little thing an AGI project?”

    Robin didn’t actually make that claim in his post — the level of indignation in Eliezer’s response seems a bit misplaced to me.

    I think Robin’s point is that a repository of “factoids” could provide a useful kernel of content that an AGI program could use to skip over some part of its knowledge bootstrapping process.

    It’s likely that Eliezer believes that a true AGI would move past the need for whatever pre-digested knowledge could be collected in any such repository so quickly as to make the effort of building such a collection worthless — it would just go out and read Wikipedia, the Library of Congress, etc. etc., directly.

    My personal take: CYC could be a very useful resource for weak AI projects, which could provide enough value to justify the resources expended upon both CYC and those other weak AI projects (while we wait for true AGI to come around). However, I agree with Eliezer that CYC or other similar efforts are unlikely to play a significant role in true AGI.

    In addition, Eliezer possibly has concerns that efforts like CYC might detract from real AGI work — perhaps this is the source of the animus displayed in his response above. I am not sure I agree with these concerns, but then again I am not out there in the trenches trying to work on real AGI.

  • Doug S.

    So, the purpose of CYC is to give an AI like Eurisko (only a million times better) something to think about?

  • http://profile.typekey.com/halfinney/ Hal Finney

    Isn't the theory that systems like Cyc can be said to exhibit understanding not because of suggestively-named tokens, but because a wide variety of tokens stand in the same relationship to each other as the corresponding concepts do in the real world? Suppose Cyc "knows" that humans need food, water and sleep, that the first comes in a wide variety, the second is generally uniform but of varying quantities and sizes, and that the third is an activity which has a duration. Then if we are told that X needs A, B, and C, that A comes in a wide variety, that B is a mass quantity of varying sizes, and that C is an activity with a duration, and if we had sufficiently many more pieces of information like this, eventually all becoming inter-related in complex ways, then we might be able to deduce that X was probably humans and A, B, and C were food, water and sleep. If Cyc or any system could achieve this degree of inter-related factual knowledge, then do you think that should be enough to grant it some level of understanding?
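
    This proposal can be read as a structure-matching problem: find an assignment of the anonymous tokens to known concepts under which every stated relation is already a known fact. A tiny brute-force sketch, with an invented background knowledge base (all names illustrative):

    ```python
    from itertools import permutations

    # Background knowledge base: invented facts, for illustration only.
    kb = {
        ("Human", "needs", "Food"), ("Human", "needs", "Water"), ("Human", "needs", "Sleep"),
        ("Food", "hasProperty", "Varied"),
        ("Water", "hasProperty", "MassQuantity"),
        ("Sleep", "hasProperty", "HasDuration"),
    }

    # The same facts with the names stripped, as in the example above.
    anonymous = {
        ("X", "needs", "A"), ("X", "needs", "B"), ("X", "needs", "C"),
        ("A", "hasProperty", "Varied"),
        ("B", "hasProperty", "MassQuantity"),
        ("C", "hasProperty", "HasDuration"),
    }

    tokens = ["X", "A", "B", "C"]
    properties = {"Varied", "MassQuantity", "HasDuration"}
    concepts = {c for s, _, o in kb for c in (s, o)} - properties

    # Try every assignment of tokens to concepts; keep those under which every
    # anonymous triple translates into a known fact.
    for assignment in permutations(concepts, len(tokens)):
        mapping = dict(zip(tokens, assignment))
        if all((mapping.get(s, s), r, mapping.get(o, o)) in kb for s, r, o in anonymous):
            print(mapping)  # {'X': 'Human', 'A': 'Food', 'B': 'Water', 'C': 'Sleep'}
    ```

    With only a few facts many assignments survive; the more inter-related the facts, the fewer interpretations remain, which is the sense in which the relational structure itself would carry the meaning.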

  • http://profile.typekey.com/sentience/ Eliezer Yudkowsky

    Okay… look at this way. Chimpanzees share 95% of our DNA and have much of the same gross cytoarchitecture of their brains. You cannot explain to chimpanzees that Paris is the capital of France. You can train them to hold up a series of signs saying “Paris”, then “Is-Capital-Of”, then “France”. But you cannot explain to them that Paris is the capital of France.

    And a chimpanzee’s cognitive architecture is hugely more sophisticated than Cyc’s. Cyc isn’t close. It’s not in the ballpark. It’s not in the galaxy holding the star around which circles the planet whose continent contains the country in which lies the city that built the ballpark.

  • http://hanson.gmu.edu Robin Hanson

    Eliezer, we can make computers do lots of things we can’t train chimps to do. Surely we don’t want to limit AI research to only achieving chimp behaviors. We want to be opportunistic – developing whatever weak abilities have the best chance of leading later to stronger abilities. Answering encyclopedia questions might be the best weak ability to pursue first. Or it might not. Surely we just don’t know, right?

  • http://don.geddis.org/ Don Geddis

    Robin writes: “Other architectures may well work better, but if knowing lots is anywhere near as important as Lenat thinks, I’d expect serious AI attempts to import CYC’s knowledge, translating it into a new representation. No other source has anywhere near CYC’s size, scope, and integration. But if so, how could CYC be such a waste?”

    And yet that has never (?) happened, in the decade and a half that Cyc has been developed. No significant, serious, large scale — but independent — AI project has imported Cyc’s knowledge base in order to jumpstart its own efforts.

    If this failure does not convince you of the lack of value of Cyc’s accomplishments, what would? Is your theory one of conspiracy, that all the other AI researchers in the world hate Lenat and Cyc so much, that they refuse to use the value in Cyc even if it would greatly boost their own projects?

    Or is the more likely explanation that Cyc’s database, in truth, contains very little value? And that’s why nobody builds on it.

  • luzr

    Don Geddis:

    Yes, of course, as we all know, there are thousands of well funded attempts to build GAI in the first place…

    Seriously, I guess that the main problem is that nobody has gotten far enough to actually need Cyc.

    Eliezer can make all the nasty comments about Cyc he wants, but the sad truth is, at least it is some effort to do something about GAI.

    You can speculate about recursive self-improvement forever – but that will not make it happen.

  • Tom Breton

    Wrt “suggestively named LISP tokens” and (illustrative quote) “replace each token with a gensym and what have you got?”

    The concept that's missing here is an analog of what cryptographers call "unicity distance" – how long a string of tokens must be before there's only one interpretation.[1] The same can conceptually be applied to systems of propositions.[2]

    For instance, let's borrow Sowa's favorite example, "The cat is on the mat". Stripping the token names as false cues, it's really "The G1757 is in relation G1758 to the G1759"[4]. It could equally well be "the dog is on the mat" or "the dog is under the sofa".

    A large, non-degenerate system will mention G1757 more than once. It might, for instance, mention the shape of G1757s' (cats') pupils.[3] That narrows down the possible real-world referents for G1757, excluding dogs.

    Of course, "pupils" isn't a given either. One would have to use other propositions, perhaps "pupils are part of the eye", a geometric description of which part, and "eyes transduce electromagnetic radiation".

    Even with perfect real-world knowledge, the computing power required to determine the unique [5] match, if there is one, might be enormous. But we don't need to actually find them, any more than one needs to decrypt a given cyphertext in order to reason about it in the analogous manner.

    Footnotes:

    [1] I'll list certain qualifiers in case anyone's common sense fails to produce them: That's on average. It's possible to produce longer ambiguous strings, sometimes much longer, with effort or by exploiting degeneracy.

    [2] More common sense qualifiers: Measuring the size of a system of logical propositions is not easy like measuring the length of a string of tokens. A degenerate system can look big but really be small. Degeneracy, at least here, is a matter of degree – any large one is at least a little degenerate. If a token is only used once, or just a few times, the system is probably quite degenerate, at least in regard to that token.

    [3] No idea whether CYC actually does so.

    [4] Really, hairier than that, but I'm taking "is", "the" and "in relation X to" as not requiring interpretation.

    [5] More common sense qualifiers: By "unique" here, I mean finding a unique human-scale interpretation of "The G1757 is in relation G1758 to the G1759", not necessarily figuring out which cat and which mat. That means that the "unicity distance" is a function of the preciseness of interpretation that we require. And I'm leaving out the issue of counterfactual contexts – FWIW, Cyc notes them explicitly, so I don't think there's a problem there.

  • Lord

    CYC gives little cause for optimism for human level AI anytime soon

    I heartily agree, and everything else gives me far less. People want to believe this will occur in their lifetime because it gives them some expectation their work is valuable, but our ignorance is far greater than our knowledge.

  • Marcello

    Robin says: “But since we don’t know exactly how to build a baby, we have to try to collect what knowledge we can, not knowing exactly which knowledge will be needed.”

    I must say, I’m rather perplexed about what it even means for some information not associated with any particular cognitive architecture to be knowledge.

    I think the strings of symbols that are entered into Cyc lose their knowledge-ness in much the same sense as a dollar bill dropped on an island of tribal people who have never seen one loses its money-ness.

    Money-ness isn't an intrinsic property of a physical dollar bill. For objects to have money-ness, you need a bunch of agents trading them for things they value. Similarly, the fact that some bit-string is a piece of useful knowledge must not just be a fact about the bit-string, but a fact about how some agent interacts with the bit-string and then another system (which the bit-string was "knowledge about") in a way that results in the agent getting more of what it wants. E.g. a book of Go problems is useful knowledge (for me) about how to play Go if I could read the book, do the exercises, and then be a better Go player.

    Speaking of Go, it is stupendously easier to write a book of Go problems which would help a motivated human become a strong-amateur Go player (this has been accomplished hundreds of times), than it is to write a computer program which plays strong-amateur-level Go (this has never been accomplished, despite the $1.6 million prize on offer). This despite programmers having access to a huge wealth of human knowledge about Go. (E.g. http://gobase.org/ )

    I'd say this is pretty good evidence that getting one's hands on the information which would be knowledge to a human is the easy part, and processing the information in ways that would make the piece of information actually merit the name "knowledge" is the hard part. And what is an AI architecture if not the way in which an AI processes information?

  • http://code.google.com/p/mindforth/wiki/AiHasBeenSolved Mentifex

    Take it easy, ESY!

    It’s not in the galaxy holding the star around which circles the planet whose continent contains the country in which lies the city that built the ballpark.

    Before Singularity came to be, Mentifex am.

    Now for a little progress report on Mentifex AI. Aw, never mind, it would just get deleted. (And the URL says it all anyway. 🙂)

  • Ben Jones

    Marcello, are you blogging somewhere? If not, why not?

  • Tim Tyler

    if knowing lots is anywhere near as important as Lenat thinks, I’d expect serious AI attempts to import CYC’s knowledge, translating it into a new representation. No other source has anywhere near CYC’s size, scope, and integration.

    Knowing lots is important – but there are other sources of knowledge besides Cyc. For example, Google have slurped up the entire internet, and scanned a substantial proportion of the books that have been published – but they haven’t shown much interest in Cyc. Why would they? AFAICS, Cyc is a useless, unmaintainable, GOFAI mess.

  • Tim Tyler

    Uh, the full Ing Prize was for beating a Chinese-Taipei Go professional – and the prize expired in the year 2000. That was safe money, if ever I saw it.

  • http://www.transhumangoodness.blogspot.com Roko
  • Marcello

    @Tim Tyler: I didn’t know the Ing prize had expired. I stand corrected. With that said, the financial incentives still exist: owning the only company which could sell really good Go playing software would probably earn you more money than the Ing prize.

    @Ben Jones: I presently don't have a blog. I am trying to optimize for becoming a useful AI researcher, and my current strategy involves taking lots of math classes. With that said, better ability to communicate ideas like these looks useful, so I'll consider starting one.

    @Robin Hanson: Having read the arguments in my second comment, have your opinions on whether architecture is overrated shifted?

  • http://hanson.gmu.edu Robin Hanson

    Marcello: “I must say, I’m rather perplexed about what it even means for some information not associated with any particular cognitive architecture to be knowledge.”

    If we hope to create AIs that can read human writings, we have all the more reason to hope to create AIs that can make use of CYC, since CYC is more structured and easier to parse. AIs that can parse and use CYC should be feasible well before AIs that can parse and use random human writings. If that means we expect such AIs to share some basic architectural features with humans and CYC, so be it.
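
    One way to see the "easier to parse" point: formal assertions in a CycL-like s-expression style can be turned into usable triples with a few lines of code, whereas extracting the same facts from free English prose requires full natural-language parsing. The syntax and constant names below are only schematic, not actual Cyc content:

    ```python
    import re

    # A few assertions in a schematic, CycL-like s-expression syntax
    # (illustrative names, not real Cyc constants).
    assertions = """
    (isa Paris City)
    (isa France Country)
    (capitalCityOf Paris France)
    """

    def parse(text):
        """Turn '(pred arg1 arg2 ...)' lines into (pred, arg1, arg2, ...) tuples."""
        facts = []
        for match in re.finditer(r"\(([^()]+)\)", text):
            pred, *args = match.group(1).split()
            facts.append((pred, *args))
        return facts

    print(parse(assertions))
    # [('isa', 'Paris', 'City'), ('isa', 'France', 'Country'),
    #  ('capitalCityOf', 'Paris', 'France')]
    ```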

  • http://causalityrelay.wordpress.com/ Vladimir Nesov

    Robin Hanson:

    “AIs that can parse and use CYC should be feasible well before AIs that can parse and use random human writings.”

    I can’t take that for granted.

    • gwern

      Why not? As Robin says, the Cyc database is on the surface far more structured and useful than random English text. Are you envisioning a hard takeoff where the AI doesn’t even bother with curated databases like Cyc and goes straight to reading Wikipedia’s tagsoup and then Google Books & the open web?

