38 Comments

Why not? As Robin says, the Cyc database is on the surface far more structured and useful than random English text. Are you envisioning a hard takeoff where the AI doesn't even bother with curated databases like Cyc and goes straight to reading Wikipedia's tagsoup and then Google Books & the open web?

Robin Hanson:

"AIs that can parse and use CYC should be feasible well before AIs that can parse and use random human writings."

I can't take that for granted.

Marcello: "I must say, I'm rather perplexed about what it even means for some information not associated with any particular cognitive architecture to be knowledge."

If we hope to create AIs that can read human writings, we have all the more reason to hope to create AIs that can make use of CYC, since CYC is more structured and easier to parse. AIs that can parse and use CYC should be feasible well before AIs that can parse and use random human writings. If that means we expect such AIs to share some basic architectural features with humans and CYC, so be it.

@Tim Tyler: I didn't know the Ing prize had expired. I stand corrected. With that said, the financial incentives still exist: owning the only company which could sell really good Go playing software would probably earn you more money than the Ing prize.

@Ben Jones: I presently don't have a blog. I am trying to optimize for becoming a useful AI researcher, and my current strategy involves taking lots of math classes. With that said, better ability to communicate ideas like these looks useful, so I'll consider starting one.

@Robin Hanson: Having read the arguments in my second comment, have your opinions on whether architecture is overrated shifted?

you might be interested to watch this video:

http://video.google.co.uk/v...

Uh, the full Ing Prize was for beating a Chinese-Taipei Go professional - and the prize expired in the year 2000. That was safe money, if ever I saw it.

if knowing lots is anywhere near as important as Lenat thinks, I'd expect serious AI attempts to import CYC's knowledge, translating it into a new representation. No other source has anywhere near CYC's size, scope, and integration.

Knowing lots is important - but there are other sources of knowledge besides Cyc. For example, Google have slurped up the entire internet, and scanned a substantial proportion of the books that have been published - but they haven't shown much interest in Cyc. Why would they? AFAICS, Cyc is a useless, unmaintainable, GOFAI mess.

Marcello, are you blogging somewhere? If not, why not?

Take it easy, ESY!

It's not in the galaxy holding the star around which circles the planet whose continent contains the country in which lies the city that built the ballpark.

Before Singularity came to be, Mentifex am.

Now for a little progress report on Mentifex AI. Aw, never mind, it would just get deleted. (And the URL says it all anyway :-)

Robin says: "But since we don't know exactly how to build a baby, we have to try to collect what knowledge we can, not knowing exactly which knowledge will be needed."

I must say, I'm rather perplexed about what it even means for some information not associated with any particular cognitive architecture to be knowledge.

I think the strings of symbols that are entered into Cyc lose their knowledge-ness in much the same sense as a dollar bill dropped on an island of tribal people who have never seen one loses its money-ness.

Money-ness isn't an intrinsic property of a physical dollar bill. For objects to have money-ness, you need a bunch of agents trading them for things they value. Similarly, the fact that some bit-string is a piece of useful knowledge must not just be a fact about the bit-string, but a fact about how some agent interacts with the bit-string and then another system (which the bit-string was "knowledge about") in a way that results in the agent getting more of what it wants. E.g. a book of Go problems is useful knowledge (for me) about how to play Go if I could read the book, do the exercises, and then be a better Go player.

Speaking of Go, it is stupendously easier to write a book of Go problems which would help a motivated human become a strong-amateur Go player (this has been accomplished hundreds of times), than it is to write a computer program which plays strong-amateur-level Go (this has never been accomplished, despite the $1.6 million prize on offer). This despite programmers having access to a huge wealth of human knowledge about Go. (E.g. http://gobase.org/ )

I'd say this is pretty good evidence that getting one's hands on the information which would be knowledge to a human is the easy part, and processing the information in ways that would make the piece of information actually merit the name "knowledge" is the hard part. And what is an AI architecture if not the way in which an AI processes information?

"CYC gives little cause for optimism for human level AI anytime soon"

I heartily agree, and everything else gives me far less. People want to believe this will occur in their lifetime because it gives them some expectation their work is valuable, but our ignorance is far greater than our knowledge.

Wrt "suggestively named LISP tokens" and (illustrative quote) "replace each token with a gensym and what have you got?"

The concept that's missing here is an analog of what cryptographers call "unicity distance" - how long a string of tokens must be before there's only one interpretation.[1] The same can conceptually be applied to systems of propositions.[2]

For instance, let's borrow Sowa's favorite example, "The cat is on the mat". Stripping the token names as false cues, it's really "The G1757 is in relation G1758 to the G1759"[4]. It could equally well be "the dog is on the mat" or "the dog is under the sofa".

A large, non-degenerate system will mention G1757 more than once. It might, for instance, mention the shape of G1757s' (cats') pupils.[3] That narrows down the possible real-world referents for G1757, excluding dogs.

Of course, "pupils" isn't a given either. One would have to use other propositions, perhaps "pupils are part of the eye", a geometric description of which part, and "eyes transduce electromagnetic radiation".

Even with perfect real-world knowledge, the computing power required to determine the unique[5] match, if there is one, might be enormous. But we don't need to actually find them, any more than one needs to decrypt a given cyphertext in order to reason about it in the analogous manner.

Footnotes:

[1] I'll list certain qualifiers in case anyone's common sense fails to produce them: That's on average. It's possible to produce longer ambiguous strings, sometimes much longer, with effort or by exploiting degeneracy.

[2] More common sense qualifiers: Measuring the size of a system of logical propositions is not easy like measuring the length of a string of tokens. A degenerate system can look big but really be small. Degeneracy, at least here, is a matter of degree - any large one is at least a little degenerate. If a token is only used once, or just a few times, the system is probably quite degenerate, at least in regard to that token.

[3] No idea whether CYC actually does so.

[4] Really, hairier than that, but I'm taking "is", "the" and "in relation X to" as not requiring interpretation.

[5] More common sense qualifiers: By "unique" here, I mean finding a unique human-scale interpretation of "The G1757 is in relation G1758 to the G1759", not necessarily figuring out which cat and which mat. That means that the "unicity distance" is a function of the preciseness of interpretation that we require. And I'm leaving out the issue of counterfactual contexts - FWIW, Cyc notes them explicitly, so I don't think there's a problem there.
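The narrowing process described above can be sketched in a few lines of Python. This is a toy under invented assumptions - a made-up five-fact world and hypothetical gensym propositions, none of it taken from Cyc - but it shows how each additional stripped proposition shrinks the set of consistent real-world referents for a token:

```python
from itertools import permutations

# Invented toy "world" of ground facts (subject, relation, object).
# Purely illustrative; not drawn from Cyc's actual contents.
WORLD = {
    ("cat", "on", "mat"),
    ("dog", "on", "mat"),
    ("cat", "has-pupils", "slit-pupils"),
    ("dog", "has-pupils", "round-pupils"),
    ("slit-pupils", "shaped-like", "vertical-slits"),
}

CONCEPTS = sorted({c for fact in WORLD for c in fact})


def possible_referents(propositions, token):
    """Set of concepts `token` could denote, over all injective
    gensym->concept mappings under which every stripped proposition
    matches some fact in WORLD."""
    gensyms = sorted({g for prop in propositions for g in prop})
    referents = set()
    for perm in permutations(CONCEPTS, len(gensyms)):
        mapping = dict(zip(gensyms, perm))
        if all(tuple(mapping[g] for g in p) in WORLD for p in propositions):
            referents.add(mapping[token])
    return referents


# "The G1757 is in relation G1758 to the G1759" -- one stripped proposition.
p1 = [("G1757", "G1758", "G1759")]
# Mentioning G1757 again (its pupils), then describing the pupils' shape,
# progressively narrows what G1757 could be.
p2 = p1 + [("G1757", "G1760", "G1761")]
p3 = p2 + [("G1761", "G1762", "G1763")]

print(possible_referents(p1, "G1757"))  # every subject: cat, dog, slit-pupils
print(possible_referents(p2, "G1757"))  # narrowed to cat and dog
print(possible_referents(p3, "G1757"))  # only cat remains
```

Brute-force enumeration like this blows up combinatorially in the number of gensyms, which is exactly the point about the computing power required being potentially enormous.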

Don Geddis:

Yes, of course, as we all know, there are thousands of well-funded attempts to build GAI in the first place...

Seriously, I'd guess the main problem is that nobody has gotten far enough to actually need Cyc.

Eliezer can make all the nasty comments he likes about Cyc, but the sad truth is, at least it is some effort to do something about GAI.

You can speculate about recursive self-improvements forever - but that will not make them happen.

Robin writes: "Other architectures may well work better, but if knowing lots is anywhere near as important as Lenat thinks, I'd expect serious AI attempts to import CYC's knowledge, translating it into a new representation. No other source has anywhere near CYC's size, scope, and integration. But if so, how could CYC be such a waste?"

And yet that has never (?) happened, in the decade and a half that Cyc has been developed. No significant, serious, large scale -- but independent -- AI project has imported Cyc's knowledge base in order to jumpstart its own efforts.

If this failure does not convince you of the lack of value of Cyc's accomplishments, what would? Is your theory one of conspiracy - that all the other AI researchers in the world hate Lenat and Cyc so much that they refuse to use the value in Cyc even if it would greatly boost their own projects?

Or is the more likely explanation that Cyc's database, in truth, contains very little value? And that's why nobody builds on it.

Eliezer, we can make computers do lots of things we can't train chimps to do. Surely we don't want to limit AI research to only achieving chimp behaviors. We want to be opportunistic - developing whatever weak abilities have the best chance of leading later to stronger abilities. Answering encyclopedia questions might be the best weak ability to pursue first. Or it might not. Surely we just don't know, right?

Okay... look at it this way. Chimpanzees share 95% of our DNA and have much of the same gross cytoarchitecture in their brains. You cannot explain to chimpanzees that Paris is the capital of France. You can train them to hold up a series of signs saying "Paris", then "Is-Capital-Of", then "France". But you cannot explain to them that Paris is the capital of France.

And a chimpanzee's cognitive architecture is hugely more sophisticated than Cyc's. Cyc isn't close. It's not in the ballpark. It's not in the galaxy holding the star around which circles the planet whose continent contains the country in which lies the city that built the ballpark.
