On Error Goto: Some Thoughts On Thinking Machines
by L. J. Hurst
At one time, if you could not afford to use an expert, or did not know one you could approach in your area, you still had access to expertise in books. Those books were, in today's sense, knowledge bases, full of facts, and their title pages would tell you they were written by experts - "A Doctor", "A Barrister" and so on. So you could treat your family for illnesses, make your will, or calculate whether you were paying the right amount of income tax, all with reasonable confidence. However, if the information in books like that is wrong, then the diagnosis they encourage you to make is going to be wrong, and the consequences possibly serious.

What should matter in Information Systems, though, is not that knowledge bases may get their facts wrong but how they get them wrong. Some recent introductions to Artificial Intelligence raise this problem unwittingly and without answering it. The problem is: are there different kinds of error? If there are, should they be allowed to continue in their peculiarities, or should they be reduced to a single class of wrongness? These questions started to worry me when I began my own search for artificial intelligence by reading Igor Aleksander and Piers Burnett's Thinking Machines: The Search for Artificial Intelligence. The answers to them may reflect on the trust we can place in Expert Systems and AI.

A book is a good example of a knowledge base, and this one provides a good subject because at least one of its authors is an expert: Igor Aleksander is Kobler Professor in the Management of Information Technology at Imperial College. It ought to be trustworthy, but then I started to find things that were wrong. How do you use or trust a knowledge base when the information in it is in error? For instance, errors in a passage like this one: "Let us suppose that the following line is added to a program written in BASIC:
500 IF (P=<20)*(Q>5):GOTO 800
This contains at least two errors: firstly, the syntax of the line of BASIC is wrong - it would not interpret or compile and run on any machine that I know; secondly, even after the syntax is corrected, the English explanation is not an explanation of the logic in line 500 (their description requires the line to read 500 IF P<20 and Q>5 THEN GOTO 800); thirdly, perhaps, the grammar of the explanation is dodgy - "continue" does not require the following "on".

So this knowledge base already holds two types of error. The first is a factual error: the line of code claimed to be a line of BASIC program is no such thing. The second is that a description in English is given which is asserted to be a description of that line in BASIC - that is, different elements are meant to agree with one another within the knowledge base and they do not. If both the BASIC and the English could be reduced to logical symbols they ought to be identical, but attempt to make that reduction and still they don't match. Which is the right statement of the intended logic? How can anybody tell? Even so, the whole seems to consist of two types of error, for one cannot be explained in terms of the other.

This problem becomes more confusing in the chapter on Knowledge Bases. The authors build an example out of the film Casablanca. They use a set of LISP-like commands to link the actors, the actors' famous marriages, the film's famous song, its well-known quotations, and a work inspired in turn by the film - Woody Allen's Play It Again, Sam. Their knowledge base of quotations - they give two, "Play it, Sam" and "Arrest the usual suspects" - is oddly wrong. The latter is, simply, factually wrong, while the former is wrong because it is right. In Casablanca, Captain Louis Renault, the Vichy police chief played by Claude Rains, gave the order "Round up the usual suspects"; he did not say "Arrest the usual suspects".
Ingrid Bergman did say "Play it, Sam", but everyone misquotes Humphrey Bogart as saying "Play it again, Sam" rather than his lines "You played it for her. You can play it for me. Play it". To correct this inconsistency you have to suppose that the authors imply a missing set of data - misquotations - so you go from set 1, quotations, through set 2, misquotations, to set 3, titles inspired by misquotation. But, then, should not "Arrest the usual suspects" be in set 2, not in set 1, that is, in the implied set, not in the real, printed one? The set of worldwide misquotations is huge: "Elementary, my dear Watson", "Alas, poor Yorick, I knew him well" and "Blood, sweat and tears" are other well-known misquotations, but I've never come across "Round up the usual suspects" misquoted before, and I doubt if the authors intended to imply any similarity between the status of their examples. They just misremembered, or plain got it wrong when they wrote it down.

Now imagine that all the information in this book was supplied as the basis of a computerised system - from the information it had available, could anybody say that the output would be trustworthy? The obvious answer to this is to identify "Garbage in, garbage out" as the problem, and try to limit the garbage going in by fitting the system with validity checks. The textbooks list them - range checks, picture checks, check digits and so on - but at least part of the time the types of error thrown up by Aleksander and Burnett are much more abstract than could be checked by any rule base: the difference between quotation and misquotation, and then between valid misquotation and invalid misquotation, seems to me huge and important and not really subject to automation in any way with which we are acquainted. (Think of all the stages a book goes through before publication - note making, writing, revision, submission, refereeing, editing, typesetting, proofreading.
Despite all these stages, at any of which someone could have identified the errors, they still slipped through.)

The structure of the book is corrupted. In fact, in the case of this book, it is even worse. On the title page and on the dust jacket the book's title is given as I have printed it. On the spine it is abbreviated to Thinking Machines. Yet in the British Library Cataloguing in Publication Data on the copyright page the title is printed as "Thinking machines: a search for artificial intelligence" - the information about itself that this book feeds to another knowledge base is corrupt. Surely, while different editions of a book may have different titles, a single edition should only have one?

I know that allowing for synonymy is important for easy access and retrieval of data. For instance, the movie buff says "What was Captain Renault's order in Casablanca - something like 'Arrest the usual suspects'?" and the elaborate retrieval system goes away looking for synonyms of "arrest" and "suspect" and can find "Round up the usual suspects". However, the thesaurus is only intervening in the search; it is not true that the words "Arrest the usual suspects" are heard in the film. The thesaurus is just helping to find the data. The knowledge store must contain only truths if it is to be trustworthy.

Expert Systems, from the few I've seen, are limited by their rule base, and slowed by their file handling and the funny things they do with screens. When they overcome these present problems, though, Expert Systems builders must come on to the bigger problem. No system currently stores its information in as complex a way as a book, but books show up the failings that threaten anything based on an expert's knowledge. A book is not interactive.
Expert systems should be - what will happen to a system that supposes "Arrest the usual suspects" is the same thing as "Round up the usual suspects", and then produces a medical diagnosis that confuses "Dye" and "Die" and "Diet" and suggests a course of treatment?

Expert Systems risk owing too much to engineers building on the Ronan Point and M25 principle. For too long the role of philosophy in computing has been no more than the use of Boolean algebra - that is not enough. Shouldn't someone wait until the philosophers' contribution has been made and the implications of error are better understood? If you're going to have a rule-based system then one of the rules must be capable of telling you when you can trust the data and when you can't. The fact that experts have not been able to do it in even simple things, like introductory books, does not bode well.
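
The separation argued for above, between a thesaurus that only helps the search and a store that holds only what was actually said, can be sketched in a few lines of Python. This is an illustration and nothing more: the names VERBATIM_QUOTES, SYNONYMS, normalise and find_quote are invented here, the data is limited to the two Casablanca lines discussed in this review, and it makes no claim to resemble the LISP-like scheme in the book.

```python
import re

# Hypothetical store: holds only lines actually spoken in the film.
VERBATIM_QUOTES = {
    "Casablanca": ["Play it, Sam", "Round up the usual suspects"],
}

# Hypothetical thesaurus: used only to widen a search, never stored as fact.
SYNONYMS = {
    "arrest": ["round up"],
}


def normalise(text):
    """Lower-case the text, drop commas and collapse runs of spaces."""
    return " ".join(text.lower().replace(",", " ").split())


def find_quote(film, query):
    """Return (stored_quote, how_found), or None if nothing matches.

    how_found is "verbatim" when the query itself is in the store and
    "via thesaurus" when a synonym substitution was needed to reach it.
    """
    stored = {normalise(q): q for q in VERBATIM_QUOTES.get(film, [])}
    wanted = normalise(query)
    if wanted in stored:
        return stored[wanted], "verbatim"
    for word, alternatives in SYNONYMS.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, wanted):
            for alternative in alternatives:
                candidate = re.sub(pattern, alternative, wanted)
                if candidate in stored:
                    return stored[candidate], "via thesaurus"
    return None


print(find_quote("Casablanca", "Arrest the usual suspects"))
print(find_quote("Casablanca", "Play it again, Sam"))
```

The point of the second element of the result is that the caller is told when the thesaurus intervened, so "Arrest the usual suspects" is never reported as something heard in the film. Nothing here addresses the harder question of valid against invalid misquotation; "Play it again, Sam" simply fails to match.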
Thinking Machines: The Search for Artificial Intelligence by Igor Aleksander and Piers Burnett
This review appeared in PROGRAM NOW