Fluid Concepts and Creative Analogies

Douglas Hofstadter and the Fluid Analogies Research Group

To Seek Whence Cometh a Sequence

Hofstadter

Given a series of numbers (0,1,2,720!,...), humans can recognize the pattern and give the 'next item.' In modeling this on a computer in the Seek-Whence project, breadth first or depth first search is out -- there are infinitely many possible operators describing patterns. Instead, humans seem to scan quickly for easily detectable characteristics and then follow the leads they find -- an initial breadth first survey, lots of small intense depth first searches locally, another breadth first overview, and so on.

Humans also will usually ignor larger numbers as requiring too much computational effort, as well as small numbers for which there are too mnay known facts. Numbers that have a single outstanding property usually give the first clue to a sequence. Numbers savvy and pattern sensitivity are required for pattern recognition.

Numbers Savvy

recognizing numbers by salient properties,
making guesses about properties of numbers
knowing relationships between properties of numbers ,
being able to make complicated calculations

Pattern Sensitivity

noticing sameness
noticing simple relationships
noticing analogies
imposing consistency (altering pattern fragments)
building abstractions
shifting boundaries
driving towards beauty

Islands of order -- stretches of numbers with internal logic -- are recognized by plateaus, up-runs, down-runs, palindromes... and used as larger building blocks when regularities among islands are recognized -- looking at first, middle, last elements for patterns between islands, matching up by analogy (34-567, 222-888). It is good to have diverse types of islands for efficiency in picking up ideas. Now the concepts of up/down-runs, palindromes, etc can be stretched to include structures of numbers as well as numbers.

Adopting an analogy enhances the likelihood of similar perceptions and weakens commitment to dissimilar perceptions. At first, pattern recognition is done bottom-up, looking for any pattern, but it becomes increasingly top own as the perceptions we choose exert influence on the perceptual process.

Mathematics is the art of choosing the most elegant generalization, but how to generalize?

Good generalizations may involve variations in conceptual spheres

tylenol poisoning --> medicine poisoning --> random killing --> random evil act --> random act...

me-too phenomenon - forgetting maiden name shares essence with writing wrong year on checks in Janary

Generalization involves

moving internal boundaries

swapping components or shifting substructure levels

merging and splitting substructures

lengthening or shortening a given component

adding new components or levels

replacing one concept by a closely related one

trying out the effect of reversals on various conceptual levels

Important themes from the Seek-Whence project

inseperability of perception and high-level cognition
easily reconfigurable multilevel cognitive representations subcognitive pressures- - the more important a representation or concept is , the more influence it has
commingling of many pressures
simultaneous feeling-out of many potential pathways
centrality of making of analogies and variations on a theme
deeper (no slip) and shallower (slippable) aspects the inner structure of concepts and conceptual neighborhoods.- - context dependent conceptual overlap and proximity and context independent conceptual depth

The Architecture of Jumbo

Hofstadter

Jumbo came out of the idea of anagrams and word jumbles. Its goal is to demonstrate the equavalence of cognition with deep perception -- perception that involves highly abstract and nonverbalizable categorization. Intelligence comes out of interactions of many thousands of parallel processes that take place within milliseconds and are inaccessible to introspection. Cognition equals recognition.

Jumbo takes letters and makes pseudo words from a set of rules on how letters then to combine (glom) in English. There is no attempt to check whether these are real words. Jumbo models the way humans juggle many little pieces and tentatively combine them into bigger pieces in trying for the strongest combinations; this may involve multilevel cleaving, splicing, regrouping, reordering and rearranging.

Letters first spark (check for compatability) then flash (if they are compatable) then bond (if their compatability is strong)...

Letters can be seen as atoms and clusters like ck or th as small molecules. As in molecules, this is a flexible multilevel structure held together by bonds of many different strengths, with natural breaking poings on weak bonds.

Jumbo's parallelism is based on the distributed parallelism of a cell. Each activity is carried out by enzymes that wander around the cell's cytoplasm, randomly finding cells of the right type to test/bond together/take apart. In cells this is efficient and reliable.

There is a static data structure of affinities put together intuitively by Hofstadter.

A spark is a short-lived simple data structure telling who sparks (tries out a bonding) with whom and in what order, and this is associated with a codelet in a bag called the coderack. When the codelet is selected (probabilistically) to run, it will look at the spark, evaluate its viability, and suggest whether further exploration is warranted, in which case it will spawn new codelets to do this. All processing in Jumbo is implemented by codelets, rated by urgency -- higher urgency means higher probability of being chosen, but does not guarantee that higher urgency will be chosen before low.

The terraced scan idea came out of a footnote in a Hearsay II paper, suggesting that the knowledge sources' preconditions have pre-preconditions and so on. A terraced scan is a parallel investication of many possibilities to different depths, quickly throwing out bad choices and homing in rapidly on good. First there are quick, superficial tests, and then increasingly elaborate and more expensive tests on the choices that pass. Passing may be by degrees, which will determine the urgency of associated codelets.

When a set of bonded items has passed enough tests that it gloms together, it is surrounded by a membrane and is now treated as a single item which has affinites sepreate from those of its constituent letters (h has no affinity with r, but th does). Each fresh glom is assigned a new node with properties of the glom attached (constituents, type (syllable/word) happiness, etc). Sparks from glom constituents with items outside the glom are still possible, but less likely; if successful, they will break up the glom.

Some work in AI suggests multiple top level structures being processed in parallel. In Jumbo, this does not happen; it is easy for 'musing' codelets to consider other worlds and for 'action' codelets to change to other worlds, but the previous version is not kept around in parallel. In Jumbo, parallelism is essential for assembly, but not used much at higher levels.

A glom's happiness is a function of how well the glom fits into the current state of the world and whether its bonds are of strong type. A single letter will not be desperate initially, but as more other letters are glommed, it will be increasingly unhappy. Unhappy gloms will give more urgency to their associated codelets.

The temperature of the system is the inverse of all the happiness in the system. High temperature (low happiness) makes changes more likely, while low temperature makes change more difficult.

If Jumbo comes to an unhappy solution, it may backtrack and then make random choices. The randomness (weighted by urgency) means all pathways are open, even if very unlikely. Jumbo can transform an unhappy pseudo word by increasing or preserving entropy (perceived disorder).

Entropy perserving can involve regrouping (week-nights to wee-knights) or rearrangements (spoonerism, forkerism, kniferism, sporkerism, exchange of syllables, interior reversal). These changes cause serendipitous interactions more efficiently than random recombination

Entropy increasing involves dissolving gloms, possibly all the way down to letters. Dissolvers are less likely to run when temperature is low.

In the coderach, dissolvers plant codelets for sparks which plant codelets for flashes, which plant codelets for bonds... Control is guided by urgency and happiness. Jumbo's intelligence is emergent and statistical rather than made up of explicit rules.

Arithmetical Play and Nondeterminism

Hofstadter

In estimating magnitudes of expressions, people who are more successful tend to use different strategies each time, becuase they are choosing from a larger palette of possible strategies.

Numbo: A Study in Cognition and Recognition

Daniel Defays

The aim of Numbo is to clarify relations between perception and cognition. In the game of Numble:

there are bricks from 1-25
target from 1-150
addition, subtraction & multiplication of the bricks are allowed in trying to build up the target

This is representative of a large class of problems where there is

a well defined goal, various operations and standard search techniques for the problem space
construction of larger units from smaller, with creation and destruction of temporary structures at various levels
rearrangement and dismantling of these structures
interaction of a priori knowledge, familiar concepts, and salient features of the input.

When humans solve Numble problems, top down and bottom up strategies are intertwined.

Architecture

The Pnet is a network encoding rote declarative knowledge such as

multiplicative decompositions of small integers
landmark numbers
salient numbers
small amount of rote knowledge about basic arithmetic

There are also label nodes 'addition', 'subtraction',. 'multiplication', and 'similar'. Each node has a degree of activation, which spreads out from the node to its nearest neighbors. Its ability to transmit activation is its weight. Every link between nodes has a label indicating what kind of relationship it encodes -- the links are meta-linked with appropriate label nodes. The more highly activated a given label is, the greater the weight of each link it labels.

Association simulated by spreading activation is the key notion

it allows different types of knowledge to be stored in a uniform way
it allows a given target to simultaneously evoke many different ideas ad strategies, surrounding each node by a halo of connotations
it allows great flexibility in controlling the processing by making it possible to change the focus of attention in a very general way
it is task independent

The cytoplasm is the working memory or blackboard; it has nodes and links that come and go, determined by the codelets acting under the influence of the Pnet. The target and each of the bricks gets its own node, these then interact:

grouping of bricks into blocks
dismantling of blocks
production of secondary targets
bonding of bricks, blocks ad targets

The Pnet can act on the cytoplasm by downloading pieces of its structure to the cytoplasm. The difference between declarative and procedural knowledge disappears once an operation is completed. Learning can be seen as uploading from the cytoplasm to the Pnet

nodes can have type -- target, secondary target, brick, block, operation nodes can have status -- taken and free
nodes can have attractiveness -- numerical value of likelihood that the node will be used.

Codelets may change the Pnet, the cytoplasm, or perform a test and then create/modify other codelets. They are both dumb (syntactic rather than semantic) and myopic (no sense of general overview).

High Level Perception, Representation and Analogy

David Chalmers, Robert French, and Douglas Hofstadter

Perception is influenced by

belief

goals

external context

and can be radically reshaped when necessary

The formation of appropriate representations lies at the heart of human high level cognitive abilities -- the task of understanding how to draw meaning out of the world. Some models in low level perception build primitive representations of the environment. High level cognitive modeling starts with representation at the conceptual level, with any meaning already built in. There has been little work at bridging the gap between the two.

After representation formation, we must deal with flexibility of high level perceptual processes, understanding objects and situations in different ways depending on context and top down influences -- different representations of an object or situation at different times.

The objectivist view has a unique correct, complete structure, independent of any human understanding.

The Physical Symbol System Hypothesis -- thinking occurs through the manipulation of symbolic representations composed of atomic symbolic primitives. This is difficult to shift subtly in response to context changes, and ends up being as fixed and absolute as the objectivist idea.

Steps toward representational flexibility have been taken in sophisticated connectionist models with highly context dependent representations. Each representation is a vector in multidimensional space whose position can adjust flexibly to changes in environmental stimuli.

Models without representation formation assume it is possible to model high level cognitive processes independently of perceptual processes. It seems unlikely that this will provide the kind of flexibility needed in a fully cognitive model which will probably require continual interaction between representation building and manipulation.

Analogies

The situation perception process takes the salients of a given situation and filters and organizes them to provide an appropriate representation for a given context

Mapping takes representations of two situations and finds appropriate correspondences between components of one with the components of the other.

Most analogy programs ignore the problem of representation building. But analogy building is integral to perception and vv. In making an analogy, the process of filtering the available data from a sufficiently complex representation of a known object would be high level perception all over again, there must be some way that the useful parts of the representation get put into working memory from the complete representation in long-term memory.

Conceptual Halos and Slippability

Hofstadter

in 1234554321, 4 plays roles

ext to center
next to largest number
predecessor of center
predecessor of largest number

These roles become clear when we try to find the '4' of 123475574321

Conceptual slippage is the context-induced replacement of one concept by a closeley related on in the mental representation of some situation. Concepts are like halos, they can overlap, but have a central core. Conceptual slippage is the discrete jump through overlapping halos from one core to another.

Copycat Project: A Model of Mental Fluidity and Analogy-Making

Hofstadter & Melanie Mitchell

Copycat is a program that discovers insightful analogies in a psychologically realistic way. Its top level behavior emerges as a statistical consequence of myriad small computational actions.

If abc changes to abd, how do you change ijk 'the same way'

ijl -- 'replace the rightmost letter by its successor'
ijd -- 'replace the rightmost letter by d'
ijk -- 'replace all c by d'
abd -- 'replace all strings by abd'

If aabc changes to aabd, how do you change ijkk 'the same way'

iikl -- replace the rightmost letter by its successor
ijll -- replace the rightmost unit by its successor
jjkk -- recognize a mapping between the doubled letters that suggests (creates a pressure) we slip rightmost and leftmost, now replace the leftmost letter by its successor
hjkk -- the rightmost-leftmost slip pressures us to also 'opposite' slip successor and predecessor

Although it handles analogy making in a very small domain, Copycat simulates fluid concepts, so the microdomain brings out transcendental general issues. The microdomain contains idealized concepts (sameness, successor) and structures out of which problems are made. There is a type/token distinction; a type is 'a' while a token is an instance of a type and has neighbors, adjacencies...

Copycat focuses on how to discover mappings and how to perceive and make sense of situations -- that is te waking of dormancy of the small number of known prior concepts that are relevant and applying them to the relationships in the situation.

Major Components of the Copycat Architecture

Slipnet -- site of all permanent Platonic concepts, it contains only concept types, no instances. Distances between concepts in the Slipnet can change over the course of a run, and these distances determine the likelihood of slippage.

Workspace -- site of perceptual activity, containing instances of various concepts from the Slipnet combined into temporary perceptual structures.

Coderack -- from which codelets are stochastically chosen.

In the Slipnet, concepts are represented by nodes, and conceptual relationships by links, with a numerical length representing the conceptual distance. The Slipnet is dynamic, with nodes acquiring levels of activation, spreading activation to neighbors, and losing activation by decay. Conceptual links adjust lengths dynamically. Each node has a static feature called conceptual depth -- the generality and abstractness of the concept ('opposite' is deeper than 'successor' is deeper than 'a'). Deeper concepts are less likely to be immediately percieved and more likely to be involved in the essence of a situation. The deeper the concept, the less likely it is to slip.

Each node spreads activation to its neighbors according to their distance from it, and loses activation with speed inversely proportional to its depth. Deeper concepts are kept around once they are determined to be relevant. Changes to the Slipnet disappear when the context is taken away, so it snaps back to its original state and does not model learning. The topology of the Slipnet stays static although the 'shape' changes.

There are several types of links, metalinked to labels, which are concepts in the network. Ever link constantly adjusts its length inversely with the activation of its label.

At the start of a run, each item in the Workspace has bare-bones information (letter type, leftmost, rightmost, etc). Overtime, codelets scout items for features and items acquire descriptions and are linked by various perceptual structures.

A set of objects in the workspace bonded by a uniform bond type is a candidate to be chunked into a group, which can have its own characteristics.

Pairs of objects in different frameworks (abc, xyz) are probabilistically selected and scanned for similarities, the most promising becomeming bridges in the Workspace, that is, they are considered each others's counterparts. If there is a difference between them, this is embodied in a conceptual slippage. The most favored slippages have components that are shallow and have high overlap.

As the Workspace becomes more complex, new structres are pressured to be consistent with existing structures. This consistency is the viewpoint. A structure's strength depends on tontext independent facets such as the depth of its concepts and how well it fits with the viewpoint. When structures are made in the workspace, they activate concepts in the Slipnet.

In the Coderack there are scout codelets and effector codelets. Souts look at a potential action and try to estimate its promise, then create more codelets to follow up on its findings. Effectors create or destory structures. There are bottom up codelets (noticers) and top down (seekers). Top down codelets are proxies for Workspace and conceptual pressures.

At first, the Coderack only has bottom-up similarity scanners, whose discoveries generate situation specific pressures.

In the Slipnet, initially activated concepts tend to be conceptually shallow, and there is a tendency to move from no themes to themes -- clusters of highly activated, closely related deep concepts. In the Workspace, there is a tendency to move from no structure to much structure, from many local unrealted objects to a few global coheret structures. Processes change from parallel towards serial, bottom up towards top down, and nondeterministic toward deterministic.

As Copycat is run enough times on a given problem, different solutions will emerge with frequency depending on the probabilites of the codelets. The final temperature of the system will tell how confident Copycat is of its answer. In some cases an answer that comes up a lot will have higher temperature than a rarer one (Copycat is not as happy with mrrjjj->mrrkkk as an analogy to abc->abd than it is with mrrjjj->mrrjjjj). If the temperature is clamped (not allowed to rise and allow for open-mindedness to less likely possibilities, or fall and hold things together as they are) then these less likely but more satisfying answers are not found.

Two Early AI Approaches to Analogy

Hofstadter

Thomas Evan's ANALOGY solved 'a is to b as c is to ...' problems from IQ tests, selecting its answer rather than producing it as COPYCAT does. It was given five numbered figures and ranked all of them - exhaustive search. ANALOGY built up its own representation of the pictures from a set of LISP data structures describing lines curves dots, etc. This primitive perceptual processing was temporally separated from conceptual mapping. Modeling both types of processes was not done again in further research. Most later research considers analogy making to be a big gun for problem solving, not a constantly used cognitive tool.

Walter Reitman's Argus solved 'bear is to pig as chair is to...' and chose from four choices [foot table coffee strawberry], table because table and chair are in the same group (furniture) as are bear and pig (animals). Reitman used a parallel processing version of the mind (rare then). Argus involved mutual interaction of a serial process trying to carry out a given task (an analogies problem) and a semantic network with parallel propagating activation. Nodes with high activation introduced dynamic biases into the serial process.

Perspectives on Copycat: Comparisons with Recent Work

Melanie Mitchell and Douglas Hofstadter

Most analogy making programs concentrate on how a mapping is made from a source problem with known solution to a target problem. Copycat is one of the few that focus on the construction of representations for the source and target situations, and how construction interacts with mapping, and how new previously unincluded concepts can be brought in and come to be seen as relevant.

Retrieval of Old and Invention of New Analogies

Hofstadter

The Copycat approach is like (aha, a metanalogy) explaining a cat to an alien by sending a live ant along with information on how ants are like and unlike cats. Other approaches are like sending a toy wind-up cat.

Caricature analogies are analogies concocted spontaneously for purposes of ridicule. To make it, one must have a strong sense of the conceptual skeleton of the situation. The analogy is shifted to the center of the original domain, where the silliness of the claimed equality will e easier to see.

Prolegomena to Any Future Metacat

Hofstadter

The ability to reperceive - to take a fresh look at a situation thought to be understood already is at the crux of creativity.

Copycat has too little awareness of the processes it is carrying out and the ideas it is working on. It may end up looping around and around to hit the same impasse that a human will learn quickly to avoid. It should be more self aware.

Maybe AI will be shades of gray consciousness, not black and white machine or mechanical.

Human brains possess concepts, allowing complex representational structures to be built that automatically come with associated links to prior experiences. Brains can self monitor, allowing a complex internal self model to arise, giving the system enormous self control and open endedness. Self monitoring helps the system to avoid falling into mindless ruts.

Creativity means having a sense of what is interesting to most people, following it recursively throughout the choices with a sense of self-confidence, applying it at the metalevel and being sensitive to unintended patterns as well as patterns in mental processes -- sensitivity to form as well as content, and modifying it accordingly, flexibly, adapting to experience.

What a Future Metacat should do

it should be sensitive to any sufficiently salient event in its Slipnet (the activation of a very deep concept) and should explicitly turn this into a clear characterization of what the problem is really about, in the Workspace. Also, it should keep a record off actions in the Wokspace.
it should be able to work backwards from a given answer that it did not produce and see the issues the answer is about, and then discuss the answer.
it should store a trace of its solution which it can use to tell it when to jump out of a situation and how to deal with a new problem.
it should be ale to handle meta-analogies to map one analogy problem (with answer) to another. It should also be able to meta-meta.
it should be able to make up new, high-quality puzzles.

Tabletop, BattleOp, Ob-Platte, Potelbat, Belptto, Platobet

Hofstadter and Robert French

The Tabletop project is developing a program to do analogy problems involving objects on a coffehouse tabletop ('do this' across a table). The Platobet, the alphabet of abstract Platonic categories is fairly rich, so that plate and saucer are conceptually quite close, and quite far from saltshaker.

In one configuration, certain interacting mental pressures will be evoked. A slight tweak will shift the pressures considerably, to give a different answer.

Potelbat is a formula based program for doing this in a brute-force way. It did reasonably well on most simple Tabletop problems, but tells nothing about how humans would do it.

A scaled-up domain for tabletop would be questions like 'What is the Bloomington of California?" When applied to military situations, this is Battleop. What is the Ob of Nebraska? Platte. So geographical analogy questions are traditionally called Ob-Platte puzzles. In Tabletop, both situations in the analogy are there in their entirety, not true for Ob-Platte puzzles. But as perception is not complete and instantaneous, this is not really a distinction.

At first the tabletop domain may seem very trivial, but it models very general kinds of analogy-making. Again perceptual pressures cause context-dependent pressures to emerge, and push the program to focus on certain concepts, areas of the table and objects more heavily than on others.

Fluid Concepts and Creative Analogies
Douglas Hofstadter and the Fluid Analogies Research Group
(Basic Books) Harper Collins 1995