10. Some Thoughts on Languages and Learning

What is the greatest invention of the universe? Undoubtedly, it is the human being. What, then, is the greatest invention of human beings so far? Most likely, the answer is human language. Indeed, languages are human-made inventions that help mankind to encode, record, acquire, transfer and, most importantly, rediscover both propositional and procedural knowledge.

Interestingly enough, if we put the question “what is knowledge?” to the public, we will get many different answers. Ironically, everyone knows what knowledge refers to, yet no one is able to give a commonly accepted definition. Ten years ago, we advocated the following definition: knowledge is the clusters of properties and constraints of all entities. In other words, knowledge refers to the properties and constraints possessed by all entities in existence. It is our view that this definition will receive the least criticism from the research community. So far, we have not found a better definition of knowledge. If you know one, please share it with the research community.

Once we know what knowledge refers to, we can raise the next question: what is the best way of representing knowledge? If you ask this question of college students or researchers, you will most likely not get the correct answer.

Surprisingly enough, all of us innately know the correct answer but are not able to articulate it: human languages. Yes, human languages and their extensions are the best ways of representing the knowledge discovered by human beings. This fact is largely ignored by the textbooks and the research community of Artificial Intelligence, because knowledge representation is still considered “an issue which still looks for definite answer”. Despite this shortcoming in the research community, public expectations of Artificial Intelligence keep rising.

On the other hand, from the viewpoint of human languages, we can easily understand the nature of mathematics, which is also called a technical language. In fact, mathematics is an extension of human languages, and serves to describe properties and constraints at much deeper levels of abstraction. Therefore, it is no surprise that people say mathematics is simply a technical language invented by human beings.

Also, from the viewpoint of human languages, we should call programming languages extensions of human languages instead of calling them machine languages. This is simply because programming languages serve to describe algorithms, logic, control loops and data processing in the form of programs (which are no different from texts). So, we must say that programming languages are not for machines to use. In particular, if all of us love human languages, then all of us in general, and college students in particular, should equally love mathematics and programming languages.

Knowing the nature of human languages, we can now examine the nature of a widely used term: learning. We all know that an individual will normally spend about 6 years in primary school, 6 years in secondary/high school, and at least 4 years in university, for the sole purpose of learning under the guidance of teachers. Therefore, learning is a process which involves the presence of learners and teachers. The primary activities of learners are to learn knowledge as well as skills and to discover better ways of learning, while the primary activities of teachers are to teach knowledge as well as skills and to discover better ways of teaching.

At this point, everyone should be able to understand the true nature of languages as well as of learning. However, if you think further, you will find that the following questions still await satisfactory answers: Why is a human being able to learn any human language in the world? Why isn’t an animal (e.g. a monkey, a cat or a dog) able to learn even a little human language? What principles will enable machines to learn human languages?

Therefore, the actual challenge in learning is to make the machines of tomorrow able to master all human languages and their extensions. Without considering the aspect of human languages, deep learning will always remain deeply superficial.

Ming Xie




The objective of the L2M program is to develop technologies necessary to enable a next-generation adaptive artificial intelligence (AI) system capable of continually learning and improving performance in real-world environments while remaining constrained by pre-determined capability limits. Such a system would be able to apply existing knowledge to new circumstances without pre-programming or training sets, and would be able to update its network based on its situation for a variety of applications.

The L2M program aims to develop technology that could support the creation of a new generation of AI systems that learn online, in the field, and based on what they encounter—without having to be taken offline for reprogramming or retraining for new conditions.

The L2M program will combine computer science with biology-inspired principles of learning with the goal of developing the capability to perform continual learning, with a focus on the creation of learning paradigms and evolving networks that learn perpetually through external data and internal goals. Performers will be tasked with developing systems that can demonstrate an ability to learn new tasks without losing capability on previously learned tasks, and can apply previous knowledge to novel situations—and in doing so develop more complex capabilities.


8. China is betting big on the Industrial Internet of Things (IIoT)


The Chinese government projects the country’s Industrial Internet of Things to grow to 450 billion RMB (about $65 billion) by 2020 – occupying roughly one quarter of China’s whole IoT market. The IIoT segment in China has maintained a growth rate of more than 25 percent. At present, Chinese enterprises whose status can be described as Industrial 3.0 know they need a lot of improvements, especially in cost control, production efficiency and process management.


7. Japan: Frontlines of “Robo-Conversion”

For waiters and waitresses in Japan, jobs are plentiful. For every applicant there are 3.8 wait jobs to choose from, which means 2.8 wait jobs go wanting. For drivers, there are 2.7 jobs per applicant, while for builders the ratio is 3.9 to 1.
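The arithmetic behind these ratios can be sketched in a few lines. This is only an illustration, assuming that every applicant takes exactly one job; the function names are hypothetical, and the ratios are the ones quoted above.

```python
# Jobs-per-applicant ratios quoted in the text above.
JOBS_PER_APPLICANT = {
    "waiters and waitresses": 3.8,
    "drivers": 2.7,
    "builders": 3.9,
}


def unfilled_per_applicant(ratio: float) -> float:
    """Openings left over per applicant, assuming each applicant takes one job."""
    return round(max(ratio - 1.0, 0.0), 1)


def max_fill_rate(ratio: float) -> float:
    """Largest fraction of openings that can be filled under the same assumption."""
    return min(1.0, 1.0 / ratio)


for job, ratio in JOBS_PER_APPLICANT.items():
    print(
        f"{job}: {unfilled_per_applicant(ratio)} openings unfilled per applicant, "
        f"at most {max_fill_rate(ratio):.0%} of openings filled"
    )
```

Under this assumption, a ratio of 3.8 means that only about a quarter of the waiting jobs can be filled at all, which is the employer's side of the same numbers.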

That, of course, means that things don’t look as plentiful if you own a restaurant, a trucking company or want to build a shopping mall. Finding enough workers is just about impossible. A recent Financial Times headline says it all: Japan worker shortage has only one winner so far: robots.

6. Plagiarism and Myopia in AI

Extracted from:  https://www.facebook.com/juyang.weng/posts/10155025045069783

Hi, Fei-Fei, I am sorry that this episode takes the form of a letter to you, mainly because you are an organizer of ImageNet. Please consider this communication a purely academic debate that addresses important issues of AI. I am happy that you are doing extremely well in terms of recognition and funding, and you are also one of my friends. Please do not be personally offended by this kick-off of a scientific debate in AI, because this open discussion is very important for the future of AI.

Your AI systems are also fundamentally not scalable because they do not have their autonomous developmental programs. It is impossible to build a strong AI system without fully autonomous development.

This is a deep and complex subject of science, not a superficial and personalized issue of styles as Mr. Deng Xiaoping argued using his so-called socialism with Chinese characteristics. Unfortunately, greatly isolated during the many years of the so-called Chinese Communist Revolution, Deng Xiaoping did not gain sufficient knowledge about this deep and complex science. Likewise, many researchers in AI did not gain sufficient knowledge about the breakthroughs in the direction of Autonomous Mental Development (AMD). See Weng, McClelland, Pentland, Sporns, Stockman, Sur, and Thelen, Science, 2001, about what AMD is.

Second, let us look at a plagiarism issue in AI.

You used your work to talk about the future of AI without addressing fundamental AI issues that knowledgeable scientists have long complained about, such as Marvin Minsky (re: neural networks are scruffy and so are yours), Michael Jordan (re: neural networks do not abstract well and neither do yours), and John Tsotsos (re: the exponential complexity of visual attention, and your systems completely miss the function of autonomous attention).

After our brief conversation during CVPR 2010, I followed up with an email to you on June 18, 2010. In that email, I reminded you “brittle vision systems have been well known for long” (implying that your graphic-model-based vision is brittle) and “we now have a model for brain-mind [5] to address the deep-trenched brittleness in model-based agents (vision and beyond).” However, I did not receive any response from you.

I hereby cautiously alert you that our brain model, Developmental Networks (DN), has solved the series of fundamental problems that Marvin Minsky, Michael Jordan, and John Tsotsos raised but did not solve.

However, DN originated from Cresceptron.

Did you plagiarize Cresceptron?

Cresceptron was published by Weng, Ahuja and Huang (IJCNN 1992, ICCV 1993, and IJCV 1997), and was the first deep-learning convolutional network for general 3-D objects in cluttered backgrounds. We argued (IJCV 1997), citing psychological studies, that the human vision system does not have perfect shift invariance, scale invariance, or orientation invariance. If there were a monolithic 3-D model in the brain, such invariances would seem to be natural consequences. Therefore, no monolithic 3-D model should exist in the brain.

Specifically, Cresceptron seems to be the first incrementally growing network for (1) detection, (2) recognition, and (3) segmentation of general 3D objects (and scene category) from images of cluttered natural scenes.

As soon as Cresceptron was published, many researchers asked me why I thought that the brain learns 3-D objects from 2-D patches in 2-D images and does not have a monolithic 3-D model. 3-D model-based recognition was the prevailing approach until Cresceptron was published.

Namely, Cresceptron started a revolution in computer vision, which Sandy Pentland suggested I call appearance-based vision, in contrast with the then-popular aspect graph method, which is based on a 3-D model.

In your 2005 PhD thesis, did you plagiarize Cresceptron for your entire approach?

In Section 9.2, literature review, you cited only some feature extraction techniques that assume 3-D objects (e.g., shape from shading, Hough transform, scale-invariant features, and affine-invariant features). In Section 9.3, Contribution, you stated “to learn a completely new object category given very few training examples,” which is exactly what Cresceptron did 12 years before your thesis. The “one-shot” recognition idea in your thesis is also what Cresceptron did. Your Fig. 11.3 is the same, in its ideas, as Fig. 1 in Cresceptron ICCV 1993. You wrote about “incrementally” in the 2nd paragraph on page 79, but this is also what Cresceptron did.

ICCV, where Cresceptron was published, was arguably the most visible computer vision conference then, 12 years before your PhD thesis. The complete journal version of Cresceptron was published in IJCV, arguably the most visible computer vision journal, 8 years before your PhD thesis. However, your entire PhD thesis did not cite Cresceptron.

Your later papers also used other ideas of Cresceptron, such as the AND and OR structures (see Fig. 3, grouplets, of the CVPR 2010 paper and so on). Cresceptron automatically generated such structures but you only handcrafted them. Yet you still did not cite Cresceptron.

In many of your papers and talks, you cited only the deep convolutional networks of Kunihiko Fukushima, Geoffrey E. Hinton and Yann LeCun for deep learning.

It is well known that Kunihiko Fukushima used a deep convolutional network (Neocognitron) only for the classification of isolated, single numerals, which is fundamentally 2D, not 3D and not cluttered natural scenes. Until which recent year did Kunihiko Fukushima, Geoffrey E. Hinton and Yann LeCun use their networks exclusively for the classification of isolated, single numerals, which is fundamentally 2D?

Also turning a blind eye to the brain’s developmental program, the ImageNet Contest that you initiated is not only myopic but also misleading to the AI field. The ImageNet competitions lack rigorous protocols that any scientific and engineering problem must clearly specify:

1. What is known (e.g., a particular set of static symbolic labels)?
2. What is given (e.g., all training and testing images along with the solutions)?
3. What is assumed (e.g., episodic image classification only and attention-free only)?
4. What is not allowed in producing the results (e.g., can a team design methods after looking at test data, learn after seeing the exam)?
5. How long do the interactive manual developmental processes last (e.g., years across competitions)?

Were each team’s training and testing under the organizer’s due supervision, given that the following human interventions could be critical?

1. test-specific symbolic representations,
2. task-specific algorithms,
3. test-specific algorithms,
4. parameter tweaking,
5. handpicked training and testing data (you called it “clean”, but does a child require mom to clean the visual world on his retinas before he can learn?),
6. expansions of computing resources after seeing the test performance,
7. human handpicking of only the single best result from many results to report.

Is this a human algorithm assisted by computers? This is not the autonomous development that AMD algorithms were meant for, where every network must not only be successful, like every human child, but must also be optimal in the sense of maximum likelihood conditioned on the learning experience of each child (network).

You stated only “computer telling us”, but in fact it is humans telling us, with computers as dumb tools. Those computers in your ImageNet do not truly “see” in the sense of human vision.

For example, the “dual task paradigm” used in your PhD thesis cannot totally eliminate the effect of attention. Many psychologists do not know that. All you need is to understand how top-down attention works and learns in the DN network. Dual-attention is an anytime learned brain behavior.

It is also difficult for a human brain to label every image consistently, depending on what pops up in its mind when it looks at the scene under the dual task paradigm. Human attention could be very rich, depending on the individual’s life experience and laboratory instructions.

Was every ImageNet team’s computer program under the dual task paradigm?

You compared the 3.6% machine error rate in 2015 with 5.1% from humans. Is it a fair comparison or a misleading comparison?

Was it honest for you to state “machines’ progress has basically reached, and sometimes surpassed human level” (GIF’17)?

Let us further examine the fact that many humans in each team were in the iterative loop of your contest. Suppose that you have enough money to buy the total competition time of all students at Stanford University to participate in your ImageNet Stanford team, but you report only the student who happened to get the highest score. You then claim “see Stanford education, surpassed the level other humans!”

You cannot afford to sell the entire Stanford team to every human user for a computer vision application. This is why every human brain learns autonomously within a closed skull. Each brain must learn successfully without buying the entire Stanford team into its skull to design representations and tweak parameters.

I predict that over the years from 2010 to 2015, more and more teams interactively figured out ways to beat the ImageNet competitions that you listed in GIF’17. For example, some data and symbolic representations were re-used over the years, but you do not care how those data are used. Your ImageNet competition seems to be a race of both human resources and machine resources, where teams could try to take full advantage of the task limitations of the ImageNet tests. The ImageNet competitions seem unscientific, non-developmental, myopic, and not general-purpose. The contests were designed to bypass the major fundamental bottleneck problems in AI.

The AIML Contest series was meant to overcome major limitations of the ImageNet Contests and other contests. It is meant for AMD, where a single learning engine must autonomously develop for any practical sensing modality and motor modality, including vision, audition, and natural language acquisition (text). It is meant for any practical task that a human teacher in the environment has in mind; however, the task to be learned is not known to the programmer of the learning engine. I invite you to take a look at the Rules of the AIML Contest, which was successfully conducted during 2016 without getting a penny of government funding. The AIML Contest will be conducted again in 2017.

Jan. 23, 2017

John Weng, PhD

Professor, Michigan State University, USA

5. Some Thoughts on Humanoid Robotics

Without a human body, human intelligence cannot be developed on its own. This fact equally applies to robots. Most importantly, it explains why many research teams working in the field of robotics do build physical prototypes.

In recent decades, it has been an ambition of human beings to develop human-like machines, which are called robots in general and humanoid robots in particular. So far, tremendous progress has been made in science and engineering toward understanding the design, analysis and control of grasping, biped walking and manipulating mechanisms. In parallel, many wonderful results have come from investigating the principles behind learning and the comprehension of language and images.

Fueled by the excitement and impressive shows of HONDA’s and SONY’s humanoid robots, should we believe that a new era of humanoid robots has just started, and that more excitement is waiting ahead?

In fact, humanoid robots are human-made inventions and creatures. Therefore, the development of humanoid robots is basically invention-centric research, which should not be overshadowed by our limited understanding of nature. Instead, we have full freedom to exercise our creativity in the invention of humanoid robots.

On the other hand, humanoid robotics research will benefit greatly from discoveries about how the human body and mind work. As the outcome of invention will stimulate discovery-centric research in various ways, humanoid robots, which are embodiments of mind and body, are undoubtedly ideal platforms for us to validate, or apply, theories from the study of neuroscience, psychology, learning, and cognition.

We all know that the embodiment of mechanics, electronics, control, communication, perception, decision-making, artificial psychology, and machine intelligence has greatly enlarged the scope of scientific investigation into the engineering approaches and principles underlying the development of humanoid robots. Because discovery-centric research and invention-centric research in humanoid robotics can make progress hand in hand, this opens a new horizon in which fruitful results are expected to emerge in the form of new theories, new technologies and new products. In comparison with industrial robotics research, humanoid robotics research will certainly offer many more opportunities and much more inspiration for new inventions and new discoveries.

Therefore, our goal toward this research direction is to adopt an integrative approach which aims at developing human-like Artificial Self-Intelligence (or artificial life) which could autonomously learn and develop its physical, intellectual, and emotional abilities through the interaction with human beings and environments. Some of our research focuses include: cognitive vision, cognitive speech, machine perception, machine learning, intelligent & real-time OS, image understanding, natural language understanding, conversational dialogue, machine translation, etc.

Ming Xie



4. Some Thoughts on Artificial Intelligence

With the increase in performance and capabilities of today’s computers, researchers and scientists are excited about the possibility of computerizing human intelligence in the form of computer programs so as to make computers possess a certain degree of human-like intelligence. In the middle of the last century, this human endeavor produced a new technical term: Artificial Intelligence (AI).

However, to this day there is no consensus on the definition of Artificial Intelligence. For example, it is still not clear whether Artificial Intelligence literally refers to computerized human intelligence or to a machine’s self-intelligence.

As we know, the study of Artificial Intelligence has historically focused on problem-solving (i.e. analysis) and learning. The difficult issue of synthesis (i.e. creativity) has not yet received much attention. Although artificial intelligence may literally mean man-made intelligence inside machines or robots, the contents discussed under the traditional paradigm of artificial intelligence suggest that it implicitly deals with computerized human intelligence or computational intelligence (i.e. rationality).

Then, we can raise this question: What do we mean by intelligence from an engineering point of view? See my book Fundamentals of Robotics, published in 2003. One possible definition of intelligence is as follows: “Intelligence is the self-ability which links perception to actions so as to achieve intended outcomes. Intelligence is a measurable attribute, and is inversely proportional to the effort spent in achieving an intended goal.”
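As an illustration only, the proportionality stated in this definition could be written as follows. The symbols I (measured intelligence), E (effort spent) and the constant k are assumptions introduced here for the sketch; they do not come from the book.

```latex
I \propto \frac{1}{E}
\quad\Longleftrightarrow\quad
I = \frac{k}{E}, \qquad k > 0,\ E > 0 .
```

Under this reading, halving the effort needed to achieve the same intended goal doubles the measured intelligence.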

At a conference held in Italy in 2004, I further proposed a concise definition of machine (or robot) intelligence as follows: “Robot intelligence is an attribute engendered by a robot’s brain, under the governance of causality and rationality. Causality is the study of intelligence without considering motivation (i.e. value and belief), while rationality is the study of intelligence in relation to motivation.”

In view of the above definitions, it is clear that the achievements made so far in the field of AI are still very limited. Many questions remain unanswered, for example: How can we make machines acquire and learn knowledge by themselves? How can we make machines communicate knowledge or meanings by themselves? How can we make machines understand knowledge and meanings by themselves? How can we make machines synthesize (i.e. create) knowledge or meanings by themselves?

Therefore, it is still a tremendous challenge for us to develop the innate algorithms or engineering principles underlying Artificial Self-Intelligence (AsI). Our goal in this research direction is to develop practical engineering solutions to the problem of how to make future machines and robots autonomously learn, understand, synthesize, and communicate meanings.

Ming Xie


1. Can Robots Learn Languages the Way Children Do?

Share your responses to the question of “Can Robots Learn Languages the Way Children Do?”

You can find my original response here.

Below is another version of my response:

Can Robots Learn Language the Way Children do?

 Ming Xie

Nanyang Technological University

Singapore 639798

The question here is how to design the mind which enables robots to learn languages by experiencing the real world in the same way a child does. In particular, what is meant by “experiencing the real world”? Could the real world be modelled? How does the modelling of the real world help robots to autonomously learn, analyze and synthesize human languages through interaction with the physical world (i.e. environment)? What is the definition of meanings? What is the relationship between physical meanings and languages? What is the principle behind the process of learning human languages? These are the issues that should be addressed with regard to the design of the mind for robots to learn and to understand human languages through interaction with the real world ([1],[2]).

Let’s first examine the issue of “what is the definition of meanings?”.

The understanding of the definition of meanings will help us to find the appropriate principle behind the design of a robot’s mind. In our opinion, the real world should be divided into the physical world (i.e. environment) and the conceptual worlds (i.e. texts in various languages). Therefore, the meanings of a word in a human language consist of two parts: a) the physical meanings of the entity referenced by the word in the physical world, and b) the conceptual meanings of the word itself in various human languages. Here, we consider that an entity’s properties (e.g. geometrical, mechanical, chemical, electrical) as well as its constraints (e.g. kinematic constraints, dynamic constraints) are the physical meanings of the entity. Due to the nature of constraints, when multiple entities co-exist in a common space of the physical world, interactions among these entities will occur. These interactions create concepts such as actions, behaviours, events, episodes and stories. Therefore, throughout the history of mankind, the process of encoding the meanings in the physical world has given rise to the invention of human languages. On the basis of the inventive nature of human languages, we advocate that the use of human languages creates the so-called conceptual worlds. That is to say, a conceptual world is the set of texts in one human language which describe the meanings of the physical world. Hence, multiple human languages produce multiple conceptual worlds. Most importantly, the properties and constraints of a word in a particular human language are simply the conceptual meanings of the word itself. For example, nouns, verbs, adjectives and adverbs are properties of words, while noun phrases and verb phrases are constraints of words. In summary, properties and constraints in the physical world as well as in the conceptual worlds define what we call meanings.
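The two-part view of meanings described above can be sketched as a small data structure. This is a minimal illustration under our own assumptions: the class names, the field names and the example values for the word “door” are hypothetical and are not taken from KnowNet or from the cited papers.

```python
from dataclasses import dataclass, field


@dataclass
class Meaning:
    """A cluster of properties and constraints (hypothetical structure)."""
    properties: dict[str, str] = field(default_factory=dict)
    constraints: list[str] = field(default_factory=list)


@dataclass
class WordMeanings:
    """Meanings of one word: a physical part plus one conceptual part per language."""
    physical: Meaning
    conceptual: dict[str, Meaning] = field(default_factory=dict)  # keyed by language


# Illustrative values only; they are assumptions, not data from the text.
door = WordMeanings(
    physical=Meaning(
        properties={"geometrical": "flat rectangular panel", "mechanical": "rigid"},
        constraints=["kinematic: rotates about a hinge axis"],
    ),
    conceptual={
        "English": Meaning(
            properties={"part_of_speech": "noun"},
            constraints=["can head a noun phrase"],
        ),
    },
)


def conceptual_worlds(word: WordMeanings) -> list[str]:
    """Languages (conceptual worlds) in which the word has conceptual meanings."""
    return sorted(word.conceptual)


print(conceptual_worlds(door))
```

Keeping one physical part and a separate conceptual part per language mirrors the statement that multiple human languages produce multiple conceptual worlds describing the same physical world.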

Then, let’s examine the issue of “what is the relationship between physical meanings and languages?”.

We all know that languages are the inventions of human beings for the purpose of encoding the physical meanings of entities in the physical world. Interestingly enough, the relationship between physical meanings and languages is similar to the relationship between scenes and cameras. For example, we can say that cameras are the inventions of human beings for the purpose of projecting the appearances of scenes into images. In a similar way, we can say that languages are the inventions of human beings for the purpose of projecting the physical meanings of entities into texts. In robot vision, one of the tasks is to do reconstruction or photo interpretation, which aims at reconstructing the scenes from given videos or images. Similarly, in robot hearing, one of the tasks is to do reconstruction or text understanding, which aims at reconstructing the physical meanings from given texts or sounds of texts.

Now, we come to this important question of “Can robots learn language the way children do?”

As mentioned above, properties and constraints are the contents of knowledge or meanings. The best way of representing this knowledge or these meanings is the use of human languages. Therefore, the mastery of human languages is crucial to the development of robots of tomorrow which are capable of interacting and communicating with human beings. Human children have the innate capability of mastering or learning any human language. This capability depends on two important factors. The first factor is the built-in blueprint of the mind, which is the foundation of learning human languages. The second factor is the chance of interaction in the physical world during the process of learning human languages. As a result, if we could give the robots of tomorrow these same two conditions (i.e. a built-in blueprint of the mind similar to that of human beings, and the ability to interact in the physical world), then robots will be able to learn language the way children do.

Actually, we have initiated a project with the aim of developing a robotic mind under the name of KnowNet, which is software with functionality such as teacher-assisted learning of human languages, vision-guided learning of human languages, visualization of physical meanings in 3D virtual space, text understanding, text synthesis, speech recognition, speech synthesis, conversational dialogue and multiple-language translation.


  1. Xie, Jayakumar S. Kandhasamy and H. F. Chia. Meaning-centric Framework for Natural Text/Scene Understanding by Robots, International Journal of Humanoid Robotics, 1(2), June 2004.
  2. Jayakumar S. Kandhasamy. Organized Memory for Natural Text Understanding and Its Meaning Visualization by Machine, PhD thesis (Under Review), School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore (2005).