14. Is the recent progress of humanoid robots a threat?


My thoughts are as follows:

Crimes are motivated by desires. Desires are linked to the biological properties of human beings. As long as humanoid robots do not include the biological properties linked to desires, they will not be able to plan and execute any crime on their own. However, humanoid robots could be used as tools by bad guys in the process of committing crimes. Such a scenario could be avoided with built-in software which evaluates and monitors the consequences of humanoid robots’ actions.

Ming Xie

13. Google co-founder Sergey Brin, “I didn’t see Artificial Intelligence coming. How wrong was I?”

My short response to his question is as follows:

If we believe that Artificial Intelligence simply means computer-aided human intelligence or computerized human intelligence, then we have already achieved a great deal in this direction. All the data-based decision-making systems are good examples of such achievements.

If we believe that Artificial Intelligence should mean a machine’s, or a robot’s, self-intelligence, then only very few people are making progress in this direction, and we are still at the infant stage. Most importantly, I believe that many more resources should be put into this direction, instead of continuing to speculate about computer-aided human intelligence or computerized human intelligence.

Ming Xie


12. What is dynamic biped walking of humanoid robots?

In the video below, Professor Oh did not believe that there should be a specific term for dynamic walking.

Please share your view about the classification of biped walking by humanoid robots.


Below is my short viewpoint:

According to the supply of mechanical energy, biped walking can be classified into two categories: one is called actuated biped walking and the other is called non-actuated biped walking. However, our main concern here is about the stability or balance, which is a challenge to both planning and control of biped walking by humanoid robots.

Interestingly, in the domain of biped walking, there are two types of stability or balance. One is called static stability or balance, in which the zero-moment point (ZMP) is always within the support zone of a humanoid robot’s foot or feet (assuming that there is no other contact point between the humanoid robot and the environment). The other is called dynamic stability or balance, in which the body supported by the two legs maintains its state of motion. For example, in the sagittal plane of a humanoid robot, dynamic stability is achieved with successive clockwise rotations of the legs for backward walking, or successive counter-clockwise rotations of the legs for forward walking. In the coronal plane of a humanoid robot, dynamic stability is achieved with periodic oscillations between clockwise and counter-clockwise rotations of the legs.

Hence, on the basis of these two types of stability or balance, we can classify biped walking into the following two categories: static walking which maintains static stability, and dynamic walking which maintains dynamic stability.
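
As a rough illustration of the static criterion above, the following Python sketch tests whether a given ZMP lies within the support zone, taken here as the convex hull of the foot contact points on the ground plane. The function names, the 2-D coordinates and the foot dimensions are assumptions made for illustration only; they do not describe any particular robot or controller.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) on the ground plane, in metres (assumed)


def cross(o: Point, a: Point, b: Point) -> float:
    """2-D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def zmp_inside_support(zmp: Point, contact_points: List[Point]) -> bool:
    """True if the ZMP lies inside (or on the edge of) the support zone."""
    hull = convex_hull(contact_points)
    if len(hull) < 3:
        return False  # degenerate support zone: no area to stand on
    n = len(hull)
    # For a counter-clockwise hull, an inside point is never to the right of an edge.
    return all(cross(hull[i], hull[(i + 1) % n], zmp) >= 0 for i in range(n))


# Hypothetical foot outlines: 0.2 m long, 0.1 m wide.
left_foot = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.1), (0.0, 0.1)]
right_foot = [(0.0, 0.2), (0.2, 0.2), (0.2, 0.3), (0.0, 0.3)]

single_support_ok = zmp_inside_support((0.1, 0.05), left_foot)
single_support_bad = zmp_inside_support((0.3, 0.05), left_foot)
double_support_ok = zmp_inside_support((0.1, 0.15), left_foot + right_foot)
```

In double support the zone is the hull around both feet, so a ZMP located between the feet still counts as statically balanced under this criterion.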

Ming Xie


10. Some Thoughts on Languages and Learning

What is the greatest invention of the universe? Undoubtedly, it is the human being. Then, what is the greatest invention of human beings so far? Most likely, the answer is human language. Indeed, languages are human-made inventions, which help mankind to encode, record, acquire, transfer and, most importantly, rediscover both propositional and procedural knowledge.

Interestingly enough, if we put the question “what is knowledge?” to the public, we will get many different answers. Ironically, everyone knows what knowledge refers to, yet no one is able to give a commonly accepted definition. Ten years ago, we advocated the following definition: knowledge consists of the clusters of properties and constraints of all entities. In other words, knowledge refers to the properties and constraints possessed by all entities in existence. It is our view that the above definition will receive the least criticism from the research community. So far, we have not found any better definition of knowledge. If you know one, please share it with the research community.

Once we know what knowledge refers to, we can raise the following question: what is the best way of representing knowledge? If you ask this question in front of college students or researchers, you will almost certainly not get the correct answer most of the time.

Surprisingly enough, all of us innately know the correct answer but are not able to articulate it: human languages. Yes, human languages and their extensions are the best ways of representing the knowledge discovered by human beings. This fact is largely ignored by the textbooks and the research community of Artificial Intelligence, because knowledge representation is still considered “an issue which still looks for definite answer”. Despite this shortcoming in the research community, public expectations of Artificial Intelligence keep rising.

On the other hand, from the viewpoint of human languages, we can easily understand the nature of mathematics, which is also called a technical language. In fact, mathematics is an extension of human languages, and serves the purpose of describing properties and constraints at much deeper levels of abstraction. Therefore, it is no surprise that people say mathematics is simply a technical language invented by human beings.

Also, from the viewpoint of human languages, we should call programming languages extensions of human languages, instead of calling them machine languages. This is simply because programming languages serve the purpose of describing algorithms, logic, control loops and data processing in the form of programs (which are no different from texts). So, we must say that programming languages are not for machines to use. In particular, if all of us love human languages, then all of us in general, and college students in particular, should equally love mathematics and programming languages.

Knowing the nature of human languages, we can now examine the nature of a widely used term: learning. We all know that an individual will normally spend about 6 years in primary school, 6 years in secondary/high school, and at least 4 years in university, for the sole purpose of learning under the guidance of teachers. Therefore, learning is a process which involves the presence of learners and teachers. The primary activities of learners are to learn knowledge as well as skills, and to discover better ways of learning, while the primary activities of teachers are to teach knowledge as well as skills, and to discover better ways of teaching.

At this point, everyone should be able to understand the true nature of languages as well as learning. However, if you think further, you will find that the following questions are still waiting for satisfactory answers: Why is a human being able to learn any human language in the world? Why isn’t an animal (e.g. a monkey, a cat or a dog) able to learn even a little bit of human language? What are the principles which will enable machines to learn human languages?

Therefore, the actual challenge in learning is to enable the machines of tomorrow to master all human languages and their extensions. Without considering the aspect of human languages, deep learning will always remain deeply superficial.

Ming Xie




The objective of the L2M program is to develop technologies necessary to enable a next-generation adaptive artificial intelligence (AI) system capable of continually learning and improving performance in real-world environments while remaining constrained by pre-determined capability limits. Such a system would be able to apply existing knowledge to new circumstances without pre-programming or training sets, and would be able to update its network based on its situation for a variety of applications.

The L2M program aims to develop technology that could support the creation of a new generation of AI systems that learn online, in the field, and based on what they encounter—without having to be taken offline for reprogramming or retraining for new conditions.

The L2M program will combine computer science with biology-inspired principles of learning with the goal of developing the capability to perform continual learning, with a focus on the creation of learning paradigms and evolving networks that learn perpetually through external data and internal goals. Performers will be tasked with developing systems that can demonstrate an ability to learn new tasks without losing capability on previously learned tasks, and can apply previous knowledge to novel situations—and in doing so develop more complex capabilities.
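
One way to make “learning new tasks without losing capability on previously learned tasks” measurable is a forgetting score of the kind used in the continual-learning literature. The Python sketch below is only an illustration under stated assumptions: the L2M description above does not specify any metric, and the function name, the accuracy-matrix layout and the sample numbers are hypothetical.

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """Average drop in accuracy on earlier tasks after all tasks are learned.

    acc[i][j] is the accuracy on task j measured right after training on
    task i (tasks are learned in order 0, 1, 2, ...). Only entries with
    j <= i are used. A value near 0 means earlier skills were retained.
    """
    n = len(acc)
    if n < 2:
        return 0.0  # nothing can be forgotten with fewer than two tasks
    drops = []
    for j in range(n - 1):
        # Best accuracy ever reached on task j before the final task was learned.
        best_earlier = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best_earlier - acc[n - 1][j])
    return sum(drops) / len(drops)


# Hypothetical accuracies for three tasks learned one after another.
sample = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.80, 0.90],
]
score = average_forgetting(sample)
```

For the hypothetical numbers above, task 0 drops from 0.90 to 0.70 and task 1 from 0.85 to 0.80, so the score is about 0.125.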


8. China is betting big on the Industrial Internet of Things (IIoT)

The Chinese government projects the country’s Industrial Internet of Things to grow to 450 billion RMB (about $65 billion) by 2020 – occupying roughly one quarter of China’s whole IoT market. The IIoT segment in China has maintained a growth rate of more than 25 percent. At present, Chinese enterprises whose status can be described as Industrial 3.0 know they need a lot of improvements, especially in cost control, production efficiency and process management.


7. Japan: Frontlines of “Robo-Conversion”

For waiters and waitresses in Japan, jobs are plentiful. For every applicant there are 3.8 wait jobs to choose from, which means 2.8 wait jobs go wanting. For drivers, there are 2.7 jobs per applicant, while for builders the ratio is 3.9 to 1.

That, of course, means that things don’t look as plentiful if you own a restaurant, a trucking company or want to build a shopping mall. Finding enough workers is just about impossible. A recent Financial Times headline says it all: Japan worker shortage has only one winner so far: robots.

6. Plagiarism and Myopia in AI

Extracted from:  https://www.facebook.com/juyang.weng/posts/10155025045069783

Hi, Fei-Fei, I am sorry that this episode takes the form of a letter to you mainly since you are an organizer of ImageNet. Please consider this communication as a purely academic debate that addresses important issues of AI. I am happy that you are doing extremely well in terms of recognition and funding and you are also one of my friends. Please do not be personally offended by this kick-off of scientific debate in AI, because this open discussion is very important for the future of AI.

Your AI systems are also fundamentally not scalable because they do not have their autonomous developmental programs. It is impossible to build a strong AI system without fully autonomous development.

This is a deep and complex subject of science, not a superficial and personalized issue of style, as Mr. Deng Xiaoping argued using his so-called socialism with Chinese characteristics. Unfortunately, having been greatly isolated during the many years of the so-called Chinese Communist Revolution, Deng Xiaoping did not gain sufficient knowledge about this deep and complex science. Likewise, many researchers in AI did not gain sufficient knowledge about the breakthroughs in the direction of Autonomous Mental Development (AMD). See Weng, McClelland, Pentland, Sporns, Stockman, Sur, and Thelen: Science, 2001 about what AMD is.

Second, let us look at a plagiarism issue in AI.

You used your work to talk about the future of AI without addressing the fundamental AI issues that knowledgeable scientists have long complained about, such as Marvin Minsky (re: neural networks are scruffy and so are yours), Michael Jordan (re: neural networks do not abstract well and neither do yours), and John Tsotsos (re: the exponential complexity of visual attention, and your systems completely miss the function of autonomous attention).

After our brief conversation during CVPR 2010, I followed up with an email to you on June 18, 2010. In that email, I reminded you that “brittle vision systems have been well known for long” (implying that your graphic model based vision is brittle) and that “we now have a model for brain-mind [5] to address the deep-trenched brittleness in model-based agents (vision and beyond).” However, I did not receive any response from you.

I cautiously alert you here that our brain model, Developmental Networks (DN), has solved the series of fundamental problems that Marvin Minsky, Michael Jordan, and John Tsotsos raised but did not solve.

However, DN originated from Cresceptron.

Did you plagiarize Cresceptron?

Cresceptron was published by Weng, Ahuja and Huang (IJCNN 1992, ICCV 1993, and IJCV 1997), and was the first deep-learning convolutional network for general 3-D objects in cluttered backgrounds. We argued (IJCV 1997), citing psychological studies, that the human vision system does not have perfect shift invariance, scale invariance, or orientation invariance. If there were a monolithic 3-D model in the brain, such invariances would seem to be natural consequences. Therefore, no monolithic 3-D model should exist in the brain.

Specifically, Cresceptron seems to be the first incrementally growing network for (1) detection, (2) recognition, and (3) segmentation of general 3D objects (and scene category) from images of cluttered natural scenes.

As soon as Cresceptron was published, many researchers asked me why I thought that the brain learns 3-D objects from 2-D patches in 2-D images and does not have a monolithic 3-D model. 3-D model based recognition was the prevailing approach until Cresceptron was published.

Namely, Cresceptron started a revolution in computer vision, which Sandy Pentland suggested that I call appearance based vision, in contrast with the then-popular aspect graph method that is based on a 3-D model.

In your 2005 PhD thesis, did you plagiarize Cresceptron for your entire approach?

In Section 9.2, literature review, you cited only some feature extraction techniques that assume 3-D objects (e.g., shape from shading, Hough transform, scale invariance features, and affine-invariant features). In Section 9.3 Contribution, you stated “to learn a completely new object category given very few training examples,” which is exactly what Cresceptron did 12 years before your thesis. The “one-shot” recognition idea in your thesis is also what Cresceptron did. Your Fig. 11.3 is the same, in its ideas, as Fig. 1 in Cresceptron ICCV 1993. You wrote about “incrementally” in the 2nd paragraph on page 79, but this is also what Cresceptron did.

ICCV, where Cresceptron was published 12 years before your PhD thesis, was arguably the most visible computer vision conference at the time. The complete journal version of Cresceptron was published in IJCV, arguably the most visible computer vision journal, 8 years before your PhD thesis. However, your entire PhD thesis did not cite Cresceptron.

Your later papers also used other ideas of Cresceptron, such as the AND and OR structures (see Fig. 3, grouplets, of the CVPR 2010 paper and so on). Cresceptron automatically generated such structures but you only handcrafted them. Yet you still did not cite Cresceptron.

In many of your papers and talks, you cited only the deep convolutional networks of Kunihiko Fukushima, Geoffrey E. Hinton and Yann LeCun for deep learning.

It is well known that Kunihiko Fukushima used a deep convolutional network (Neocognitron) only for the classification of isolated, single numerals, which are fundamentally 2-D, not 3-D, and not cluttered natural scenes. Until how recently did Kunihiko Fukushima, Geoffrey E. Hinton and Yann LeCun use their networks exclusively for the classification of isolated, single numerals, which are fundamentally 2-D?

By also turning a blind eye to the brain’s developmental program, the ImageNet Contest that you initiated is not only myopic but also misleading to the AI field. The ImageNet competitions lack the rigorous protocols that any scientific and engineering problem must clearly specify:

1. What is known (e.g., a particular set of static symbolic labels)?
2. What is given (e.g., all training and testing images along with the solutions)?
3. What is assumed (e.g., episodic image classification only and attention-free only)?
4. What is not allowed in producing the results (e.g., can a team design methods after looking at test data, learn after seeing the exam)?
5. How long do the interactive manual developmental processes last (e.g., years across competitions)?

Were each team’s training and testing under the organizer’s due supervision? This matters because the following human interventions could be critical:

1. test-specific symbolic representations,
2. task-specific algorithms,
3. test-specific algorithms,
4. parameter tweakings,
5. handpicked training and testing data (which you called “clean”, but does a child require mom to clean the visual world on his retinas before he can learn?),
6. expansions of computing resources after seeing the test performance,
7. human handpicking of only the single best result from many results to report.

Is this a human algorithm assisted by computers? This is not the autonomous development that AMD algorithms were meant for, where every network must not only be successful, like every human child, but must also be optimal in the sense of maximum likelihood conditioned on the learning experience of each child (network).

You stated only “computer telling us”, but in fact it is humans telling us, with computers as dumb tools. Those computers in your ImageNet do not truly “see” in the sense of human vision.

For example, the “dual task paradigm” used in your PhD thesis cannot totally eliminate the effect of attention. Many psychologists do not know that. All you need is to understand how top-down attention works and learns in the DN network. Dual-attention is an anytime learned brain behavior.

It is also difficult for a human brain to label every image consistently, since the label depends on what pops up in the person’s mind when he looks at the scene under the dual task paradigm. Human attention could be very rich, depending on the individual’s life experience and the laboratory instructions.

Was every ImageNet team’s computer program under the dual task paradigm?

You compared the 3.6% machine error rate in 2015 with 5.1% from humans. Is it a fair comparison or a misleading comparison?

Was it honest for you to state “machines’ progress has basically reached, and sometimes surpassed human level” (GIF’17)?

Let us further examine the fact that many humans in each team were in the iterative loop of your contest. Suppose that you have enough money to buy the total competition time of all the students at Stanford University to participate in your ImageNet Stanford team, but you report only the student who happened to get the highest score. You then claim, “see, Stanford education surpassed the level of other humans!”

You cannot afford to sell the entire Stanford team to every human user for a computer vision application. This is why every human brain learns autonomously within a closed skull. Each brain must learn successfully without buying the entire Stanford team into its skull to design representations and tweak parameters.

I predict that over the years from 2010 to 2015, more and more teams interactively figured out ways to beat the ImageNet competitions that you listed (GIF’17). For example, some data and symbolic representations were re-used over the years, but you do not care how those data are used. Your ImageNet competition seems to be a race of both human resources and machine resources, where teams could try to take full advantage of the task limitations of the ImageNet tests. The ImageNet competitions seem unscientific, non-developmental, myopic, and not general-purpose. The contests were designed to bypass the major fundamental bottleneck problems in AI.

The AIML Contest series was meant to overcome the major limitations of the ImageNet Contests and other contests. It is meant for AMD, where a single learning engine must autonomously develop for any practical sensing modality and motor modality, including vision, audition, and natural language acquisition (text). It is meant for any practical task that a human teacher in the environment has in mind; however, the task to be learned is not known to the programmer of the learning engine. I invite you to take a look at the Rules of the AIML Contest, which was successfully conducted during 2016 without getting a penny of government funding. The AIML Contest will be conducted again in 2017.

Jan. 23, 2017

John Weng, PhD

Professor, Michigan State University, USA

5. Some Thoughts on Humanoid Robotics

Without a human body, human intelligence cannot be developed on its own. This fact equally applies to robots. Most importantly, it explains why many research teams working in the field of robotics do build physical prototypes.

In recent decades, it has been an ambition of human beings to develop human-like machines, which are called robots in general and humanoid robots in particular. So far, tremendous progress has been made in science and engineering toward understanding the design, analysis and control of grasping, biped walking and manipulating mechanisms. In parallel, a great many wonderful results have come from the investigation of the various principles behind learning and the comprehension of language and images.

Fueled by the excitement and the impressive demonstrations of HONDA’s and SONY’s humanoid robots, should we believe that a new era of humanoid robots has just started, and that more excitement lies ahead?

In fact, humanoid robots are human-made inventions and creations. Therefore, the development of humanoid robots is basically invention-centric research, which should not be overshadowed by our limited understanding of nature. Instead, we have full freedom to exercise our creativity in the invention of humanoid robots.

On the other hand, humanoid robotics research will benefit greatly from discoveries about how the human body and mind work. As the outcome of invention will stimulate discovery-centric research in various ways, humanoid robots, which are embodiments of mind and body, are undoubtedly ideal platforms for us to validate, or apply, theories from the study of neuroscience, psychology, learning, and cognition.

We all know that the embodiment of mechanics, electronics, control, communication, perception, decision-making, artificial psychology, and machine intelligence has greatly enlarged the scope of scientific investigation into the engineering approaches and principles underlying the development of humanoid robots. Because discovery-centric research and invention-centric research in humanoid robotics can make progress hand in hand, this opens a new horizon in which fruitful results are expected to emerge in the form of new theories, new technologies and new products. In comparison with industrial robotics research, humanoid robotics research will certainly offer many more opportunities and inspirations for new inventions and new discoveries.

Therefore, our goal in this research direction is to adopt an integrative approach which aims at developing human-like Artificial Self-Intelligence (or artificial life) which could autonomously learn and develop its physical, intellectual, and emotional abilities through interaction with human beings and environments. Some of our research focuses include: cognitive vision, cognitive speech, machine perception, machine learning, intelligent & real-time OS, image understanding, natural language understanding, conversational dialogue, machine translation, etc.

Ming Xie