
The code that writes code that writes code

I read that folks had observed that some machine learning models can be used to write code that runs.  Let that sink in for a moment.  At the same time, be aware that biases and extensive memorization of training data have been observed in the same models.

This might be considered an implementation of automatic programming, and it is certainly not the first time machine learning models have been used to generate code (or strings that look like code).

The model that writes code (Z) when given some inputs is itself a piece of (very, very, very large and complex) code (Y).  If expressed in a general-purpose programming language, it would have millions, if not billions, of variables and many more operations.  No human programmer wrote this code - another piece of code (X) did.

Humans -> X -> Y -> Z
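
To make the chain concrete, here is a minimal, purely illustrative Python sketch.  The names (x_training_code, run_y) and the lookup-table "model" are stand-ins invented for this post, not any real training system; the only point is that humans write X, X produces Y, and Y emits Z.

# X: the human-written training code; it produces Y.
# Y: the learned "program" (here a trivial lookup table standing in for a model).
# Z: the code that Y emits when prompted.

def x_training_code(examples):
    # Stand-in for a real training loop (gradient descent over parameters).
    return dict(examples)

def run_y(y_model, prompt):
    # Y maps a prompt to a string of code, i.e. Z.
    return y_model.get(prompt, "raise NotImplementedError")

examples = [("add two numbers", "def add(a, b):\n    return a + b")]
y = x_training_code(examples)      # Humans -> X -> Y
z = run_y(y, "add two numbers")    # Y -> Z
exec(z)                            # Z is itself runnable code
print(add(2, 3))                   # prints 5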

This is the classic science fiction setup where a robot makes other robots (which may in turn make other robots).  When Y is a neural network, X is responsible for both the training loop and the network architecture specification.  The latter can itself be done by another model, adding more steps to the diagram above.
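
As a hedged illustration of that extra step, imagine a search routine W that picks Y's architecture before X trains it.  The names and the random scoring below are invented for this sketch and merely stand in for real neural architecture search:

import random

def w_search_architecture(candidates, score):
    # W: chooses an architecture for Y by scoring a few candidates.
    # In practice this could itself be a learned model (neural architecture search).
    return max(candidates, key=score)

def x_train(architecture, data):
    # X: the human-written training code; the actual training loop is elided.
    return {"architecture": architecture, "parameters": "elided"}

# Humans -> W -> X -> Y (-> Z)
architecture = w_search_architecture(
    candidates=[{"layers": 2}, {"layers": 4}, {"layers": 8}],
    score=lambda a: random.random(),   # stand-in for validation accuracy
)
y = x_train(architecture, data=[])
print(y["architecture"])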

At every step of the process, the humans are further removed from the work itself and potentially lose direct control over the outcome Z - this is natural and expected, since the human inputs to the process become increasingly abstract and high-level.  The code Z, however, is expected to affect or be used by other humans; the utility of Z is why we want it written in the first place.

Losing control over the implementation details of Z is not by itself problematic as long as Z works properly.  But how will the human developers guarantee that it works as intended if they don't even write the code that writes Z?  More importantly, what actions can the human developers take when Z does not work correctly?  The human users at the other end - those who use Z or are affected by Z - might expect the developers to have full control over it.

This is not an entirely unfamiliar situation outside of machine learning.  A software company's CTO might not write any code but charts a feasible course towards a software solution, while engineering managers take up the threads and have their teams of software engineers work on them.  The main difference is the human-to-human interface.  If the engineers are unclear about the requirements, they are expected to seek clarification; they are expected to think up reasonable edge and corner cases that may arise within the use cases of their work; and they are expected not only to come up with an implementation but also to make that implementation easy and cheap to maintain and understand.  If certain requirements are not technically feasible, the engineering manager is expected to surface this with the CTO.

In the scenario where machine learning models learn to write code, these helpful communication channels (from implementer back to requirement specifier) seem lacking, or at least not widely used; active learning is an active research area that might help.
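
One can picture what such a channel might look like with a toy active-learning loop: the model asks the human to label the inputs it is least sure about, rather than the human guessing up front which examples matter.  Everything below (the uncertainty score, the stand-in model, the ask_human callback) is invented for illustration:

def uncertainty(model, x):
    # Toy uncertainty: highest when the model's score is near 0.5.
    p = model(x)
    return 1.0 - abs(p - 0.5) * 2

def active_learning_round(model, unlabeled, ask_human, budget=3):
    # Pick the `budget` most uncertain inputs and ask the human to label them.
    queries = sorted(unlabeled, key=lambda x: uncertainty(model, x), reverse=True)
    return [(x, ask_human(x)) for x in queries[:budget]]

# Usage with stand-ins for the model and the human "requirement specifier".
toy_model = lambda x: (x % 10) / 10.0          # pretend score in [0, 1]
new_labels = active_learning_round(
    toy_model,
    unlabeled=range(20),
    ask_human=lambda x: int(x >= 10),          # the human answers the query
)
print(new_labels)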




