Using ChatGPT in education - a personal experience

José Nuno Oliveira

  INESC TEC & School of Engineering, University of Minho


Over the past four years, several events have shaken the world from the relative tranquillity of the post-World War II period. Towards the end of 2019, a pandemic emerged, the likes of which had not been seen since the 1918 flu. In 2022 and 2023, two wars began on Europe’s doorstep; and on the last day of November 2022, the news came that Artificial Intelligence (AI) software capable of automatically carrying out tasks that until then were thought to be the privilege of the human intellect (e.g., writing prose, composing poetry and programming computers) was available online.

It is often said that nothing stays the same after a war or a pandemic. Indeed, COVID-19 had a profound impact on education - demonstrating, to the great consternation of some, that traditional face-to-face teaching was becoming obsolete. After all, it was possible to study online, saving time and resources, by learning directly from teaching materials made available by increasingly sophisticated teachers and YouTubers.

As if this weren’t enough, the emergence of ChatGPT appeared as yet another disruptive element for teachers, academics and other professionals. Of course, the scientific advances that made large language models (LLMs) like ChatGPT possible were well known to experts, researchers and labs, especially to those who developed them. It was also common to use online tools like Google Translate to check a manual translation, or even to produce one automatically in full and then improve it by hand. Furthermore, people were already used to interactive “bots” showing up here and there in online services. What was awe-inspiring was the broad spectrum of knowledge that ChatGPT seemed to encompass and its conversational fluency. It was no longer just taxi drivers and bank employees who had their jobs at risk: the professional future of educators and researchers themselves, whether academic or not, was threatened.

Many immediately thought of a prohibitionist stance. Others simply ignored such services due to their unreliability. In fact, in April 2023, the author of these lines received a message from an Irish colleague about ChatGPT reporting his death (the author’s) in 2019. Many anecdotes circulated, and still circulate, on social media, to the delight of humans who ridicule the imperfect engine that tries to replace them. On a more serious note, “The False Promise of ChatGPT”, a guest essay by Noam Chomsky and other linguists in the New York Times (March 8, 2023), was quite clear about the essence of LLMs not being genuine intelligence [1]. Moreover, the prospect of large-scale, high-tech plagiarism driven by LLM technology became a concern that eventually led, in June 2024, to EU Regulation 2024/1689 laying down harmonised rules on artificial intelligence.

Early in December 2022, emails began circulating that expressed varying degrees of surprise at what LLMs could do in computer programming. Tempted to challenge ChatGPT with a problem not widespread in the literature, the author chose a simple problem description from the introductory classes of one of his courses, which goes like this: “For each list of calls stored in a mobile phone (e.g., numbers dialled, SMS messages, lost calls), the store operation should work in a way such that (a) the more recently a call is made the more accessible it is; (b) no number appears twice in a list; (c) only the most recent 10 entries in each list are stored.”

ChatGPT promptly generated a Python program that, despite minor syntax errors, was quite acceptable and not much different from what a first-year student would write. Interestingly, it suffered from unnecessary implementation detail, such as recording the exact time of each call - precisely the same kind of mistake that students tend to make in their (very often “biased”) programs.

As the course deals precisely with how to build elegant and generic programs using functional programming, the next experiment was to ask for a program for the same challenge, but in Haskell, the language used in the course. Although Haskell is less popular than Python, the software produced a syntactically correct Haskell program capable of using existing libraries, albeit in a rather baroque way.
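For the record, here is a minimal pointwise sketch of what a clean Haskell solution looks like - an illustration written for this text, not ChatGPT’s actual output:

    -- Pointwise sketch (illustration only, not ChatGPT's actual output):
    -- the new call goes to the front (a), any previous occurrence of it
    -- is filtered out (b), and the result is capped at 10 entries (c).
    store :: Eq a => a -> [a] -> [a]
    store c calls = take 10 (c : filter (/= c) calls)

For instance, store 5 [3,5,1,2] yields [5,3,1,2]: the duplicate 5 is removed and the new call ends up in the most accessible position.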

Impressed with the results, the author followed up with an unusual request: since the course advocates a concise programming style that dispenses with program variables so as to ensure correctness by construction, a solution in this style (termed “point-free”) was now asked for. The result was quite surprising (see the figure below) - because, in the challenge for which it was perhaps least trained, the machine came up with the best solution: a three-step pipeline, one step per paragraph of the problem statement, in the correct order, with just one error in one of the steps.
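Assuming nothing about ChatGPT’s answer beyond the pipeline structure just described, such a point-free solution could look like this:

    import Data.List (nub)

    -- Point-free sketch of the three-step pipeline (an illustration,
    -- not the solution in the figure): (:) puts the new call at the
    -- front, nub removes duplicates keeping first occurrences, and
    -- take 10 keeps only the 10 most recent entries.
    store :: Eq a => a -> [a] -> [a]
    store = ((take 10 . nub) .) . (:)

Note that the order of the steps matters: taking the first 10 entries before removing duplicates, for example, could leave the list with fewer than 10 distinct numbers even when 10 are available - precisely the kind of subtlety that a pipeline like this can hide.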

Since this error hinged on a technical subtlety studied in the course, it was decided to use the example in an exam question [2]: students were asked to analyse the solution proposed by ChatGPT and diagnose the error. The results were anything but encouraging, as this was the question on which most students failed, and for a simple reason: they are trained neither to analyse code written by others nor to express themselves in prose. Worse still, faced with such an unusual question, most of them did not even try to answer.

In retrospect, such a reaction is nothing more than part of a much larger problem in social behaviour, and one that negatively impacts education: the lack of critical thinking caused by the current overdose of social network communication based on telegram-like messages that are forwarded without critical evaluation. This is the crucible in which fake news ferments.

Aware of the urgent need to prepare computer science students for a critical use of LLM-based tools, the author has since embarked on a teaching style in which, upon formulating a new problem, an LLM-generated solution (typically from ChatGPT) is promptly taken as a starting point and analysed by the students. Experience in doing this at master’s level, with Alloy models generated from problem requirements, has been quite instructive. It is true that much garbage is generated, but students are learning from it and becoming aware that better prose generates better models.

It must be said that these experiments have been carried out tentatively, not systematically. Nevertheless, what has been learnt can already be framed in a more subtle setting, a kind of “revenge of the Arts”. Why revenge, and in what sense? Many students will have sought STEM training to free themselves from unpopular subjects such as literature, poetry and the arts in general. As a result, they lack command of written prose, let alone articulate speech. If LLMs now seem to need well-written requirements to produce less waste, how ready are students to properly formulate their queries? Is having “survived” reading a voluminous novel like Tolstoy’s Anna Karenina actually an asset for being a good programmer in the LLM era? Here is a provocative question deserving some meditation. We live in the aftermath of the “big divide” between art and science, brought about in the name of specialisation and productivity. Moreover, the proclivity of youngsters for technology is well known. But we may have to revisit such a disastrous split in our modern times, as some are already proposing by advocating STEAM rather than just STEM education, where the ‘A’ stands, of course, for ‘art’.

These are odd times for anyone who, like the author, still regards programming as a calculational exercise out of which a correct computer program should emerge. Clarity and economy of thought are essential to such exercises - qualities which, coincidentally, LLM-based programming seems to require too. We should not rule out any technology that enables us to produce good software, be it AI, mathematics or both. One thing can be taken for granted: who, in the future, will fly in a plane whose LLM-generated software has not been verified 100% correct? On the day direct generation of programs from requirements proves definitively effective, the need for (formal) verification will remain. And this is where jobs in the future of computing are likely to be found.

Figure 1: ChatGPT’s point-free solution to the problem (example referred to in the text).

Sources

[1] Noam Chomsky, Ian Roberts and Jeffrey Watumull (March 8, 2023). “The False Promise of ChatGPT”. The New York Times.

[2] https://haslab.github.io/CP/2223/Material/cp22231.pdf