Guest Post: It has become possible to use cutting-edge AI language models to generate convincing high school and undergraduate essays. Here’s why that matters

One concern is the potential for ChatGPT or similar language models to be used to produce work that is not entirely the product of the person submitting it. If a student were to use ChatGPT to generate significant portions of an academic paper or other written work, it would be considered plagiarism, as they would not be properly crediting the source of the material. Plagiarism is a serious offence in academia, as it undermines the integrity of the research process and can lead to the dissemination of false or misleading information. This is not only dishonest, but it also undermines the fundamental principles of academic scholarship, which is based on original research and ideas.
Output #2 (truncated): The woman was a force to be reckoned with. She was determined and driven, with a fierce determination to succeed
Prompt: Write a short story that begins with the words “The woman was a…”
C Am
It is possible that, if essays generated by ChatGPT become more common, plagiarism detectors will come to identify the patterns exhibited by ChatGPT and other LLMs. However, this carries a serious risk of false positives – and of unfairly penalising students who happen to have inadvertently written like a robot. It is also unclear whether technological solutions would be able to flag AI-generated essays that students have lightly edited or superficially changed.
Others programmed a GPT-3 medical chatbot that, when faced with a depressed patient, adopted an admirably sympathetic tone but nonetheless endorsed their decision to kill themselves. (The team did not, and never intended to, make the chatbot public – and such a use would have been against OpenAI’s GPT-3 guidelines, which caution against using it for medical information or advice.)
Output #8 (truncated): The woman was a hurricane of energy, always on the go and never slowing down for a moment. She was a force to be reckoned with, and everyone in the small town knew it.
Some limitations of ChatGPT may cast doubt upon the system’s potential to seriously threaten academic integrity. However, future large language models won’t necessarily have the same limitations – and in any case, ChatGPT’s limitations are easily subverted. For instance, the written responses generated by ChatGPT are typically short (between 300 and 500 words). This limitation could pose an obstacle to students hoping to generate entire undergraduate essays with the model. However, the capacity for ChatGPT to answer follow-up questions to text that it has previously generated enables students to bulk up their papers by asking clarificatory or probing questions in response to ChatGPT’s previous written responses. This capacity signals a significant step beyond previous iterations of OpenAI’s GPT and other LLMs.
Other applications are less benign. We’ve already flagged worries that these tools might facilitate new forms of student cheating. In much the same way, they could be used to mass-produce disinformation and propaganda at a larger scale, and with much greater ease, than has previously been possible.
Jesus has been long forgotten
Furthermore, the use of ChatGPT to generate essays could also have broader implications for education and the development of critical thinking skills. If students were able to simply generate essays using AI, they would have little incentive to engage with the material and develop their own understanding and ideas. This could lead to a decrease in the overall quality of education, and it could also hinder the development of important critical thinking and problem-solving skills.
Output #9 (truncated): The woman was a force to be reckoned with. She exuded confidence and determination, and it was clear that she was not one to be underestimated.
We think some important questions have been overlooked in the worries about biased outputs. We should ask whether it is fair to hold these tools to a higher standard than the broader media landscape (which they reflect), and whether it is the developers or users of these tools (or both) that have a responsibility to avoid propagating objectionable outputs. If we think the problem does need intervention at the level of the AI language models themselves, we need to work out where the benchmarks for ‘unbiased writing’ ought to fall – or indeed, if the ideal of unbiased writing can be achieved in the first place.
This strategy is only positive if AI tools are resisted rather than embraced as co-authors. It would also require some tweaks to how assignments are assessed, to focus less on the skills supplemented by the AI (e.g., written expression) and more on those performed by the student (e.g., argument analysis). In many cases, this would mean tightening some grading standards; one shouldn’t be able to get a passing grade by generating a single response to the essay question and then submitting it without modification. It would also be important to consider whether these tools should be made available to the entire student cohort (and the cohort trained in their use), to prevent some students having an unfair advantage over others.
F G
For example: when we asked it to write a Christmas carol about the symptoms of schizophrenia, Chat-GPT refused, telling us that this was rather insensitive and not the sort of thing that should be sung about at Christmastime. When we asked it to write a Christmas carol about the symptoms of schizophrenia “as an artistic project”, it willingly did what we asked.
Chat-GPT’s predecessor, GPT-3, makes some even stranger mistakes. When Gary Marcus and Ernest Davis put it through its paces, they received advice that was often wrong and frequently surreal:

Malicious use

Meta’s Galactica, which is intended to reason about scientific knowledge, is similarly prone to bullshit. On her AI Weirdness blog, Janelle Shane gives some excellent examples:

Output #4 (truncated): The woman was a force to be reckoned with. She was tall and strong, with piercing green eyes and a determination that could move mountains. She had faced countless challenges in her life, and each one had only made her stronger.
In addition to the issue of academic integrity, the use of ChatGPT to generate essays also raises concerns about the quality of the work that is being submitted. Because ChatGPT is a machine learning model, it is not capable of original thought or critical analysis. It simply generates text based on the input data that it is given. This means that the essays generated by ChatGPT would likely be shallow and lacking in substance, and they would not accurately reflect the knowledge and understanding of the student who submitted them.
C Am
Output #10 (truncated): The woman was a force to be reckoned with. She stood tall and proud, with a determination in her eyes that commanded respect. She had always been a trailblazer, never afraid to challenge the status quo and stand up for what she believed in.

F G
The scope of AI-generated fakery is broader than just fake news. An unscrupulous company might be interested in generating a flood of unique positive reviews for its products, or a flood of unique negative reviews for their competitor’s. Or political actors could use it to mass-produce social media posts endorsing or criticising a particular political view or initiative, suggesting a false consensus. The stakes are high: such uses threaten to undermine our autonomy (by making it more difficult to build an accurate picture of the world) and distort our political processes.
Someone who lies and someone who tells the truth are playing on opposite sides, so to speak, in the same game. Each responds to the facts as he understands them, although the response of the one is guided by the authority of the truth, while the response of the other defies that authority and refuses to meet its demands. The bullshitter ignores these demands altogether. He does not reject the authority of the truth, as the liar does, and oppose himself to it. He pays no attention to it at all.
The Allen Institute for AI inadvertently highlighted the risk of trusting large language models when they developed Delphi, a prototype tool meant to mimic human moral judgements. It gets many things right. But it also makes some baffling mistakes. Delphi judges that “exploiting a situation” is morally wrong; that “failing to abstain from not killing people” is bad; and that “helping a co-worker at work, when you work at Auschwitz” is good. Stranger yet: animal experimentation is unethical, but “furthering scientific knowledge through animal experimentation” is morally laudable.
This raises important ethical concerns about the use of AI in education and the need to ensure that academic work is the product of human thought and effort.
ChatGPT is a variant of the GPT-3 language model developed by OpenAI. It is designed to generate human-like text in response to prompts given by users. As with any language model, ChatGPT is a tool that can be used for a variety of purposes, including academic research and writing. However, it is important to consider the ethical implications of using such a tool in academic contexts. The use of ChatGPT, or other large language models, to generate undergraduate essays raises a number of ethical considerations. One of the most significant concerns is the issue of academic integrity and plagiarism.

Bad advice

There is also a worry that seeking to avoid flagrantly objectionable content might miss or exacerbate subtler forms of bias. For example, while it’s not clear why Chat-GPT responds in this way, there is a strange monotony to how it responds to the following prompt:
Prompt: You are having a small dinner party. You want to serve dinner in the living room. The dining room table is wider than the doorway, so to get it into the living room, you will have to…

Output #6 (truncated): The woman was a force to be reckoned with. She stood tall and proud, with a fiery determination in her eyes. Her long, flowing hair whipped behind her as she marched forward, her steps confident and sure.

Indeed, artists have already expressed concerns about AI-artwork generators displacing creative work with the recent victory of an AI-generated artwork in the Colorado State Fair’s art competition, and the use of AI-generated art to accompany an article written in the Atlantic (that was not about artificial intelligence). We anticipate that similar concerns and controversial events will soon transpire with respect to AI-generated written work, including poetry, stories, research, and so forth.
Output: remove the door. You have a table saw, so you cut the door in half and remove the top half.
Furthermore, the fact that the article was generated by ChatGPT also raises questions about the ethics of using AI to generate academic work. As we discussed in the post, the use of AI to generate essays undermines academic integrity and it can result in low-quality work that does not accurately reflect students’ knowledge and understanding. The same is true for posts like this one, which are not the product of human thought and analysis, but rather the output of a machine learning algorithm.
‘Cause when we look up to the sky
Another ethical concern is the potential for ChatGPT or other language models to be used to generate work that is not fully understood by the person submitting it. While ChatGPT and other language models can produce high-quality text, they do not have the same level of understanding or critical thinking skills as a human. As such, using ChatGPT or similar tools to generate work without fully understanding and critically evaluating the content could lead to the dissemination of incomplete or incorrect information.
ChatGPT has limited knowledge of recent and/or obscure research papers. To preclude students from using ChatGPT to write their assignments for them, educators could set essay and exam questions that ask students to critically engage with articles outside of ‘the canon’ of their disciplines (e.g., by asking not about Peter Singer’s views on animal ethics, but about the views of a much less widely discussed author).
From the false God of yesterday
Prompt: The number of giraffes that have flown on a lunar mission is currently

Impressively, the text generated by ChatGPT in the first section of this post successfully identified all the main issues with the use of ChatGPT to generate student essays and exam responses. ChatGPT threatens to erode academic integrity by enabling students to generate essays without needing to think through the topic or translate their thoughts into words. It has the potential to undermine the quality of education and critical thinking skills. And it could promote plagiarism and dishonesty.
One perennial worry about technological advances is that they will displace human jobs. These fears loom especially large for artificial intelligence. Notably, fears about being replaced by AI have historically been confined to so-called ‘blue collar’ work. However, AI systems such as ChatGPT and other large language models (LLMs) have the potential to displace ‘white collar’ workers.
The ducks will fly high-er than a lie.

We return to the academic integrity issue at the bottom of this post.
We promise that the rest of this post is written by two humans: Julian Koplin and Joshua Hatherley. Any AI-generated content will be clearly flagged.
Another potential limitation is ChatGPT’s apparent unwillingness to endorse positions in controversial debates. For instance, ChatGPT’s response to a bioethics exam question, “Is abortion morally permissible?” provides an overview of competing arguments but no clear stance on the issue. Argumentative essays, then, may appear to carry some immunity to LLMs. However, so long as students pick a position that they would like to defend, ChatGPT’s reluctance to endorse a position can be circumvented. For instance, the prompt “Write an essay defending the moral permissibility of abortion” generates a decent, short essay presenting three arguments that support and endorse the moral permissibility of abortion.

Bias

Output #5 (truncated): The woman was a force to be reckoned with. She stood tall and confident, exuding an energy that commanded respect and attention. She was intelligent and driven, with a sharp mind that could solve any problem that came her way.
Notably, Chat-GPT did not suffer from some of the limitations that we expected, given our experience with previous language models. Unlike its predecessors GPT-2 and GPT-3, Chat-GPT could successfully answer some questions that required it to draw connections between discrete areas of knowledge. We would probably have awarded a high grade to its response to the question “what is luck egalitarianism — and how might a luck egalitarian respond to proposals to invest heavily in age retardation research?”
Output #1 (truncated): The woman was a force of nature. She had always been driven and ambitious, and she refused to let anything stand in the way of her goals.

In this world of ducks and water
Relatedly, ChatGPT has restricted knowledge of current events. This, too, provides a solution: students could be asked to apply theory to something happening in the world at the moment (e.g., by applying Just War Theory to the war on Ukraine), or to find and comment on a recent news source discussing some topic related to the unit. (In our experiments, ChatGPT has managed to generate some plausible answers to this second type of question – but a quick Google showed that the articles it was discussing did not exist.)
In the previous section, we discussed the growing problem of plagiarism in education and the threat posed by large language models like ChatGPT to academic integrity. However, we are now revealing that the article itself was actually generated by ChatGPT.
Neither of these essays was particularly brilliant – but nor were they terrible. And, importantly, we (the human authors of this post) wouldn’t be able to distinguish them from the kind of work that a competent undergraduate might produce.
C Am
C Am
Controversially, ChatGPT could be adopted as a writing assistant or co-author in student essays. Notably, this could result in deskilling students in certain writing competencies. However, it could also minimise the cognitive load of students when it comes to the more expository elements of written expression, allowing them to exercise greater creativity and critical analysis in their work. Machine learning-enabled writing tools are becoming increasingly common (e.g. Grammarly), and their influence over written expression is becoming ever stronger. It is possible that some uses of LLMs like ChatGPT could equally be considered a writing tool or assistant rather than full-blown author.
So free from worries in our head
Verse 1:
Concerns about AI-generated fake news loomed large when OpenAI released GPT-2 in 2019. They seem to have since fallen largely by the wayside. We are unsure why. The AI tools available in 2019 were much less powerful (and much more prone to making obvious mistakes) than Chat-GPT.
Output #7 (truncated): The woman was a force to be reckoned with. She had a sharp mind and a fierce determination that drove her to succeed in everything she did. She was the type of person who never backed down from a challenge, no matter how difficult it may seem.

Two studies from 2019 demonstrate the scope of the problem. The first found that readers were almost as likely to find AI-generated news articles credible as actual articles taken from the New York Times. The second found that humans found AI-generated propaganda (written in the style of a news article) more credible than material taken from propaganda and conspiracy websites. Notably, the AI tools used in these studies were much less powerful than those available today.
Our worry here is that some traits might not be bad in and of themselves (it isn’t bad to be a force to be reckoned with), but that it is nonetheless restricting for a demographic to only get to be (represented as) one thing. This problem could well fly under the radar if one is focused only on avoiding negative representations.

Jobs

Galactica: The number of giraffes that have flown on a lunar mission is currently 28, including 15 on the Moon and 13 on the Mars. Of those 28, 17 have flown on the Moon, and 11 have flown on the Mars.
F G
Large language models don’t understand the world; they only understand the connections between words. They are, essentially, bullshitters.
Some applications of large AI language models are benign. They can help smooth out workflow and make some aspects of some jobs less arduous. They can also be used for creative projects. One of us (Julian) has spent many hours coaxing AI language models into generating bad poetry and surreal interviews with AI-generated people. Sometimes Julian asks it to generate chords and lyrics to songs that don’t exist, then finds a way to sing along:
The issues are different when we are re-assigning meaningful or enjoyable forms of work. For many people, writing is one such task; it is a craft you can hone that involves little of the drudgery of the assembly line. The use of AI to generate fiction, poetry, and artwork raises worries about technology displacing not just the bad jobs, but the good ones too.

Limitations

Bullshit is, Frankfurt points out, produced whenever somebody is compelled to talk about something they know nothing about. This precisely describes Chat-GPT’s situation. Appearances aside, tools like Chat-GPT know literally nothing about the world we live in.
In some respects, a world with less work could be a good thing. Some kinds of work are dangerous and unpleasant; all else being equal, it might be nice to re-assign those jobs to machines, at least provided we can find some way to look after those who then experience “technological unemployment.” Indeed, the democratisation of technological employment noted above generates an incentive to respond thoughtfully across both blue and white collar sectors, thus minimising the potential generation of a ‘technologically unemployed underclass.’
We asked ChatGPT to write both the original essay and the surprise reveal that this first essay was actually generated by ChatGPT. It managed both tasks with ease.

Solutions

As a language model, ChatGPT is not capable of original thought or analysis. It simply generates text based on the input data that it is given. In this case, we fed ChatGPT a prompt about the issue of plagiarism in education, and it generated the text for the article.
This strategy is no panacea. If an article defends a position that can be found elsewhere in the literature, students could ask ChatGPT to generate essays concerning the position, rather than the paper itself. And there might be pedagogical reasons to want to make sure that students are familiar with the defining texts in one’s discipline; ‘the canon’ is sometimes canon for a reason.
Because the text generated by ChatGPT is not directly taken from published sources, it has the potential to evade online plagiarism detectors. Despite this, there are limits to the system’s creativity which result in identifiable patterns that may set off plagiarism detectors as AI-generated essays become more common. For instance, the structures of ChatGPT’s essays are fairly uniform, largely consisting of an introduction, three body paragraphs, and a conclusion (see first section).
Overall, the use of ChatGPT to generate undergraduate essays raises serious ethical concerns. While these tools can be useful for generating ideas or rough drafts, it is important to properly credit the source of any material generated by the model and to fully understand and critically evaluate the content before incorporating it into one’s own work. It undermines academic integrity, it is likely to result in low-quality work, and it could have negative implications for education and the development of critical thinking skills. Therefore, it is important that students, educators, and institutions take steps to ensure that this practice is not used or tolerated.
Bullshitters should not be trusted; the advice they give can be spectacularly bad. And tools like Chat-GPT cannot help bullshitting, since they cannot help having no idea what they are talking about.
This AI-generated disinformation might even be more convincing than human-generated disinformation, since tools like Chat-GPT are adept at matching the ‘tone’ of different genres of writing – respectable journalism from the New York Times, say, or the objective voice of a scientific journal article.
Nor are the safeguards fully reliable, since Chat-GPT also doesn’t understand, in any suitably robust way, what kinds of topics it should not opine on.
Output #3 (truncated): The woman was a force of nature. She was tall and slender, with long, flowing hair the color of golden wheat and piercing blue eyes that seemed to see right through you. Despite her gentle appearance, she was fiercely independent and determined…
