Update on what ChatGPT is bad at doing.
Intro
I've now experimented for about a year with giving ChatGPT tasks. The criticisms below apply to ChatGPT 4o. I recently got access to ChatGPT 4.5, which is better at coding and at following instructions (it at least takes the instruction to give steps one at a time seriously, whereas 4o almost always ignores it). However, 4.5 is grumpier in tone. I found o1 and o3 not great either, so I keep going back to 4o. Anyway, so far my primary concern is that within a year or two, this thing will be able to do most of what I can do. It certainly makes my work more efficient.
The bad
Anyway, this is what 4o is bad at:
- Making, interpreting, processing, and editing spreadsheets. You have to convert the spreadsheet to CSV first, losing all formulas and charts (see the conversion sketch after this list). It is OK at interpreting small data sets, e.g. 20-30 simple values such as the amount of money spent per month, with each month listed. It is no good at, for example, creating a net present value (NPV) spreadsheet/analysis, or at fixing spreadsheets with inconsistencies in their data.
- Accurate journal references. It gets them about 83% correct. I measured: 17% had errors, usually an incorrect URL or DOI for the article, or the article simply does not exist.
- Generic research. It's good at generating boilerplate research documents, but the content is not especially original. This applies to the Deep Research function: you have to read the output yourself and put in the actual insights.
- Editing long documents, or just searching documents for specific errors. Even basic requests like checking for double spaces do not work (a one-line shell command handles this reliably; see the sketch after this list). In one case it decided a poem was grammatically incorrect and wrongly capitalised, because it did not recognise that it was a poem. It is OK at suggesting alternative wording, but seems to prefer active-voice, American-style writing; it gets confused by academic passive-voice style and wants to change it.
- Generating table grids with correct contents. I tried asking it to create a Word file with a bingo sheet. Nope. It generates a generic table (in Word), and some of the bingo sheets have duplicate clues (they're supposed to be random and unique). Even though I told it the squares must be square, it ignored that. The only way to achieve this was to ask it to do it in HTML with an embedded JavaScript randomiser. That worked well; you can see the result at https://www.ostrowick.co.za/bingo/. In other words, the best workaround when it produces stupid layouts is to tell it to generate HTML code.
- Interpreting PDFs for purposes other than summaries. It really cannot do it properly. You have to convert the PDF to plain text first and clean it up: make chapter breaks clear with, say, underscores, and delete running headers, footers, and page numbers (see the cleanup sketch after this list). Even then it doesn't like long documents. I tried giving it a book to summarise in this manner; nope, I had to do it chapter by chapter. It doesn't understand pagination or chapter headings, and if you ask it for the page numbers of errors it struggles to give accurate answers. If there's a table or graphic in the PDF, good luck with that.
- Code line numbers. Even if you give it a flat text file with code in it, it cannot tell you exactly which line a bug is on. It gets within about 10 lines, e.g. if the bug is on line 63 it might tell you it is on line 77. I suspect it expects UNIX line endings (LF) and gets confused by DOS-style endings (CRLF). I'll have to test this theory (see the line-ending check after this list), but so far it is pretty bad at this.
- Generating images. It has got better at making images, but it is not good at vectors. Forget vector diagrams with labels, such as flowcharts and corporate graphics for PowerPoint; it just makes flat PNG files, and in one case it switched the labels to German. It is also heavily censored and won't generate images of politicians. I asked it to generate a satirical cartoon and it flatly refused; while saying it is allowed to do so, it refused all versions of the prompt. I'm pretty sure satire is fair use of someone's image.
- Generating PowerPoint decks. It can generate them, but they are very low-quality copy/paste jobs, with bullets mindlessly divided between slides for no strong reason. No branding or background graphics, not even an attempt; no contents page, etc.
- Advanced mathematics. It still makes mathematical errors. My test is to ask for the volume of revolution of a sine wave over 180 degrees (the expected answer is worked after this list). It does not generally give the right answer; it messes around for ages and you have to argue with it. At basic material, though, it is OK.
- Following instructions, specifically the instruction to give steps one at a time. I have to tell it approximately every half hour to give me step-by-step instructions for debugging code or installing complex software. It generally runs ahead and gives 3-20 steps without a pause. The trouble is, it assumes everything will work as planned, proudly announcing at the end of the instructions, "WHY THIS WORKS". So if you want to report a failure at, say, step 2 out of 20, you have to scroll up through screeds of instructions to find step 2 and click on it to reply: "hey, no, it failed at this step with such-and-such error".
- Character images. I have managed to get it to generate a series of images with recurring characters, but they gradually drift until they no longer look the same. I also tried training it on my own images and my partner's images, and eventually it complained about deepfakes, sadly before getting them really accurate.
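A minimal sketch of the spreadsheet-to-CSV workaround mentioned above, assuming LibreOffice is installed (its soffice command does the conversion; as noted, formulas and charts are lost):

```sh
# Convert every .xlsx in the current folder to CSV so that
# ChatGPT can read the values as plain text.
for f in *.xlsx; do
  soffice --headless --convert-to csv "$f"
done
```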
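Checking for double spaces, by contrast, is a deterministic task that one command handles; a sketch, assuming the document is available as plain text:

```sh
# Print every line (with its line number) that contains
# two or more consecutive spaces.
grep -nE '  +' document.txt
```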
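A sketch of the PDF clean-up described above, assuming the pdftotext tool from poppler-utils; the header pattern is a placeholder you would adapt to your own document:

```sh
# Extract plain text from the PDF, roughly preserving layout.
pdftotext -layout book.pdf book.txt

# Strip page numbers (lines that are only digits) and a repeated
# running header ("MY BOOK TITLE" is a placeholder).
sed -e '/^[0-9][0-9]*$/d' -e '/^MY BOOK TITLE$/d' book.txt > book-clean.txt
```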
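To test the line-ending theory, you can check and normalise a file before uploading it, using standard UNIX tools:

```sh
# Report whether the file uses CRLF (DOS) or LF (UNIX) line endings.
file script.txt

# Strip carriage returns, leaving plain LF endings.
tr -d '\r' < script.txt > script-unix.txt
```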
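For reference, the expected answer to my maths test, assuming the natural reading of the question (rotating y = sin x about the x-axis over 0 to 180 degrees, i.e. 0 to pi radians, using the disc method):

```latex
V = \pi \int_0^{\pi} \sin^2 x \, dx
  = \pi \left[ \frac{x}{2} - \frac{\sin 2x}{4} \right]_0^{\pi}
  = \pi \cdot \frac{\pi}{2}
  = \frac{\pi^2}{2} \approx 4.93
```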
The great
Anyway, this is what 4o is great at:
- Coding, specifically shell scripts and code from scratch. If there is any task you need to automate, it is amazing at that. For example, I download and process quite a lot of images and videos, but certain formats from my phone (PNG, MOV) take too much space, so I had it make scripts that folderise files by date and convert them to JPG and MP4 (a sketch of the idea follows this list). The scripts worked first time, with no need to debug.
- Another case: I gave it a sketch of a series of tabs (a user interface), as in a sketch on a piece of paper. It gave me the code instantly, and the code worked instantly. You can see the result at https://www.ostrowick.co.za/index.php?page=Birdpress_features/tabs.html
- Summarising long-winded video transcripts and/or PDFs. I even got it to make pie charts of long-winded data in a PDF. I specifically dislike watching videos over two minutes long, so I just get the transcript, copy-paste it into ChatGPT, and say "summarise this please and give me max 10 bullets of insights". At a reading speed of 700 wpm, about five times faster than most people speak (roughly 130-150 wpm), and with the content condensed to ten bullets, I save about 90% of the time this way.
- Solving IT problems. A number of obscure problems have come up that it was able to solve, e.g. an old iPod not wanting to connect to my Mac, etc.
- Making generic paragraphs and joiner paragraphs. For example, if I have written two sections of research and can't think how to write the paragraph to segue between them, I give it the two paragraphs (the last of the first section and the first of the second) and say "write a paragraph conceptually linking these two sections." Great.
- Generic stock photos, although they do have a kind of airbrushed/plastic look. See the website below.
- Captioned diagrams and cartoons. If you go to www.academiccomputing.co.za there are a few examples there. Not bad at all.
- Basic research documents. As I say above, they need to be edited and all references checked. But in general it does not misrepresent authors' positions.
- Evaluating truth claims. I've given it random things to fact-check and it is really good at it.
- Providing short interpretations of small data sets, with useful insights. For example, a table with five rows and five columns of data: it can tell you what the data means if you give it context.
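To give a flavour of the folderise-and-convert scripts mentioned above, here is a minimal sketch of the idea, not the exact script ChatGPT wrote for me. It assumes bash with ImageMagick 7 (the magick command; older installs use convert), ffmpeg, and GNU date (on macOS the equivalent file-date lookup is stat -f %Sm):

```sh
# For each PNG/MOV in the current folder: file it into a YYYY-MM
# folder based on its modification date, then convert it to a
# smaller format (PNG -> JPG, MOV -> MP4).
for f in *.png *.mov; do
  [ -e "$f" ] || continue              # skip unmatched globs
  dir=$(date -r "$f" +%Y-%m)           # GNU date: file's mtime as YYYY-MM
  mkdir -p "$dir"
  case "$f" in
    *.png) magick "$f" "$dir/${f%.png}.jpg" ;;
    *.mov) ffmpeg -i "$f" "$dir/${f%.mov}.mp4" ;;
  esac
done
```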