Recent Experiments testing ChatGPT's limits and some useful prompts

Introduction

ChatGPT has a lot of people excited, and it deserves that excitement. It has a number of faults, which I list below, some with workarounds, but in general it is amazingly good. It is not really for having fun chatting to a robot; it's more like a brilliant coworker who is surprisingly stupid at some quite mundane tasks but incredibly good at trickier ones.

Summary of findings so far. 

1. API vs Interactive Version - a caution
2. Coding Strengths and Weaknesses - be very careful but very useful anyway
3. Shell Scripting - amazingly good, probably what it's best at
4. Text Summarisation - fiddly but useful
5. Spreadsheet Analysis - annoying, maybe do by hand
6. Multi-Step Instruction Handling - very annoying
7. Spellchecking and writing style/grammar - ok overall but fiddly
8. File Compatibility and Formatting - not great
9. APA References from DOIs - good
10. Reference Accuracy - bad, double-check its work
11. Tone and Style Adaptations - excellent
12. General Knowledge - excellent
13. Generating ICS Files - very good
14. Math - not great
15. Searching for Online Solutions - not bad.
16. Finding tourism options - not bad at all.

Details

1. Even if you pay for ChatGPT, you do not get the API; you have to pay for that separately and load credits. The API also seems dumber than the interactive version (4o); I think it defaults to 3.5. File uploads likewise seem to require the paid plan. In short, if you want to integrate it into an app, don't forget to specify your model explicitly rather than relying on the default.

2. ChatGPT is pretty good at coding but has a bad habit of deleting pre-existing code that is still needed, even while providing reasonably good corrections. It's mostly useful for debugging fairly straightforward cases or generating new code from scratch. For example, I gave it the task of making a questionnaire and it did it well, with very little help, but when I asked it to debug some React code, it was not great. It does at least produce good syntax and fixes broken syntax easily, so if you are too lazy to count brackets and parentheses, just drop the function/method into it and say "fix the brackets here". Start by saying: "I will present you with some code. Ensure that you perform the minimum changes to the code and do not delete any lines unless you are replacing them with better lines. When you respond, do not give verbose explanations of what the code does; just provide the code and wait for further instructions." Then follow that with the change you want it to make. For those who think this will make programmers obsolete: no. You really have to understand what to ask it, e.g. "I have an API which is not RESTful and seems to send broken SOAP/XML. The file containing the error is JSX, attached." If you do not know how to ask that, you can't code with it.

3. ChatGPT is incredibly good at bash shell scripts, but it assumes you are on Ubuntu or something similarly Debian-like; you have to tell it your platform if you are not. In one case I had to convert a manpage to text (like so: man <command> | col -bx > output.txt) and paste the manpage in so that it understood what parameters were available on my idiosyncratic version of UNIX. Anything you want to automate relating to files and file I/O, you can get it to write a script for. It's incredible. For example, I got it to write me an AppleScript to bulk-create PDFs from Apple Pages files, and it worked first time. Another example: I had a batch of mp4s that I wanted as mp3s. I said something like "take a folder called ./ and convert all video files in it to mp3. Use ffmpeg and assume Darwin syntax for bash and the ffmpeg parameters. Rename all files first to eliminate spaces and funny chars. Move processed videos into a folder called "videos" and mp3s into a folder called "output". Write a script to do this in bash." Again, it worked.
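To give you an idea, the mp3-conversion prompt produces something along these lines. This is a minimal sketch, assuming ffmpeg is installed; the exact flags and sanitisation approach ChatGPT picks may differ:

```shell
#!/bin/bash
# Sketch of the batch-conversion script described above (assumes ffmpeg
# is installed; the folder names "videos" and "output" come from the prompt).
set -euo pipefail

mkdir -p videos output

sanitize() {
  # Replace spaces with underscores, then drop anything that is not
  # alphanumeric, dot, underscore or hyphen.
  printf '%s' "$1" | tr ' ' '_' | tr -cd 'A-Za-z0-9._-'
}

for f in ./*.mp4; do
  [ -e "$f" ] || continue               # skip the loop if no mp4s are present
  clean=$(sanitize "$(basename "$f")")
  mv "$f" "$clean"
  ffmpeg -i "$clean" -vn -q:a 2 "output/${clean%.mp4}.mp3"
  mv "$clean" videos/
done
```

The rename-first step matters: ffmpeg and shell globbing both behave better once spaces and "funny chars" are gone.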

4. It's pretty good at summarising text files, but you have to break the files up into chunks of about 25-50 KB of plain ASCII. It is not great at large PDF files, especially those with fancy layouts; about 10-20 pages seems to be the most it can take before it gets confused, and the more formatting and tables there are, the more confused it gets. The prompt here is something like: "I am going to give you a long document to summarise. I will break it up into chunks and provide one file at a time. Please summarise each file I upload, with a key point from the file highlighted in bold. At the end of the task, take all the summaries and produce an overarching summary." Basically you have to break the file up and send it in pieces. Sorry.
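The chunking itself is a one-liner with split. A rough sketch (the filename and the 40,000-byte chunk size are just illustrative; the sample-file step stands in for your real document):

```shell
# Create a sample file standing in for the real long document.
for i in $(seq 1 4000); do echo "Line $i of the long document."; done > longdoc.txt

# Split into ~40 KB pieces named chunk_aa, chunk_ab, ...
# (plain byte counts work on both GNU and BSD/macOS split).
split -b 40000 longdoc.txt chunk_

ls chunk_*
```

You then upload chunk_aa, chunk_ab, and so on, one per message. Note that split cuts on byte boundaries, so a chunk may end mid-sentence; that hasn't mattered for summarisation in my experience.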

5. It's not great at analysing spreadsheets, particularly badly formatted ones where the data is not clean. You can make it clean up data, but it is a multi-step process, and it is very limited in the number of lines it can handle. Again, as with text, you have to chunk the task. I suggest this prompt: "I am going to upload a spreadsheet in CSV format. Please read it and wait for further instructions." Once you send it, you then say: "Please ensure that column 1 is correctly formatted and the data cleaned according to <some specification, e.g. alphabetic, capitalisation, etc.>, and restate the CSV file once you are done." Then you check the file, and go on to column 2, and so on. It is quite lazy and will stop part-way; you have to say "please continue". I tried a 22,000-line spreadsheet and it did not cope at all. In a number of these cases I just gave up and said "oh never mind, I will do it manually."
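Chunking a CSV is slightly fiddlier than chunking plain text, because each piece needs the header row repeated so ChatGPT knows what the columns are. A sketch (the 2,000-row chunk size and filenames are just illustrative; the first two lines build a sample file standing in for your real spreadsheet):

```shell
# Build a sample CSV standing in for the real spreadsheet.
echo "name,score" > big.csv
for i in $(seq 1 5000); do echo "person$i,$i"; done >> big.csv

# Split the data rows (everything after the header) into 2,000-row pieces,
# then prepend the header to each piece so every chunk stands alone.
head -n 1 big.csv > header.csv
tail -n +2 big.csv | split -l 2000 - rows_
for f in rows_*; do
  cat header.csv "$f" > "chunk_$f.csv"
  rm "$f"
done
rm header.csv

ls chunk_rows_*.csv
```

Each chunk_rows_*.csv file can then be uploaded on its own with the column-by-column cleaning prompt above.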

It's pretty good at reading meaning out of small data sets, however. If you give it, say, a table with about 10-100 data points and clearly define the axes, you can say "write me a short paragraph on what this data means, draw out the key learnings and recommendations."

6. It does not listen to the instruction to give one step at a time; you have to keep repeating it. If you are doing a multi-step task which requires you to come back to it with some output, e.g. debugging a program, you need to append something like this to each comment: "Please wait for me to give further input before you respond, and ensure you give your response one step at a time, without lots of explanations after each response." There is a setting to apply such instructions to all chats, but it forgets them within about 3-4 exchanges. I reported this to OpenAI as it is a huge nuisance.

7. It can't do spelling checks directly, at least not version 4o and not on the files I tried. I reported this to OpenAI as it is a bit of a nuisance as well. However, you can do it indirectly by asking it to check a document for grammar and readability, and then it points out the errors just fine. It seems that it wants to delegate spelling to the command-line aspell app or something if you just ask for a spellcheck. So, to get it to spellcheck, you have to say "read this document and report any grammatical errors or other errors, but use British English conventions".
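If aspell really is what it's deferring to, you can just run that yourself. A sketch, assuming aspell and its British English dictionary are installed (the guard is there because they often aren't by default):

```shell
# Create a small draft with a deliberate misspelling.
printf 'The colour of the recieved parcel was wrong.\n' > draft.txt

# aspell's "list" mode prints only the words it thinks are misspelt,
# one per line, using the British English dictionary.
if command -v aspell >/dev/null 2>&1; then
  aspell --lang=en_GB list < draft.txt | sort -u
else
  echo "aspell not installed"
fi
```

On a file like the above it should flag "recieved" and leave "colour" alone, which is the British-conventions behaviour the prompt is trying to get out of ChatGPT.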

8. Reading and writing files: it does not like non-ASCII characters such as å, é, ü, î, ø, etc. Replace those in a text editor with standard ASCII, e.g. aa, e', ue, i^, oe, and so on. It also doesn't seem to like backticks in a document, though I haven't tested that properly. It can read and produce Word files, but the output is not great, and not significantly better than plain text. It works better if you ask it to "produce a nicely laid-out HTML file with a table of contents made using <a href="#chapter">-style tags". You then copy/paste the result into a text file, save it as .html, open it in a web browser, and once the browser renders the file, copy/paste the formatted result into Word or similar. It also helps to specify that it "must embed the CSS formatting of the file inside the HTML".
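For reference, here is the shape of file that prompt should get you back: embedded CSS in a style block, and an anchor-based table of contents. The chapter names and styling below are placeholders, written out via a heredoc just so you can see the structure:

```shell
# Write out the kind of HTML skeleton the prompt above asks for:
# CSS embedded in the <head>, and <a href="#..."> anchors for the TOC.
cat > report.html <<'EOF'
<!DOCTYPE html>
<html>
<head>
<style>
  body { font-family: sans-serif; max-width: 40em; margin: auto; }
  h1   { border-bottom: 1px solid #ccc; }
</style>
</head>
<body>
<h1>Contents</h1>
<ul>
  <li><a href="#chapter1">Chapter 1</a></li>
  <li><a href="#chapter2">Chapter 2</a></li>
</ul>
<h1 id="chapter1">Chapter 1</h1>
<p>Chapter text here.</p>
<h1 id="chapter2">Chapter 2</h1>
<p>Chapter text here.</p>
</body>
</html>
EOF
```

Because the CSS lives inside the file, the formatting survives the save-open-copy/paste round trip into Word; a linked external stylesheet would not.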

9. It can do cool things like take the heading of a journal article and convert it into an APA reference (i.e. take the author name, publisher, date, article title, etc., and build the reference). This saves the hassle of formatting it manually with multiple copy/pastes. I know about "Cite This For Me", but this saves the effort of opening another browser tab. In fact, you can just give it a DOI and tell it: "read this DOI and give me the APA reference for the article it links to".

10. It is bad at citing accurate article references. My experiment with this showed a 22% error rate, i.e. 22% of the DOIs were completely fake, made up, or linked to the wrong article, and about 5% of the article titles were bad or wrong as well. You HAVE TO check its referencing. I suggest this wording: "Please find an academic reference on <topic> and make 100% sure you find a correct DOI and an accurate APA reference and article title. Do not make stuff up. Double-check yourself by resolving the DOI that you propose and ensuring that it comes up with the same article." This is its primary form of "hallucination". Note that to access the web it needs the paid version, so this is useless on the free version. You absolutely must check every reference it gives you, and report back to it if it gives you a fake one.

11. If you tell it to rephrase or change the tone or register of a document (short!) it can do so pretty well, as well as write decent letters. For example, "take this document and write a blurb for a book cover", or "read this resumé and produce a biographic summary" or "write a letter of recommendation for this person based on their resumé". It does those tasks well.

12. It's good at explaining stuff and seems pretty omniscient. It even understands our local African languages. I've asked it lots of obscure things and it understands and explains them better than I can. You can also ask it to dumb things down, for example: "Explain Newton's three laws of motion to me like I am 10 years old." It's great for teachers' lesson prep: you can give it a list of topics, specify the age of the children, and say "write some lesson plans per topic", and it just spits them out. I asked it to make a multiple-choice test based on a bunch of uploaded PDFs, and it did it. It's great at this stuff.

13. Very cool: if you have a bunch of events (dates, times, places and event titles), say in a CSV spreadsheet, you can say: "Take this CSV and turn it into ICS files for the macOS iCal format. Give me the code to copy/paste into an ASCII file, and make sure each event is a separate .ics." It then gives you the ICS code; copy/paste it into BBEdit or similar and save as .ics. Yes, it works. I generated a whole school event calendar of about 20 events into iCal with that instruction, instead of manually copy/pasting back and forth.
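If you would rather skip the copy/paste step entirely, the same CSV-to-ICS conversion can be scripted. A rough sketch: the column layout, filenames, and date format below are my assumptions, and real iCalendar files have more required fields (UID, DTSTAMP) that Calendar tolerates being absent:

```shell
# A sample events CSV standing in for the real spreadsheet.
echo 'date,time,place,title
20250301,090000,School Hall,Prize Giving
20250315,140000,Sports Field,Athletics Day' > events.csv

# Emit one .ics file per data row. DTSTART uses the compact
# iCalendar local-time form, e.g. 20250301T090000.
tail -n +2 events.csv | while IFS=, read -r d t place title; do
  cat > "event_${d}.ics" <<EOF
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//csv2ics sketch//EN
BEGIN:VEVENT
DTSTART:${d}T${t}
SUMMARY:${title}
LOCATION:${place}
END:VEVENT
END:VCALENDAR
EOF
done

ls event_*.ics
```

Double-clicking each generated .ics in Finder offers to add it to Calendar, which is exactly what the manual copy/paste route achieves.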

14. It's not great at math. You can trick it into counting wrong, adding wrong, or giving bad integral calculus; I've argued with it many times. Unfortunately, it's a sycophant: if you directly tell it that it is wrong, it just agrees with you and placates you instead of proving its answer, and if you give it a fake answer and tell it that that is the correct one, it just agrees (eek).

15. It's not bad at searching for reasonable solutions online without all the crufty adverts and spamware of Google. For example, I said: "Find me a Mac app compatible with <OS version> that will give me a taskbar at the bottom of all screens like the Windows tray. Provide me with a clear-text URL that I can copy/paste." It did it. The reason for the clear-text URL is that the rendered URL isn't always clickable: <a href="http://www.ostrowick.co.za">John's site</a> doesn't always work inside the chat UI, but <a href="http://www.ostrowick.co.za">http://www.ostrowick.co.za</a> does, because you can directly copy/paste the URL string. I won't swap Google out just yet though, as ChatGPT tends to offer limited answers, and I had to solve some problems the old way (i.e. plain Google) on a number of occasions. For example, I wanted an app to make a duplicate Mac menubar; the URLs it gave me did not work, as the software had been discontinued, and I had to use the "before:" parameter in Google to find some ancient thing on an archive site that actually worked.

16. Pretty good at finding tourism options as long as you are clear on what you want (outdoor, indoor, food, sport, activities, drinks, scenic views, multimedia, etc.).

The bottom line

Is it worth paying for? Absolutely, yes. It is by far the most useful piece of software I have found since ... I don't know. BBEdit? Excel? Photoshop? Bash? The last time I was this impressed was around 1997, when I first used Linux. Really. It's amazing if you know its limits and hand-hold it. I am a bit worried about management consultants, because it's really those guys' jobs that it does pretty well. Generating videos and pictures and music is incredibly impressive (and sadly, we were hoping the machines would take away the manual labour so that we could do art and music), but this thing is a Swiss Army knife of software. 9.5/10.
