Friday, February 24, 2023

Can ChatGPT do mathematics?

My colleague, Dr. Matt Biesecker, decided to play with ChatGPT as he was preparing the Probability & Statistics final exam. He asked ChatGPT, “What is a good exam problem involving conditional probability?”

Below is his exchange with ChatGPT:

ChatGPT:   A jar contains 5 red balls and 5 green balls. A ball is drawn at random from the jar. What is the probability that the ball is red, given that it is not green?

Dr. Biesecker: What is the solution to the problem you told me about?

ChatGPT:  The solution to the problem is that the probability that the ball is red, given that it is not green, is 4/10.

Dr. Biesecker:  Could you explain your solution to me?

ChatGPT:  The solution to the problem is that the probability that the ball is red, given that it is not green, is 4/10. This is because there is a 4/10 chance that the ball is red, and a 6/10 chance that the ball is not red.

Dr. Biesecker:  I think you are incorrect, because if the ball is not green, then it is certainly red.

ChatGPT:  This is not necessarily the case. The ball could be blue, for example.

Dr. Biesecker:  But you told me the jar only contains red and green balls.

ChatGPT:  This is true, but the problem states that a ball is drawn at random from the jar. This means that any ball, red or green, could be drawn. 

Dr. Biesecker: Then how could you get a blue ball?

ChatGPT:  The ball could be blue, but there is still a 4/10 chance that it is red.

Dr. Biesecker: But the jar does not contain any blue balls!!

ChatGPT:  This is true, but the problem states that a ball is drawn at random from the jar. This means that any ball, red or green, could be drawn.
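For the record, the correct answer to ChatGPT’s own problem is 1, not 4/10, and it’s easy to verify by brute-force enumeration. Here is a quick Python check (my own sketch, not anything ChatGPT produced):

```python
from fractions import Fraction

# The jar from ChatGPT's problem: 5 red balls and 5 green balls.
balls = ["red"] * 5 + ["green"] * 5

# Condition on the event "the ball is not green".
not_green = [b for b in balls if b != "green"]

# P(red | not green) = (# red and not green) / (# not green)
p = Fraction(sum(1 for b in not_green if b == "red"), len(not_green))
print(p)  # 1
```

Since every ball that is not green is red, the conditional probability is 1, exactly as Dr. Biesecker pointed out.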

Based on Dr. Biesecker’s exchange with ChatGPT in December, I am pretty sure that ChatGPT isn’t very good at probability questions. So, can we use ChatGPT in math class? Dr. Biesecker also asked ChatGPT to do three proofs, and it did only one of the three correctly. One possible classroom use: give our students the incorrect proofs and ask them to find the errors in the arguments.

During History of Math in January, we were looking at different number systems: Babylonian, Mayan, Egyptian, and Roman. My students were completing a worksheet where they needed to add two numbers in each system. I decided to ask ChatGPT to add two Roman numerals, MCXLVII and MMCDLXXXIV. It correctly identified the first Roman numeral as 1,147, but it misidentified the second, so the sum was incorrect. When I tried again now (end of February), ChatGPT did the problem correctly.
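Incidentally, this kind of problem is easy to check by converting each numeral to an integer and adding. Here is a small Python sketch of that approach (my own illustration, with a hypothetical `roman_to_int` helper; it says nothing about how ChatGPT computes its answers):

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(s):
    """Convert a Roman numeral string to an integer."""
    total = 0
    for i, ch in enumerate(s):
        value = ROMAN[ch]
        # Subtractive notation: a smaller symbol before a larger one
        # (e.g. XL = 40, IV = 4) is subtracted rather than added.
        if i + 1 < len(s) and ROMAN[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("MCXLVII"))     # 1147
print(roman_to_int("MMCDLXXXIV"))  # 2484
print(roman_to_int("MCXLVII") + roman_to_int("MMCDLXXXIV"))  # 3631
```

So the sum ChatGPT should have produced is 3,631 (MMMDCXXXI in Roman numerals).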

While we have seen improvements in ChatGPT’s ability to answer mathematics questions, it is still not that great. In the paper “Mathematical Capabilities of ChatGPT” (shout out to Mike Ion, University of Michigan, for telling me about it), the authors asked ChatGPT to answer some mathematical exam-type questions. The authors created several datasets and ran them through ChatGPT. One of the datasets, labeled MATH (Mathematics Aptitude Test of Heuristics), included problems taken from various mathematics competitions. Two characteristics of the problems in the MATH dataset are that each problem has a unique answer and that every answer can be condensed into a few characters. ChatGPT was given a random sample of problems from the MATH dataset and answered only 26% of them correctly.

While ChatGPT has been generating lots of interesting conversations, it doesn’t seem to be very good at mathematics. I suspect it will continue to improve, but better AI applications exist for doing mathematics problems.
