Handling ChatGPT Large Files
Here is how ChatGPT responded to the same question I asked Gemini in the previous post, about examining a large .zip file of .jpg page scans that needed to be OCRed. I have not included the entire ChatGPT session here because the project involved checking questions about a friend's annuity policy:
I have scanned in a large document’s pages as greyscale .jpg page files that I am hoping OCR can be used on them. They are in a .zip file. Are you able to read in a .zip file and answer questions about the overall project from my scanned pages?
Yes, Charles — upload the .zip file and I should be able to work with it. 👍
I can generally:
- Open/extract the ZIP
- Review the scanned JPG page images
- Use image reading and OCR-style extraction where needed
- Summarize the document
- Answer questions about the contents
- Look for names, dates, topics, repeated themes, contradictions, missing pages, etc.
- Help organize it into sections or create a cleaner typed version
A few practical notes:
- Greyscale JPG scans are usually fine, especially if the text is clear.
- OCR quality depends heavily on scan sharpness, page skew, margins, contrast, handwriting, and font size.
- If the document is very large, I may process it in sections and give you useful partial results rather than pretending everything is perfect.
- If the pages are out of order, filenames like page-001.jpg, page-002.jpg, etc. help a lot.
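The filename advice is easy to check in a few lines of Python. This is only a minimal sketch, and the filenames in it are made up for illustration. It shows that plain alphabetical sorting keeps zero-padded names in page order, while unpadded names can land out of order.

```python
# Hypothetical filenames for illustration; not taken from the actual ZIP.
unpadded = ["page-1.jpg", "page-10.jpg", "page-2.jpg"]
padded = ["page-001.jpg", "page-010.jpg", "page-002.jpg"]

# Plain string sorting compares character by character, so without
# zero padding "page-10.jpg" sorts ahead of "page-2.jpg".
sorted_unpadded = sorted(unpadded)

# With zero padding, alphabetical order and page order agree.
sorted_padded = sorted(padded)

print(sorted_unpadded)
print(sorted_padded)
```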
Go ahead and upload the .zip file when ready.
This is the file. How well are you able to handle it?
I can handle this ZIP pretty well. I inspected it, and it looks very workable. 👍
Here’s what I found:
- The ZIP is about 122 MB.
- It contains 125 JPG page scans, numbered Annuity0000.JPG through Annuity0124.JPG.
- The pages appear complete with no missing page numbers.
- Every image is greyscale JPG.
- Every page is the same size: 2550 × 3299 pixels, which is a good scan size for OCR.
- The folder structure is nicely organized into sections, such as:
  - Preface to contract
  - Actual contract
  - Contract schedule
  - Riders
  - Original annuity application
  - IRA disclosures
  - Privacy notice
  - Withdrawal forms
  - Required minimum distribution form
  - Annuity values / withdrawal benefits
  - Financial professional info
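Some of the checks listed above (page count, gaps in the numbering, and the folder sections) can be reproduced locally with Python's standard library. The sketch below is an assumption about how such an inspection might be done, not a record of what ChatGPT actually ran. The function name `inspect_scans` is hypothetical, and it assumes each page filename ends in a number, as in Annuity0000.JPG. Checking image dimensions or greyscale mode would need an imaging library and is not covered here.

```python
import re
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def inspect_scans(zip_source):
    """Summarize the JPG page scans in a ZIP (hypothetical helper)."""
    with zipfile.ZipFile(zip_source) as zf:
        # Keep only JPG entries; folder entries and other files are skipped.
        names = [n for n in zf.namelist()
                 if n.lower().endswith((".jpg", ".jpeg"))]

    numbers = []
    sections = Counter()
    for name in names:
        path = PurePosixPath(name)  # ZIP entries use forward slashes
        # Assumption: the page number is the run of digits ending the stem.
        match = re.search(r"(\d+)$", path.stem)
        if match:
            numbers.append(int(match.group(1)))
        sections[str(path.parent)] += 1

    present = set(numbers)
    missing = []
    if present:
        # Any number between the lowest and highest page that is absent.
        missing = [n for n in range(min(present), max(present) + 1)
                   if n not in present]

    return {
        "page_count": len(names),
        "missing": missing,
        "sections": dict(sections),
    }
```

Calling `inspect_scans` with the path to a ZIP returns a small dictionary. An empty `missing` list corresponds to the "no missing page numbers" finding above.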
I tested OCR on several sample pages. The results are good to very good on typed contract pages. For example, the OCR clearly picked up contract schedule details, dates, values, headings, and much of the form text.
The weaker spots will likely be:
- handwritten entries
- checkboxes
- small print in dense forms
- pages with logos, shading, or gray background noise
- tables where columns need careful interpretation
- exact dollar amounts, percentages, contract numbers, and dates that must be verified carefully
So, my practical answer is:
Yes, I can read this ZIP and answer questions about the overall project.
For general summaries, explanations, section-by-section review, and finding key items, it should work well. For legally or financially critical details, I would OCR it, then spot-check against the image pages before treating exact numbers as final.
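The "OCR it, then spot-check" step can be made more systematic by listing every value that deserves a look against the page image. The sketch below assumes the OCR text is already available as a dictionary mapping page filename to text. The name `flag_for_review` and the three patterns are illustrative assumptions; they are deliberately simple and will not catch every date or amount format.

```python
import re

# Illustrative patterns only; real documents will need more variants.
CRITICAL_PATTERNS = {
    "dollar_amount": re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?"),
    "percentage": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def flag_for_review(page_texts):
    """Return (page, kind, value) tuples to verify against the scans.

    page_texts is assumed to map a page filename to its OCR text.
    """
    flagged = []
    for page, text in sorted(page_texts.items()):
        for kind, pattern in CRITICAL_PATTERNS.items():
            for match in pattern.finditer(text):
                flagged.append((page, kind, match.group(0)))
    return flagged
```

The output is a checklist rather than a verdict: each flagged value still has to be compared by eye with the original scan before it is treated as final.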
Best next step would be for you to ask something like:
“Give me a plain-English summary of the whole document.”
or
“Find anything related to withdrawals, penalties, surrender charges, death benefits, and required minimum distributions.”
The short answer is no, I cannot directly unzip a file, run OCR (Optical Character Recognition) on images, and read the text from them. As an AI, I don’t have a built-in OCR engine to extract text locked inside image files (like