ChatGPT: Checking a Very Large Set of Files in a Zip File

This page shows how ChatGPT handled a large zipped set of scanned grayscale JPG pages for an annuity policy project. ChatGPT said it could open the ZIP file, inspect the images, use OCR-style reading, summarize the document, and answer questions. After reviewing the upload, it reported that the file was about 122 megabytes and contained 125 numbered JPG scans with no missing pages. It also noted the folder structure, page size, and likely OCR quality. The key takeaway is that ChatGPT could handle the project well, while warning that critical legal or financial details needed careful verification.

Handling Large Files in ChatGPT

Here is how ChatGPT responded to the same question I asked Gemini in the previous post, about examining a large .zip file of .jpg scans that needed OCR. I have not included the entire ChatGPT session here, because the project involved checking questions about an annuity policy for a friend:

I have scanned in a large document’s pages as greyscale .jpg page files that I am hoping OCR can be used on them. They are in a .zip file. Are you able to read in a .zip file and answer questions about the overall project from my scanned pages?

Yes, Charles — upload the .zip file and I should be able to work with it. 👍

I can generally:

  1. Open/extract the ZIP
  2. Review the scanned JPG page images
  3. Use image reading and OCR-style extraction where needed
  4. Summarize the document
  5. Answer questions about the contents
  6. Look for names, dates, topics, repeated themes, contradictions, missing pages, etc.
  7. Help organize it into sections or create a cleaner typed version

A few practical notes:

  • Greyscale JPG scans are usually fine, especially if the text is clear.
  • OCR quality depends heavily on scan sharpness, page skew, margins, contrast, handwriting, and font size.
  • If the document is very large, I may process it in sections and give you useful partial results rather than pretending everything is perfect.
  • If the pages are out of order, filenames like page-001.jpg, page-002.jpg, etc. help a lot.

Go ahead and upload the .zip file when ready.

This is the file. How well are you able to handle it?

I can handle this ZIP pretty well. I inspected it, and it looks very workable. 👍

Here’s what I found:

  • The ZIP is about 122 MB.
  • It contains 125 JPG page scans, numbered Annuity0000.JPG through Annuity0124.JPG.
  • The pages appear complete with no missing page numbers.
  • Every image is greyscale JPG.
  • Every page is the same size: 2550 × 3299 pixels, which is a good scan size for OCR.
  • The folder structure is nicely organized into sections, such as:
    • Preface to contract
    • Actual contract
    • Contract schedule
    • Riders
    • Original annuity application
    • IRA disclosures
    • Privacy notice
    • Withdrawal forms
    • Required minimum distribution form
    • Annuity values / withdrawal benefits
    • Financial professional info

I tested OCR on several sample pages. The results are good to very good on typed contract pages. For example, the OCR clearly picked up contract schedule details, dates, values, headings, and much of the form text.

The weaker spots will likely be:

  • handwritten entries
  • checkboxes
  • small print in dense forms
  • pages with logos, shading, or gray background noise
  • tables where columns need careful interpretation
  • exact dollar amounts, percentages, contract numbers, and dates that must be verified carefully

So, my practical answer is:

Yes, I can read this ZIP and answer questions about the overall project.
For general summaries, explanations, section-by-section review, and finding key items, it should work well. For legally or financially critical details, I would OCR it, then spot-check against the image pages before treating exact numbers as final.

Best next step would be for you to ask something like:

“Give me a plain-English summary of the whole document.”

or

“Find anything related to withdrawals, penalties, surrender charges, death benefits, and required minimum distributions.”
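The inventory ChatGPT reported (page count, numbering gaps, folder layout, total size) can also be checked locally before uploading. The following is a minimal Python sketch using only the standard library. It is not what ChatGPT actually ran, and the function name `inventory_zip` and the ZIP file name in the usage note are hypothetical. The file-name pattern is assumed from the reported names Annuity0000.JPG through Annuity0124.JPG.

```python
import re
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed naming scheme, based on the reported file names
# (Annuity0000.JPG through Annuity0124.JPG).
PAGE_PATTERN = re.compile(r"^Annuity(\d{4})\.JPG$", re.IGNORECASE)


def inventory_zip(zip_source):
    """Summarize the numbered page scans in a ZIP without extracting them.

    zip_source may be a path or a file-like object (hypothetical helper).
    """
    page_numbers = []
    pages_per_folder = Counter()
    total_bytes = 0

    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            match = PAGE_PATTERN.match(path.name)
            if match is None:
                continue  # ignore anything that is not a numbered page scan
            page_numbers.append(int(match.group(1)))
            pages_per_folder[str(path.parent)] += 1
            total_bytes += info.file_size  # uncompressed size in bytes

    # Compare the numbers found against the full range from lowest to highest.
    found = set(page_numbers)
    expected = set(range(min(found), max(found) + 1)) if found else set()
    counts = Counter(page_numbers)

    return {
        "page_count": len(page_numbers),
        "missing": sorted(expected - found),
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
        "pages_per_folder": dict(pages_per_folder),
        "uncompressed_bytes": total_bytes,
    }
```

Calling `inventory_zip("Annuity.zip")` (a placeholder file name) returns a dictionary. An empty `missing` list would correspond to ChatGPT's finding that no page numbers were skipped. The sketch does not check image dimensions or color mode, which would require an imaging library.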

Gemini: Checking a Very Large Set of Files in a Zip File

This page describes trying Gemini first for a project involving many scanned grayscale JPG pages inside a ZIP file. Gemini responded that it could not directly unzip the file, run optical character recognition, or read text locked inside image files in that chat interface. Instead, it recommended doing the OCR step first with tools like Adobe Acrobat, Google Drive, Google Docs, Microsoft OneNote, or Word. Gemini then suggested uploading or pasting the extracted text for analysis, summaries, or questions. The main surprise was that Gemini required outside preparation, while the project goal was direct AI review of the zipped scans.

Handling Large Files in Gemini

I had a project where I ended up scanning a large number of pages as greyscale .jpg files. In my experience, those files are smaller than color files and tend to work better with OCR. I intended to do the project with Gemini first and then try ChatGPT. Here is the prompt I used and the response I got, very much to my surprise:

I have scanned in a large document’s pages as greyscale .jpg page files that I am hoping OCR can be used on them. They are in a .zip file. Are you able to read in a .zip file and answer questions about the overall project from my scanned pages?

The short answer is no, I cannot directly unzip a file, run OCR (Optical Character Recognition) on images, and read the text from them. As an AI, I don’t have a built-in OCR engine to extract text locked inside image files (like .jpg), nor can I extract .zip archives directly in this chat interface.

However, all hope is not lost! You can absolutely still use AI to analyze your project. You just need to handle the OCR step first.

Here is the easiest workflow to get this done so we can look at your project together:

Step 1: Convert the Images to Text (OCR)

You need to turn those .jpg pages into actual, selectable text. Since you have a large document, doing this page-by-page is tedious. Instead, use one of these free, batch methods:

  • Adobe Acrobat / Online OCR Tools: If you combine the JPGs into a single PDF, Adobe Acrobat (or free online tools like Adobe’s web converter, PDF24, or iLovePDF) can run automatic OCR and output a searchable PDF.

  • Google Drive (Free & Highly Accurate): Upload your .jpg files or a combined PDF to Google Drive. Right-click the file, select Open with, and choose Google Docs. Google will automatically run OCR and paste all the extracted text into a new document.

  • Microsoft OneNote/Word: You can insert images into Word or OneNote, right-click, and select “Copy Text from Picture.”

Step 2: Bring the Text to Me

Once you have the text extracted into a readable format (like a .txt file, a Word document, or a searchable .pdf), you can:

  1. Upload the document directly to this chat (if your interface allows file attachments).

  2. Copy and paste key sections or chapters directly into our conversation.

💡 Tip for Large Documents: If the overall document is massive, it’s usually best to feed it to me section by section, or start by pasting the table of contents/executive summary. This helps ensure we don’t hit any character limits and allows us to dissect the project with much higher accuracy.

Once you’ve got the text ready, send it over and I’d be happy to help you analyze the project, summarize it, or answer any specific questions you have!
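Gemini's tip about feeding a large document in section by section can be sketched in code. The following is a minimal Python example, assuming the OCR text is already available as one string with paragraphs separated by blank lines. The function name `split_into_chunks` and the default limit of 8000 characters are illustrative assumptions. Gemini did not state a specific character limit.

```python
def split_into_chunks(text, max_chars=8000):
    """Group paragraphs into chunks of at most max_chars characters.

    max_chars is an assumed, illustrative limit. A single paragraph longer
    than max_chars is kept whole in its own chunk rather than cut mid-sentence.
    """
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is an oversized paragraph starting a chunk.
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: start a new chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be pasted into the chat in order. Splitting on paragraph boundaries rather than at a fixed character offset keeps sentences and clauses intact, which matters for contract language.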