Geekzone: technology news, blogs, forums


MadEngineer

4239 posts

Uber Geek

Trusted

#302932 4-Jan-2023 19:11

I'm not sure whether it's a fault with my installation, the version I'm using, or just newbie mistakes, but none of the examples I've found for using the PDF modules to read a table and export it to Excel work; they all give errors.

For example:

 

[code]
import openpyxl
import tabula

# Read in the PDF file
df = tabula.read_pdf("table.pdf", pages="all")

# Create a new Excel file
wb = openpyxl.Workbook()
ws = wb.active

# Iterate through the rows and columns of the dataframe and write the values to the Excel worksheet
for r in df.index.tolist():
    for c in df.columns:
        ws.cell(row=r+1, column=c+1).value = df.iloc[r, c]

# Save the Excel file
wb.save("table.xlsx")
[/code]

 

This gives the following error on the first for line:

 

AttributeError: 'builtin_function_or_method' object has no attribute 'tolist'

 

 

 

Examples from this page seem to be OK 

 

Scraping Tables from PDF Files Using Python | Towards Data Science

 

... however only after I removed the all=true option

 

What's at fault here?





You're not on Atlantis anymore, Duncan Idaho.

ezbee
2369 posts

Uber Geek


  #3017022 4-Jan-2023 19:54

Hi, I, Um, asked ChatGPT.

This could be rubbish, but see what you think.

 

""

 


This error is occurring because df.index is returning a built-in function or method (i.e. range or enumerate), rather than a list-like object. This is causing the tolist() method to raise an AttributeError, as built-in functions and methods do not have a tolist() attribute.

 

To fix this error, you can try using df.index.to_list() instead of df.index.tolist(). This will convert the index of the dataframe to a list, which should allow you to iterate over it in the for loop.

 

Alternatively, you can try iterating over the rows of the dataframe directly using for r in df.iterrows():, which should work regardless of the type of the dataframe's index.

 

""


 
 
 

MadEngineer

4239 posts

Uber Geek

Trusted

  #3017040 4-Jan-2023 20:48

ROFL ^ that's funny, because it's ChatGPT that led me down this path. I tried different iterations based on suggestions like the ones you've posted. You can even reply to it with "the above example gave error x" and it will reply with something to the effect of "that's likely due to line x, and here's an explanation. It needs to be changed to this... Here's a new version of the code...", but that version will have yet another error.






marpada
474 posts

Ultimate Geek


  #3017045 4-Jan-2023 21:06

Actually, I think the issue is that tabula.read_pdf doesn't return a pandas DataFrame but a list of DataFrames. That article might be very outdated, so you may want to start by checking the examples on the official project page, like https://nbviewer.org/github/chezou/tabula-py/blob/master/examples/tabula_example.ipynb (see step 8, where it picks the first item of the list with df[0]).
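
A minimal sketch of why the original loop fails, assuming (as suggested above) that read_pdf hands back a plain Python list. The list here is a made-up stand-in rather than real tabula output, so it runs without tabula or Java installed. On a list, .index is the built-in list.index method, and that method has no tolist attribute:

```python
# Stand-in for what tabula.read_pdf(...) is described as returning:
# a plain Python list with one entry per table found in the PDF.
# (Hypothetical placeholder data, not real tabula output.)
tables = ["first table", "second table"]

# On a list, `.index` is the built-in method list.index, not a pandas Index.
index_type = type(tables.index).__name__
print(index_type)

# Calling .tolist() on that method reproduces the error from the first post.
try:
    tables.index.tolist()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Picking one item out of the list first is what yields a single table.
first_table = tables[0]
print(first_table)
```

With real tabula output, the equivalent of the last step would be taking df[0] before touching .index or .columns.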

 

 

 

 

 

 

 

 




darylblake
1157 posts

Uber Geek

Trusted

  #3017196 5-Jan-2023 09:29

Is there any way of checking what df actually is at this line?

for r in df.index.tolist():

I think df is meant to be the dataframe, but it's probably some sort of collection/array/dictionary.

e.g. df[0].tolist() might give you more luck, perhaps?

If that's the case, then you'd want to take len(df) and loop through it, calling tolist on each item to get a list of the columns?

I've never used this library, but here are some examples:

 

 

 

https://nbviewer.org/github/chezou/tabula-py/blob/master/examples/tabula_example.ipynb
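
The looping idea could look roughly like the sketch below. It assumes read_pdf returns a list of pandas DataFrames and that pandas, openpyxl and tabula-py (plus Java) are installed. The function names, the one-sheet-per-table layout and the "Table N" sheet names are illustrative choices, not anything taken from the tabula examples. It also swaps the cell-by-cell loop for pandas' own to_excel, since a DataFrame does not have a tolist method.

```python
def sheet_title(number):
    """Sheet name for the nth table (an arbitrary naming choice)."""
    return f"Table {number}"


def write_tables(tables, xlsx_path):
    """Write each DataFrame in `tables` to its own sheet of one workbook."""
    if not tables:
        raise ValueError("no tables to write")

    # Imported lazily; pandas needs openpyxl installed to act as the engine.
    import pandas as pd

    with pd.ExcelWriter(xlsx_path, engine="openpyxl") as writer:
        for number, table in enumerate(tables, start=1):
            # index=False leaves the DataFrame's row numbers out of the sheet.
            table.to_excel(writer, sheet_name=sheet_title(number), index=False)
    return len(tables)


def pdf_to_xlsx(pdf_path, xlsx_path):
    """Read every table tabula finds in the PDF and save them to Excel."""
    # Imported lazily because tabula-py needs Java available at runtime.
    import tabula

    # Assumption: this returns a list of DataFrames, one per detected table.
    tables = tabula.read_pdf(pdf_path, pages="all")
    return write_tables(tables, xlsx_path)
```

Calling pdf_to_xlsx("table.pdf", "table.xlsx") would then be the replacement for the script in the first post. If only the first table is wanted, passing tables[:1] to write_tables is one way to do it.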


ezbee
2369 posts

Uber Geek


  #3017232 5-Jan-2023 11:03

Oh heck, sorry for the diversion down dead-end lane.


MadEngineer

4239 posts

Uber Geek

Trusted

  #3018469 7-Jan-2023 23:44

Thanks for the replies.  I'll check these out when I next can.

 

For some of the other code I've been running (e.g. "A csv file is without a header row is located at c:\temp\stolenvehicles.csv and the first column contains the plate numbers of stolen vehicles. The next seven columns contain information about the stolen vehicle in order of colour, make, model, year, stolen date and area. Generate some python code that monitors the webcam for text and checks the csv file to see if the vehicle is stolen, while displaying the image from the webcam and any detected text that it is then checking the csv file for"* and "create some code in python that analyses fine movement in a webcam to detect one's heart rate"), I've found it can sometimes come down to the versions of the modules installed. Asking the bot for solutions doesn't usually work, although it did once list a bunch of possible fixes, including requiring a particular version and how to update to it.

 

 

 

*New Zealand Police stolen vehicles page | New Zealand Police
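
Since module versions came up, here is a small sketch for checking what is actually installed, using only the standard library (Python 3.8 or later). The package names in the example call are the ones from this thread as I would expect them to appear on PyPI; treat them as assumptions and adjust to whatever you installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed distribution names for the modules discussed in this thread.
    report = installed_versions(["tabula-py", "pandas", "openpyxl"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing that output with the version an example was written for is a quick way to tell whether an error is a version mismatch or a bug in the example itself.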





