Geekzone: technology news, blogs, forums
MadEngineer

4591 posts

Uber Geek
+1 received by user: 2570

Trusted

#302932 4-Jan-2023 19:11

I'm not sure if it's a fault of my installation, the version I'm using, or just newbie mistakes, but none of the examples I've tried for using the PDF modules to read a table and export it to Excel work; they all give errors.

 

for example:

 

[code]
import openpyxl
import tabula

# Read in the PDF file
df = tabula.read_pdf("table.pdf", pages="all")

# Create a new Excel file
wb = openpyxl.Workbook()
ws = wb.active

# Iterate through the rows and columns of the dataframe and write the values to the Excel worksheet
for r in df.index.tolist():
    for c in df.columns:
        ws.cell(row=r+1, column=c+1).value = df.iloc[r, c]

# Save the Excel file
wb.save("table.xlsx")
[/code]

 

This gives the following error on the first "for" line:

 

AttributeError: 'builtin_function_or_method' object has no attribute 'tolist'

 

 

 

The examples from this page seem to be OK, but only after I removed the all=true option:

Scraping Tables from PDF Files Using Python | Towards Data Science

 

What's at fault here?





You're not on Atlantis anymore, Duncan Idaho.

ezbee
2651 posts

Uber Geek
+1 received by user: 3089


  #3017022 4-Jan-2023 19:54

Hi, I, Um, asked ChatGPT.

This could be rubbish, but see what you think.

 

""

 


This error is occurring because df.index is returning a built-in function or method (i.e. range or enumerate), rather than a list-like object. This is causing the tolist() method to raise an AttributeError, as built-in functions and methods do not have a tolist() attribute.

 

To fix this error, you can try using df.index.to_list() instead of df.index.tolist(). This will convert the index of the dataframe to a list, which should allow you to iterate over it in the for loop.

 

Alternatively, you can try iterating over the rows of the dataframe directly using for r in df.iterrows():, which should work regardless of the type of the dataframe's index.

 

""




MadEngineer

4591 posts

Uber Geek
+1 received by user: 2570

Trusted

  #3017040 4-Jan-2023 20:48
Send private message

ROFL ^ that's funny, because it's ChatGPT that led me down this path. I tried different iterations based on suggestions like the ones you've posted. You can even reply to it with "the above example gave error x" and it will reply with something to the effect of "that's likely due to line x, and here's an explanation. It needs to be changed to this... Here's a new version of the code...", but that version will have yet another error.






marpada
487 posts

Ultimate Geek
+1 received by user: 182


  #3017045 4-Jan-2023 21:06

Actually, I think the issue is that tabula.read_pdf doesn't return a pandas DataFrame but a list of DataFrames. That article might be very outdated, so you may want to start by checking the examples on the official project page, like https://nbviewer.org/github/chezou/tabula-py/blob/master/examples/tabula_example.ipynb (see step 8, where it picks the first item of the list with df[0]).
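
To see why the traceback mentions 'builtin_function_or_method', here is a minimal sketch that uses a plain Python list as a stand-in for whatever tabula.read_pdf returned. The name `tables` and its contents are made up for illustration; the point is only that on a list, `.index` is the built-in `list.index` method rather than a pandas index, so it has no `tolist`.

```python
# Stand-in for the list returned by tabula.read_pdf(...). The real items
# would be pandas DataFrames, but any list behaves the same way here.
tables = [{"placeholder": "table 1"}]

# On a list, .index is the built-in list.index method, not a pandas Index.
index_attr_type = type(tables.index).__name__

try:
    tables.index.tolist()  # same attribute chain as df.index.tolist()
except AttributeError as exc:
    message = str(exc)

print(index_attr_type)
print(message)
```

Calling `type(df)` right after `read_pdf` is a quick way to confirm whether the same thing is happening with the real data.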

 

 

 

 

 

 

 

 




darylblake
1172 posts

Uber Geek
+1 received by user: 410

Trusted

  #3017196 5-Jan-2023 09:29

Is there any way of checking what df actually is at this line?

for r in df.index.tolist():

I think df is meant to be the dataframe, but it's probably some sort of collection/array/dictionary, e.g. df[0].tolist() might give you more luck, perhaps?

If that's the case, then you want to take len(df) and loop through it, calling tolist on each item to get a list of the columns?

I've never used this library, but here are some examples:

 

 

 

https://nbviewer.org/github/chezou/tabula-py/blob/master/examples/tabula_example.ipynb
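
Putting the suggestions above together, one possible shape for the fix is sketched below. It assumes that read_pdf returns a list of pandas DataFrames, as described earlier in the thread. The helper names `write_tables_to_excel` and `pdf_to_excel` and the sheet names are made up for illustration, and this is not taken from the tabula-py examples. tabula is imported inside the second function so the first helper can be tried on its own.

```python
import openpyxl
import pandas as pd


def write_tables_to_excel(tables, path):
    """Write each DataFrame in `tables` to its own worksheet in `path`."""
    # Accept a single DataFrame as well as a list of them.
    if isinstance(tables, pd.DataFrame):
        tables = [tables]
    if not tables:
        raise ValueError("no tables to write")

    wb = openpyxl.Workbook()
    wb.remove(wb.active)  # drop the default empty sheet

    for n, table in enumerate(tables, start=1):
        ws = wb.create_sheet(title=f"Table{n}")
        # Header row: column labels as text.
        ws.append([str(col) for col in table.columns])
        # Data rows: missing values become empty cells.
        for row in table.itertuples(index=False):
            ws.append([None if pd.isna(value) else value for value in row])

    wb.save(path)
    return len(tables)


def pdf_to_excel(pdf_path, xlsx_path):
    """Read every table tabula finds in the PDF and save them to one workbook."""
    import tabula  # imported lazily; needs tabula-py and Java installed

    tables = tabula.read_pdf(pdf_path, pages="all")  # assumed: list of DataFrames
    return write_tables_to_excel(tables, xlsx_path)
```

With this, something like pdf_to_excel("table.pdf", "table.xlsx") would replace the nested loops in the original post. Using append for whole rows also avoids the column=c+1 arithmetic, which would fail whenever the column labels are strings.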


ezbee
2651 posts

Uber Geek
+1 received by user: 3089


  #3017232 5-Jan-2023 11:03

Oh heck, sorry for the diversion down dead-end lane.


MadEngineer

4591 posts

Uber Geek
+1 received by user: 2570

Trusted

  #3018469 7-Jan-2023 23:44

Thanks for the replies.  I'll check these out when I next can.

 

For some of the other code I've been running, I've sometimes found it can come down to the versions of the modules installed. Examples of the prompts: "A csv file without a header row is located at c:\temp\stolenvehicles.csv and the first column contains the plate numbers of stolen vehicles. The next seven columns contain information about the stolen vehicle in order of colour, make, model, year, stolen date and area. Generate some python code that monitors the webcam for text and checks the csv file to see if the vehicle is stolen, while displaying the image from the webcam and any detected text that it is then checking the csv file for"* and "create some code in python that analyses fine movement in a webcam to detect one's heart rate".

Asking the bot for solutions doesn't usually work, although it did once list a bunch of solutions, including the requirement of a particular version and how to update it.

 

 

 

*New Zealand Police stolen vehicles page | New Zealand Police
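
Since installed versions came up, here is a small sketch for listing what is actually installed, using only the standard library. The helper name `installed_versions` is made up for illustration, and the distribution names passed in at the bottom are the PyPI names as far as I know them, which can differ from the import names.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # PyPI distribution names (assumed), not necessarily the import names.
    for name, version in installed_versions(["tabula-py", "pandas", "openpyxl"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing that output against the versions an article or example was written for can help narrow down whether an error is a version mismatch or a bug in the example itself.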





