Geekzone: technology news, blogs, forums
242 posts

Master Geek
+1 received by user: 58


Topic # 129445 16-Sep-2013 11:54

Hi Guys,

In the code below, I am grabbing data from an API. It iterates through 20 cities, and each city has several pages, each page being a separate XML document selected in the URL.

The code does the following:

1) Grabs the xml
2) Checks how many pages there are ($totalpages)
3) Iterates through the pages while the current page number is <= the total number of pages.


The issue I am having is that the 2nd while loop works fine, but the program isn't iterating through the city numbers: it stops at 1 after going through all the pages for city number 1, and doesn't go on to city number 2.

I have not used ++ to increase the variables citynumber and pagenumber because, for some reason, that did not work (i.e. it didn't increase the variable by 1). Any hints on why that might be would be appreciated :)

The sleep() is there because of flood control by the API server. It's quite strict, hence waiting over a minute for each iteration.

I am quite a PHP beginner, so please don't laugh :)

So - any reason why citynumber is not increasing?

Cheers



<?php

$citynumber = 1;

while ($citynumber <= 20)
{

        $xml_feed = 'http://somewebsite/api/data.xml?city=$citynumber';

        // CURL stuff omitted for brevity

        $pagenumber = (Current page number pulled from above xml document);
        $totalpages = (Number pulled from above xml document);

        while ($pagenumber <= $totalpages)
        {
                $xml_feed = "http://somewebsite/api/data.xml?city=$citynumber&page=$pagenumber";

                // CURL stuff omitted

                foreach ($blahblah->Table as $stuff)
                {
                        // whole bunch of stuff to pull xml data - not related to while loops.
                }

                $pagenumber = $pagenumber + 1;

                sleep(70);

        }

        $citynumber = $citynumber + 1;
}

?>
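
A side note on the code above: the first URL is in single quotes and the second in double quotes. PHP only substitutes variables inside double-quoted strings, so the first request would carry the literal text $citynumber rather than a number. Whether that explains the stuck loop is not established anywhere in this thread; the sketch below only shows the difference, with the city value 3 picked arbitrarily.

```php
<?php
$citynumber = 3; // arbitrary value for illustration

// Single quotes: no interpolation, the variable name is kept literally.
$single = 'http://somewebsite/api/data.xml?city=$citynumber';

// Double quotes: the variable's value is substituted into the string.
$double = "http://somewebsite/api/data.xml?city=$citynumber";

echo $single, PHP_EOL; // http://somewebsite/api/data.xml?city=$citynumber
echo $double, PHP_EOL; // http://somewebsite/api/data.xml?city=3
```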

597 posts

Ultimate Geek
+1 received by user: 98


  Reply # 896314 16-Sep-2013 12:12

sleep(70) - sleep for 70 seconds? My guess is your script is hitting "max_execution_time" - 30 seconds for a default PHP install.
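
One way to check this guess is to print the limit the script is actually running under and, if the host allows it, lift it for the current run. This is a minimal sketch, not a diagnosis: ini_get() and set_time_limit() are standard PHP functions, but whether the limit applies here depends on the setup. The command-line PHP binary defaults to no limit, and the PHP manual notes that outside Windows, time spent outside script execution (generally understood to include sleep()) is not counted toward it.

```php
<?php
// Report the current limit in seconds; "0" means no limit.
$limit = ini_get('max_execution_time');
echo "max_execution_time: $limit", PHP_EOL;

// Try to lift the limit for this run only. Returns false if it could not be changed.
$lifted = set_time_limit(0);
echo $lifted ? "limit lifted" : "limit could not be changed", PHP_EOL;
```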

395 posts

Ultimate Geek
+1 received by user: 32


  Reply # 896316 16-Sep-2013 12:17

At a quick glance it doesn't look so bad. I would try throwing some debug echo statements throughout just to see what's going on.

Maybe throw in an echo $citynumber . '<br />'; to make sure it's actually doing the loop. Also try commenting out the second while loop altogether to see if something in there is causing it to break out and fail. In other words, just dumb it down to smaller processes to try and find the problem.

Try for loops instead.

<?php

for ($citynumber = 1; $citynumber <= 20; $citynumber++) {

   // Double quotes, so that $citynumber is interpolated into the URL
   $xml_feed = "http://somewebsite/api/data.xml?city=$citynumber";

   // CURL stuff omitted for brevity

   $pagenumber = (Current page number pulled from above xml document);
   $totalpages = (Number pulled from above xml document);

   // $pagenumber is already set above, so the first clause is left empty
   for (; $pagenumber <= $totalpages; $pagenumber++) {

      ...

   }

}

?>




Web development blog: http://www.devhour.net
Follow me on twitter: @JAGracie

 
 
 
 


395 posts

Ultimate Geek
+1 received by user: 32


  Reply # 896318 16-Sep-2013 12:17

Kraven: sleep(70) - sleep for 70 seconds? My guess is your script is hitting "max_execution_time" - 30 seconds for a default PHP install.


Actually yes this is probably the answer. Good spot.




Web development blog: http://www.devhour.net
Follow me on twitter: @JAGracie

1271 posts

Uber Geek
+1 received by user: 136


  Reply # 896331 16-Sep-2013 12:31

Without being able to pinpoint the exact issue, there is a small logic issue.

The technique you are using is called a 'read-ahead' and the algorithm is as follows:

1. Set initial condition
2. Prefetch data (the read-ahead)
3. while (condition != termination condition)
   {
   4. Execute logic
   5. Fetch next bit of data
   6. Re-evaluate condition
   }
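
The steps above can be sketched in PHP as follows. This is only an illustration of the read-ahead pattern: make_reader() and process_all() are made-up names, and the array of city names stands in for whatever the API would return.

```php
<?php
// Hypothetical data source standing in for the API: each call returns the
// next record, or null when nothing is left.
function make_reader(array $records): callable
{
    $i = 0;
    return function () use ($records, &$i) {
        return $i < count($records) ? $records[$i++] : null;
    };
}

function process_all(callable $read): array
{
    $processed = [];                         // 1. set initial condition
    $record = $read();                       // 2. prefetch (the read-ahead)
    while ($record !== null) {               // 3. loop until termination
        $processed[] = strtoupper($record);  // 4. execute logic on the current record
        $record = $read();                   // 5. fetch the next record
    }                                        // 6. condition is re-evaluated at the top
    return $processed;
}

$result = process_all(make_reader(['auckland', 'wellington']));
print_r($result);
```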

In your example you have steps 4 and 5 the wrong way around, which results in your code overwriting $xml_feed before it is used.

My suggestion: since the two references to $xml_feed refer to different things (cities versus pages within cities), I would use different variables, e.g. $xml_city_feed and $xml_pages_in_cities_feed, depending on what your logic routines require.

Another way to debug is to comment out the logic parts of the loop and run with just the iterators and a print statement. Check that these are working and incrementing correctly, then put the logic back one line or one group at a time. This technique is similar to a binary chop, but you exclude the area with the highest cyclomatic complexity (the bit most likely to contain the defect) first.
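
A stripped-down version of the loops along those lines might look like this. It is a sketch only: trace_loops() is a made-up helper, the CURL, XML parsing and sleep() are left out, and the page count is a fixed number rather than a value read from the API.

```php
<?php
// Loop skeleton only: just the two counters and a trace of each iteration.
// $totalpages is a fixed stand-in for the value the API would supply.
function trace_loops(int $cities, int $totalpages): array
{
    $trace = [];
    $citynumber = 1;
    while ($citynumber <= $cities) {
        $pagenumber = 1;
        while ($pagenumber <= $totalpages) {
            $trace[] = "city=$citynumber page=$pagenumber";
            $pagenumber = $pagenumber + 1;
        }
        $citynumber = $citynumber + 1;
    }
    return $trace;
}

foreach (trace_loops(2, 3) as $line) {
    echo $line, PHP_EOL;
}
```

If the trace shows every city and page, the counters are fine and the problem lies in the parts that were removed.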




Software Engineer

 


395 posts

Ultimate Geek
+1 received by user: 32


  Reply # 896333 16-Sep-2013 12:41


Where is $xml_feed being overwritten before it is being used? Sorry, I might just be having a brain fade but it technically looks ok to me? However, I do agree that using different variable names would be best.




Web development blog: http://www.devhour.net
Follow me on twitter: @JAGracie



242 posts

Master Geek
+1 received by user: 58


  Reply # 896336 16-Sep-2013 13:01

Thanks for the ideas guys. Will go through it tonight when I have some time. And yea, I'll change the xml_feed names to different names just to be safe - no harm in doing so.

Regarding the time limit being hit - I don't think that is the case, because the script works fine on the 50+ pages that the first city has - it goes for a few hours. It just doesn't move on to the next city. I'm open to other ideas to slow the iteration rate though :)

1271 posts

Uber Geek
+1 received by user: 136


  Reply # 896453 16-Sep-2013 18:04

Kingy:

Where is $xml_feed being overwritten before it is being used? Sorry, I might just be having a brain fade but it technically looks ok to me? However, I do agree that using different variable names would be best.

First line after the second while loop. The object is re-queried with a different set of parameters, so technically one is getting a new object and there is no operation being performed on the old one (apart from getting your counts). I would assign this second object to a different variable - it does mean you have two handles to roughly the same piece of data, but it makes clear the intent to use them for different purposes.

Also, in the code that is not shown (your business logic), ensure that you are not updating the original data (in other words, accidentally changing your numpages and page count).




Software Engineer

 


395 posts

Ultimate Geek
+1 received by user: 32


  Reply # 896690 17-Sep-2013 08:07

TwoSeven:
First line after the second while loop. The object is re-queried with a different set of parameters, so technically one is getting a new object and there is no operation being performed on the old one (apart from getting your counts). I would assign this second object to a different variable - it does mean you have two handles to roughly the same piece of data, but it makes clear the intent to use them for different purposes.

Also, in the code that is not shown (your business logic), ensure that you are not updating the original data (in other words, accidentally changing your numpages and page count).


Sorry, I worded that question wrong. Yes, I can see that it's getting overwritten, but in its context it doesn't really matter. The $xml_feed in the first while loop is only used to get $pagenumber and $totalpages. Once the logic is worked out to set those two variables, I'm pretty sure it doesn't matter what happens to $xml_feed. Therefore it goes into the 2nd while loop and resets $xml_feed to what it needs over and over. Rinse and repeat with the 1st while loop.




Web development blog: http://www.devhour.net
Follow me on twitter: @JAGracie
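
On the original question of why ++ appeared to do nothing, which the thread leaves unanswered: one possible explanation, assuming the page numbers are read with SimpleXML (the parsing code is not shown, so this is a guess), is that they are SimpleXMLElement objects rather than integers. In the PHP versions current when this thread was written, ++ on an object had no effect, whereas $x + 1 converts the value to a number first. Casting to int when reading the values avoids the difference. The XML and its element names below are invented for illustration.

```php
<?php
// Made-up XML standing in for the API response; element names are assumptions.
$xml = simplexml_load_string(
    '<Result><CurrentPage>1</CurrentPage><TotalPages>3</TotalPages></Result>'
);

// Cast to int so the counters are plain numbers, not SimpleXMLElement objects.
$pagenumber = (int) $xml->CurrentPage;
$totalpages = (int) $xml->TotalPages;

$pagenumber++;
var_dump($pagenumber); // int(2)
```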
