Saturday, July 31, 2021

selenium scroll

This is how to scroll an HTML page to the bottom with Selenium:

reference: https://michaeljsanders.com/2017/05/12/scrapin-and-scrollin.html

import time
from selenium import webdriver
from bs4 import BeautifulSoup as bs
# The referenced write-up used Firefox; this version uses Chrome. Use whichever browser you like.
browser = webdriver.Chrome()
# Tell Selenium to get the URL you're interested in.
browser.get("http://URLHERE.com")
# Scroll to the bottom, wait 3 seconds for the next batch of data to load, then scroll again.
# Repeat until the page height stops changing, i.e. the page has stopped loading new data.
scroll_script = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"
lenOfPage = browser.execute_script(scroll_script)
match = False
while not match:
    lastCount = lenOfPage
    time.sleep(3)
    lenOfPage = browser.execute_script(scroll_script)
    if lastCount == lenOfPage:
        match = True
# Now that the page is fully scrolled, grab the source code.
source_data = browser.page_source
# Throw your source into BeautifulSoup and start parsing!
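# Name the parser explicitly so BeautifulSoup does not guess one and emit a warning.
bs_data = bs(source_data, "html.parser")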
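
The script above loops forever on a page that never stops growing. As a minimal sketch (not part of the referenced write-up), the same scroll-and-compare loop can be wrapped in a function with an upper bound on the number of rounds. The names `scroll_to_bottom`, `SCROLL_JS`, `pause`, and `max_rounds` are hypothetical, chosen for illustration; the only thing assumed about `driver` is that it has an `execute_script` method, as a Selenium WebDriver does.

```python
import time

# JavaScript that scrolls to the bottom and reports the current page height.
SCROLL_JS = (
    "window.scrollTo(0, document.body.scrollHeight);"
    "return document.body.scrollHeight;"
)


def scroll_to_bottom(driver, pause=3.0, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    `driver` is assumed to expose execute_script(), like a Selenium WebDriver.
    Returns the last page height reported by the browser.
    """
    last_height = driver.execute_script(SCROLL_JS)
    for _ in range(max_rounds):
        time.sleep(pause)  # give the next batch of content time to load
        new_height = driver.execute_script(SCROLL_JS)
        if new_height == last_height:
            break  # height unchanged, so assume nothing more will load
        last_height = new_height
    return last_height
```

With the `browser` object from the script above, this would be called as `scroll_to_bottom(browser)` right after `browser.get(...)` and before reading `browser.page_source`. The 3-second default mirrors the original wait; a slow site may need a longer pause, since an unchanged height only means nothing loaded within that window.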



