Monday, June 22, 2020

python error Anaconda3

If you run Python and see this warning related to Anaconda3:
Anaconda3\lib\site-packages\numpy\__init__.py:140: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service

Solution:
Go to the Windows environment variable settings and add this directory to PATH: C:\Users\username\Anaconda3\Library\bin
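
If you cannot edit the system PATH, a minimal workaround sketch is to prepend that directory inside Python before importing numpy (the username in the path is a placeholder, adjust it to your install):

import os
# assumed Anaconda3 location; change username to match your machine
os.environ['PATH'] = r'C:\Users\username\Anaconda3\Library\bin' + os.pathsep + os.environ.get('PATH', '')
import numpy  # the MKL DLLs should now resolve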

Sunday, June 7, 2020

python with database



import sqlite3

conn = sqlite3.connect('D:\\hskio_python.db')
try:
    info = []
    cur = conn.cursor()
    # read every person: row[0] = id, row[2] = height (m), row[3] = weight (kg)
    rows = cur.execute('select * from person')
    for row in rows:
        id = row[0]
        height = row[2]
        weight = row[3]
        bmi = round(weight / height ** 2, 2)
        print(id, height, weight, bmi)
        info.append([bmi, id])
    # write the computed BMI back, using parameters instead of string formatting
    for data in info:
        cur.execute('update person set bmi=? where id=?', (data[0], data[1]))
    conn.commit()
finally:
    conn.close()
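
The script assumes a person table already exists; here is a minimal setup sketch, assuming the column order implied by the indexing above (id, name, height in metres, weight in kg, bmi):

import sqlite3

conn = sqlite3.connect('D:\\hskio_python.db')
cur = conn.cursor()
# hypothetical schema matching row[0]=id, row[2]=height, row[3]=weight as used above
cur.execute('create table if not exists person '
            '(id integer primary key, name text, height real, weight real, bmi real)')
cur.execute('insert into person (name, height, weight) values (?, ?, ?)', ('Amy', 1.6, 50.0))
conn.commit()
conn.close()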


beautifulsoup example


find and findAll parameters:

findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)

find example: find the h1 tag
import requests
from bs4 import BeautifulSoup

# fetch the demo page and parse it with html5lib
resp = requests.get("https://code-gym.github.io/spider_demo/")
soup = BeautifulSoup(resp.text, 'html5lib')
print(soup.find('h1'))

beautifulsoup: two ways to print the same tag

Using find():
print(soup.find('h1'))

Using the tag-name shortcut (equivalent for the first match):
print(soup.h1)

findAll example:

for h3 in soup.find_all('h3'):
    print(h3)

Find by class name:
for title in soup.find_all('h3', 'post-title'):
    print(title)
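
The other findAll parameters work the same way; a short sketch against the same demo page (attrs as a dict and limit are standard BeautifulSoup arguments, and 'post-title' is the class used above):

# attributes as a dict; limit caps the number of matches returned
for title in soup.find_all('h3', attrs={'class': 'post-title'}, limit=2):
    print(title.text.strip())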


beautifulsoup crawl ptt stock

Crawl articles from the PTT Stock board.


import requests
from bs4 import BeautifulSoup
import time

today = time.strftime('%m/%d').lstrip('0')  # PTT dates have no leading zero, e.g. 6/22

def ptt(url):
    resp = requests.get(url)
    if resp.status_code != 200:
        print('URL error: ' + url)
        return
    soup = BeautifulSoup(resp.text, 'html5lib')
    # the second paging button is the "previous page" link (older posts)
    paging = soup.find('div', 'btn-group btn-group-paging').find_all('a')[1]['href']

    articles = []
    rents = soup.find_all('div', 'r-ent')  # one r-ent block per post

    for rent in rents:
        title = rent.find('div', 'title').text.strip()
        count = rent.find('div', 'nrec').text.strip()  # push count, or 爆 when it maxes out
        date = rent.find('div', 'meta').find('div', 'date').text.strip()
        article = '%s %s:%s' % (date, count, title)

        try:
            # keep today's posts with more than 10 pushes
            if today == date and int(count) > 10:
                articles.append(article)
        except ValueError:
            # count is not a number: 爆 marks 100+ pushes, keep those too
            if today == date and count == '爆':
                articles.append(article)
    if len(articles) != 0:
        for article in articles:
            print(article)
        ptt('https://www.ptt.cc' + paging)  # recurse into the previous (older) page
    else:
        return

ptt('https://www.ptt.cc/bbs/Stock/index.html')
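
The Stock board is open to everyone, but age-restricted boards (Gossiping, for example) show a confirmation page first; passing PTT's over18 cookie in the request skips it:

# for age-restricted boards, confirm "over 18" via cookie
resp = requests.get(url, cookies={'over18': '1'})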

Thursday, June 4, 2020

python: which pip

If you have both Python 2 and Python 3 installed on your PC, the plain pip command may belong to Python 2. Check which one it is with:

pip --version
D:\selenium-3.141.0.tar\dist\selenium-3.141.0>pip --version
pip 20.1.1 from c:\python27\lib\site-packages\pip (python 2.7)

You can also run a specific version's pip by its full path:
#python36\Scripts\pip.exe install packagename
Example:
C:\python37\Scripts>pip3.exe install packagename
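
The most reliable form is to run pip through the interpreter itself with -m, so pip is guaranteed to match that Python:
C:\python37\python.exe -m pip install packagename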

reference:
https://stackoverflow.com/questions/39851566/using-pip-on-windows-installed-with-both-python-2-7-and-3-5
https://stackoverflow.com/questions/40832533/pip-or-pip3-to-install-packages-for-python-3

selenium problem

This is an interesting topic and a funny thing about selenium; after surfing the net, I found an article that really solved the problem.


Problem: pip installed selenium and reported success, but the module only works on Python 2, not on Python 3. Sounds really strange, doesn't it?
Solution: download the selenium source package and install it manually.
How: extract the archive, change into the extracted directory, and run the install command there (for a source distribution that is):
python setup.py install
Conclusion: we have to install selenium manually.
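
A quick sanity-check sketch: run this in each interpreter to see where the module actually landed:

import selenium
print(selenium.__version__)
print(selenium.__file__)  # shows which site-packages the install went into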



Wednesday, June 3, 2020

Selenium

Chrome driver: chromedriver
https://chromedriver.chromium.org/downloads

Firefox driver: geckodriver
https://github.com/mozilla/geckodriver/releases

Basic Selenium
from selenium import webdriver
browser=webdriver.Chrome('D:\\chromedriver.exe')
browser.get('http://google.com')
browser.quit() 
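
The same pattern should work for Firefox with geckodriver; a sketch assuming the driver was unpacked to D:\\geckodriver.exe:

from selenium import webdriver
browser = webdriver.Firefox(executable_path='D:\\geckodriver.exe')  # assumed driver path
browser.get('http://google.com')
browser.quit()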

Selenium with beautifulsoup example 1: opens a visible Chrome window


from selenium import webdriver
from bs4 import BeautifulSoup

try:
    chrome = webdriver.Chrome(executable_path='D:\\CHROME_DRIVER\\chromedriver.exe')
    chrome.set_page_load_timeout(10)
    chrome.get('https://code-gym.github.io/spider_demo/')
    # parse the rendered page source with BeautifulSoup
    soup = BeautifulSoup(chrome.page_source, 'html5lib')
    print(soup.find('h1').text)
finally:
    chrome.quit()


Selenium with beautifulsoup example 2: runs Chrome headless (no visible window)

from selenium import webdriver
from bs4 import BeautifulSoup

try:
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')  # run Chrome without opening a window
    chrome = webdriver.Chrome(options=options, executable_path='D:\\CHROME_DRIVER\\chromedriver.exe')
    chrome.set_page_load_timeout(10)
    chrome.get('https://code-gym.github.io/spider_demo/')
    soup = BeautifulSoup(chrome.page_source, 'html5lib')
    print(soup.find('h1').text)
finally:
    chrome.quit()

Selenium with beautifulsoup: using XPath to open a related article

from selenium import webdriver
from bs4 import BeautifulSoup

try:
    # same headless setup as example 2
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    chrome = webdriver.Chrome(options=options, executable_path='D:\\CHROME_DRIVER\\chromedriver.exe')
    chrome.set_page_load_timeout(10)
    chrome.get('https://code-gym.github.io/spider_demo/')
    soup = BeautifulSoup(chrome.page_source, 'html5lib')
    print(soup.find('h1').text)
    # click the first post title, then read the headline on the article page
    chrome.find_element_by_xpath('/html/body/div[2]/div/div[1]/div[1]/div/div/h3/a').click()
    print(chrome.find_element_by_xpath('//*[@id="post-header"]/div[2]/div/div/h1').text)
finally:
    chrome.quit()
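
On a slow page the click can fire before the link exists; a sturdier sketch using Selenium's explicit waits (WebDriverWait and expected_conditions are standard selenium helpers), dropped in place of the click() line above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the first post link to become clickable
link = WebDriverWait(chrome, 10).until(
    EC.element_to_be_clickable((By.XPATH, '/html/body/div[2]/div/div[1]/div[1]/div/div/h3/a')))
link.click()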