WebApp - Selenium (With Python)

1. Install Python

- Google search: python -> download the latest version from python.org
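To confirm the install worked, running python --version in a terminal prints the interpreter version. The same check can be done from inside Python, as in this small sketch. The 3.8 minimum is an assumption chosen only for illustration; check the release notes of the Selenium version you install for its actual requirement.

```python
import sys

# Hypothetical minimum version, chosen only for illustration
MIN_VERSION = (3, 8)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if (major, minor) of the interpreter meets the assumed minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print(sys.version)
    print("OK" if python_is_recent_enough() else "Please install a newer Python")
```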

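2. Create a Python venv virtual environment

- Google search: python venv virtual environment

- VS Code terminal command: python -m venv selenium (creates a virtual environment named selenium)

- A folder named selenium is created.

- Change into it with cd selenium/Scripts (the Windows layout) and run the activate command to enter the virtual environment.

- Once you are inside, (selenium) appears in front of the directory path in the terminal prompt.

- You are now inside the (selenium) virtual environment, a space isolated from your other projects.

- From here, install the Selenium package with pip install selenium.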
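The steps above can be double-checked from Python itself. A minimal sketch, with hypothetical helper names: inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base installation, so comparing them tells you whether activate worked. The second helper shows where venv places the activate script: Scripts on Windows (where typing activate in cmd runs activate.bat), bin on macOS/Linux (usually run as source selenium/bin/activate).

```python
import os
import sys
from pathlib import Path


def in_virtualenv():
    """True when running inside a venv: venv changes sys.prefix but not sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def activate_script(env_dir, os_name=os.name):
    """Path of the activate script that `python -m venv env_dir` creates."""
    subdir = "Scripts" if os_name == "nt" else "bin"  # Windows vs. POSIX layout
    return Path(env_dir) / subdir / "activate"


if __name__ == "__main__":
    print("inside a venv:", in_virtualenv())
    print("activate script:", activate_script("selenium"))
```

If the first line prints False after installing Selenium, pip most likely installed the package into the global Python rather than the selenium environment.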

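3. Install chromedriver

- Check your Chrome version (three-dot menu at the top right - Help - About Google Chrome), then download the matching version from the chromedriver download page.

- Drag and drop the downloaded chromedriver.exe into the selenium folder.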
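The point of checking the Chrome version is that ChromeDriver should share Chrome's major version. The sketch below compares the two version strings; the helper names are hypothetical and the sample strings in the test are made up, shaped like the output of chromedriver --version. Note that Selenium 4.6 and later ship Selenium Manager, which can locate or download a matching driver automatically when webdriver.Chrome() is called without a driver path, so on recent versions this manual step may not be needed.

```python
import re


def major_version(version_text):
    """Extract the major version from text such as 'ChromeDriver 120.0.6099.109 (...)'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def driver_matches_browser(chrome_version, driver_version):
    """True if Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)
```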

* Basic setup is complete up to this point.

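4. Now let's write the code

(1) - In the bottom-left corner of VS Code, select the selenium virtual environment you created as the Python interpreter.

(2) - Change webdriver.Firefox() in the existing code to webdriver.Chrome().

- Running python google.py opens Chrome at the address passed to driver.get().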

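(3) - Search for 배수지 (Bae Suzy) in the Google Images search box -> find the many small thumbnail images in the results -> find the first thumbnail and click it

-> find the large image that opens after the click, read its src address -> save it as a .jpg file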

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
import urllib.request

# Selenium 4 removed the find_element_by_* helpers; use find_element(By.<strategy>, value)
driver = webdriver.Chrome()
driver.get("https://www.google.co.kr/imghp?hl=ko")
elem = driver.find_element(By.NAME, "q")  # select the Google search box
elem.send_keys("배수지")  # type the keyword into the selected search box
elem.send_keys(Keys.RETURN)  # press Enter
# Collect every thumbnail that has both classes rg_i and Q4LuWd into a list, then click element 0
driver.find_elements(By.CSS_SELECTOR, ".rg_i.Q4LuWd")[0].click()
time.sleep(3)  # give the large image 3 seconds to load after the click
imageSrc = driver.find_element(By.CSS_SELECTOR, ".n3VNCb").get_attribute("src")  # find the loaded large image and read its src
urllib.request.urlretrieve(imageSrc, "text.jpg")  # save the image at src into the current working directory

* Documentation on locating elements with find_element -> source: selenium-python.readthedocs.io/locating-elements.html

* A more detailed, easier-to-follow explanation of the find_element methods

-> source: greeksharifa.github.io/references/2020/10/30/python-selenium-usage/

* Selecting an element whose class attribute contains spaces -> source: soraji.github.io/web/2019/05/27/cssSelector/
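Why rg_i.Q4LuWd works: the thumbnail's class attribute is "rg_i Q4LuWd", which means two separate classes, and a CSS selector matches an element carrying both by chaining them with dots. A small helper (hypothetical name) makes that conversion explicit. It does not escape special characters, so class names containing characters such as ":" would need extra CSS escaping.

```python
def class_attr_to_css(class_attr):
    """Turn a space-separated class attribute into a CSS selector requiring every class.

    'rg_i Q4LuWd' -> '.rg_i.Q4LuWd'
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    return "".join("." + name for name in classes)
```

With Selenium the result is used as driver.find_elements(By.CSS_SELECTOR, class_attr_to_css("rg_i Q4LuWd")).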

(4) Use a for loop to repeat the work for every image
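The loop ends up in the final version as "click each thumbnail, save its image, skip it if anything fails". Stripped of the browser calls, the pattern looks like this sketch, where download is a hypothetical stand-in for the click, find, and urlretrieve steps. As in the final version, the counter only advances on success, so saved files are numbered 1.jpg, 2.jpg, ... without gaps.

```python
def download_all(items, download):
    """Call download(item, filename) for each item, numbering files 1.jpg, 2.jpg, ...

    A failing item is skipped and does not use up a number. Returns the saved filenames.
    """
    saved = []
    count = 1
    for item in items:
        filename = str(count) + ".jpg"
        try:
            download(item, filename)
        except Exception:
            continue  # skip this item and move on, like the except block in the final version
        saved.append(filename)
        count += 1
    return saved
```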

(5) - Scrolling down so that more images get loaded

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
import urllib.request

driver = webdriver.Chrome()
driver.get("https://www.google.co.kr/imghp?hl=ko")
elem = driver.find_element(By.NAME, "q")
elem.send_keys("배수지")
elem.send_keys(Keys.RETURN)

SCROLL_PAUSE_SEC = 2  # seconds to wait for loading after each scroll
# Get the current scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait SCROLL_PAUSE_SEC (2) seconds
    time.sleep(SCROLL_PAUSE_SEC)

    # Get the scroll height again after scrolling down
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        try:
            # The height stopped growing: click "Show more results"
            driver.find_element(By.XPATH, '//*[@id="islmp"]/div/div/div/div/div[5]/input').click()
        except Exception:
            # No "Show more results" button either, so we reached the end: leave the loop
            break
    last_height = new_height
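The loop keeps scrolling until the page height stops growing, then tries the "Show more results" button, and stops only when that button is gone too. Written against plain callables (hypothetical names, so it can run without a browser), the logic is sketched below. With Selenium, get_height would wrap the "return document.body.scrollHeight" script, scroll would wrap the window.scrollTo call, and click_more would wrap the XPath click. The max_rounds cap is an addition not in the original code, guarding against a page that never stops growing.

```python
import time


def scroll_until_end(get_height, scroll, click_more, pause=2.0, max_rounds=100):
    """Scroll until the height stops changing and 'Show more results' is unavailable.

    Returns the number of scroll rounds performed.
    """
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:
        scroll()
        time.sleep(pause)  # give newly loaded images time to render
        rounds += 1
        new_height = get_height()
        if new_height == last_height:
            try:
                click_more()  # ask the page for more results
            except Exception:
                break  # no button left: this really is the end
        last_height = new_height
    return rounds
```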

(6) - Final version

(Locating the large image by class was changed to locating it by XPath)

(Added a try/except block that skips an image when an error occurs)

(Added driver.close())

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
import urllib.request

driver = webdriver.Chrome()
driver.get("https://www.google.co.kr/imghp?hl=ko")
elem = driver.find_element(By.NAME, "q")
elem.send_keys("배수지")
elem.send_keys(Keys.RETURN)

SCROLL_PAUSE_SEC = 2  # seconds to wait for loading after each scroll
# Get the current scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait SCROLL_PAUSE_SEC (2) seconds
    time.sleep(SCROLL_PAUSE_SEC)

    # Get the scroll height again after scrolling down
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        try:
            # The height stopped growing: click "Show more results"
            driver.find_element(By.XPATH, '//*[@id="islmp"]/div/div/div/div/div[5]/input').click()
        except Exception:
            # No "Show more results" button either, so we reached the end: leave the loop
            break
    last_height = new_height

images = driver.find_elements(By.CSS_SELECTOR, ".rg_i.Q4LuWd")
count = 1
for image in images:
    try:
        image.click()
        time.sleep(3)  # wait for the large image to load
        # Locate the large image by XPath instead of by class (n3VNCb)
        imageSrc = driver.find_element(By.XPATH, '//*[@id="Sva75c"]/div/div/div[3]/div[2]/c-wiz/div[1]/div[1]/div/div[2]/a/img').get_attribute("src")
        urllib.request.urlretrieve(imageSrc, str(count) + ".jpg")
        count = count + 1
    except Exception:
        # If anything goes wrong, skip this image's download and move on to the next one
        pass

driver.close()  # close the browser window when done (driver.quit() also ends the chromedriver session)

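* Previously, the large image was located by its class (n3VNCb), but searching for tags with that class name using JavaScript in the browser console returned three of them. To find exactly the element we want, use XPath instead.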

How to get the XPath: open Inspect, click the large image you want, then right-click the highlighted element in the Elements panel -> Copy -> Copy XPath
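Why XPath helped: a class lookup returns every element carrying that class, while a full XPath like the copied one pins down a single position in the tree. The sketch below reproduces the situation on a small made-up fragment (not Google's real markup) with Python's built-in xml.etree.ElementTree, which supports a limited subset of XPath. In the browser, the equivalent comparison is driver.find_elements(By.CLASS_NAME, "n3VNCb") versus driver.find_element(By.XPATH, ...).

```python
import xml.etree.ElementTree as ET

# Made-up markup: three panels, each with an <img class="n3VNCb">; only the second is the one we want
PAGE = """
<div id="Sva75c">
  <div class="panel"><img class="n3VNCb" src="old.jpg"/></div>
  <div class="panel"><img class="n3VNCb" src="current.jpg"/></div>
  <div class="panel"><img class="n3VNCb" src="next.jpg"/></div>
</div>
"""

root = ET.fromstring(PAGE.strip())

# Class-based lookup: every element whose class is n3VNCb matches
by_class = root.findall(".//img[@class='n3VNCb']")

# Position-based lookup: only the image inside the second child div
by_path = root.findall("./div[2]/img")

if __name__ == "__main__":
    print(len(by_class), [img.get("src") for img in by_class])
    print(len(by_path), by_path[0].get("src"))
```

The trade-off is that a positional path breaks as soon as the page layout changes, so copied XPaths like the ones above may need refreshing over time.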
