Implementation of Web Scraping for Data Retrieval from E-Commerce Websites
Abstract
Data is gold; data is the new mine. Sayings like these have become familiar in the digital era. Data can be used, among other things, to improve operational efficiency, drive business innovation, understand user needs, and support decision-making. Data from e-commerce sites can be collected in several ways, including web scraping with Python and libraries such as Selenium, BeautifulSoup, and the built-in time module. This study applies two web scraping methods, HTML parsing and CSS selectors, to Shopee and Tokopedia. It found that these methods could not be used to scrape Shopee because of its bot-detection and CAPTCHA mechanisms, but they worked on Tokopedia. We ran 10 scraping attempts on two product pages. Each run yielded roughly 160–166 records, of which 3–18 were duplicates. The program's average execution time was 1 minute 0.5 seconds.
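The abstract does not include the scraping code, so what follows is only a minimal sketch of the HTML-parsing step, not the authors' implementation. To stay dependency-free it swaps BeautifulSoup for Python's standard-library `html.parser` and matches elements by class name, roughly as CSS selectors such as `.product-name` would. The markup, the class names `product-card`, `product-name`, and `product-price`, and the helper `scrape` are hypothetical placeholders. In a Selenium-based pipeline the HTML would typically come from `driver.page_source` instead of a string literal. The sketch also counts duplicate records and times the run with `time.perf_counter()`, the two quantities the abstract reports.

```python
import time
from html.parser import HTMLParser

# Hypothetical product-listing markup; the real class names used on
# Tokopedia are not given in the abstract, so these are placeholders.
SAMPLE_HTML = """
<div class="product-card"><span class="product-name">Ground Coffee 200g</span><span class="product-price">Rp25.000</span></div>
<div class="product-card"><span class="product-name">Green Tea 50 sachets</span><span class="product-price">Rp18.500</span></div>
<div class="product-card"><span class="product-name">Ground Coffee 200g</span><span class="product-price">Rp25.000</span></div>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from elements whose class attribute
    contains a target class, similar to the CSS selectors
    `.product-name` and `.product-price` (hypothetical class names)."""

    def __init__(self):
        super().__init__()
        self._field = None   # field that incoming text belongs to, if any
        self._current = {}   # fields collected for the current product card
        self.records = []    # every (name, price) pair found, duplicates included

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "product-card" in classes:
            self._current = {}
        elif "product-name" in classes:
            self._field = "name"
        elif "product-price" in classes:
            self._field = "price"

    def handle_data(self, data):
        # Only keep text that sits inside a name or price element.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        # Any closing tag ends the current field; adequate for flat markup
        # like the sample, but nested tags inside a field would cut it short.
        self._field = None
        if "name" in self._current and "price" in self._current:
            self.records.append((self._current["name"], self._current["price"]))
            self._current = {}


def scrape(html):
    """Parse the HTML, drop duplicate records, and time the whole run."""
    start = time.perf_counter()
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    unique = list(dict.fromkeys(parser.records))  # de-duplicate, keep page order
    elapsed = time.perf_counter() - start
    return parser.records, unique, elapsed


if __name__ == "__main__":
    records, unique, elapsed = scrape(SAMPLE_HTML)
    print(f"{len(records)} records, {len(records) - len(unique)} duplicate(s), {elapsed:.4f} s")
```

Deduplicating on the full (name, price) tuple with `dict.fromkeys` keeps the first occurrence of each record and preserves page order; a real scraper might instead key on a product URL or ID if the page exposes one.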
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under an Attribution 4.0 International (CC BY 4.0) that allows others to share — copy and redistribute the material in any medium or format and adapt — remix, transform, and build upon the material for any purpose, even commercially with an acknowledgment of the work's authorship and initial publication in this journal.