Packt Python Web Scraping Cookbook
Python web crawlers: an introduction to how crawlers work and how to implement them in Python. This book is for those who want to learn to extract data from websites by scraping, and to work with various data management tools and cloud services. The coding requires basic skills in the Python programming language. The book is also for those who wish to learn about a larger ecosystem of tools for retrieving, storing, and searching data, and to use modern tools and Pythonic libraries to create data APIs and cloud services. You may also use Docker and Amazon Web Services to package and deploy a scraper in the cloud.
Python Web Scraping
Over 90 proven recipes to get you scraping with Python, microservices, Docker, and AWS
BIRMINGHAM - MUMBAI
Python Web Scraping Cookbook
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, without the prior written permission of the publisher, except in the case of brief quotations
embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented.
However, the information contained in this book is sold without warranty, either express or implied. Neither the
author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged
to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products
mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy
of this information.
Commissioning Editor: Veena Pagare
Acquisition Editor: Tushar Gupta
Content Development Editor: Tejas Limkar
Technical Editor: Danish Shaikh
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Tania Dutta
Production Coordinator: Shraddha Falebhai
First published: February 2018
Production reference: 1070218
Published by Packt Publishing Ltd.
35 Livery Street
B3 2PB, UK.
About the author
Michael Heydt is an independent consultant specializing in social, mobile, analytics, and
cloud technologies, with an emphasis on cloud native 12-factor applications. Michael has
been a software developer and trainer for over 30 years and is the author of books such as
D3.js By Example, Learning Pandas, Mastering Pandas for Finance, and Instant
Lucene.NET. You can find more information about him on LinkedIn at michaelheydt.
I would like to greatly thank my family for putting up with me disappearing for months on
end and sacrificing my sparse free time to indulge in creation of content and books like this
one. They are my true inspiration and enablers.
About the reviewers
Mei Lu is the founder and CEO of Jobfully, providing career coaching for software
developers and engineering leaders. She is also a Career/Executive Coach for
Carnegie Mellon University Alumni Association, specializing in the software / high-tech
Previously, Mei was a software engineer and an engineering manager at Qpass,
M.I.T., and MicroStrategy. She received her MS in Computer Science from the
University of Pennsylvania and her MS in Engineering from Carnegie Mellon
Lazar Telebak is a freelance web developer specializing in web scraping, crawling, and
indexing web pages using Python libraries/frameworks.
He has worked mostly on automation, website scraping, and crawling projects, exporting
data in various formats (CSV, JSON, XML, and TXT) and to databases such as MongoDB,
SQLAlchemy, and Postgres. Lazar also has experience with frontend technologies and
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and
apply today. We have worked with thousands of developers and tech professionals, just
like you, to help them share their insight with the global tech community. You can make a
general application, apply for a specific hot topic that we are recruiting an author for, or
submit your own idea.