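With the arrival of the big data era, data preparation plays a critical role in data analysis, but its complexity and time cost have become a major challenge for enterprises and researchers. The survey "人在回路的数据准备技术研究进展" (Research Progress on Human-in-the-Loop Data Preparation Techniques) examines these two core problems: high labor cost and long turnaround time. First, the article addresses interactive data preparation, a user-centered approach that predicts the user's needs and intent through real-time interaction. By combining these predictions with automated workflows, it substantially reduces the tedious workload of data preparation and improves efficiency.

Interactive data preparation emphasizes user experience. Users are no longer passive recipients of data but take part in the decisions made during data preparation, which both saves time spent on manual intervention and keeps the processing targeted and accurate. The technique relies on machine learning models and natural language processing to understand user queries and needs and to offer personalized data preprocessing solutions.

Second, the article discusses crowdsourcing-based data preparation. This strategy treats the large population of internet users as a distributed computing resource: a crowdsourcing platform decomposes tasks such as data cleaning and transformation and assigns them to individual users, greatly expanding data processing capacity. However, ensuring the quality of crowdsourced data and keeping costs under control, for example by choosing an appropriate task assignment strategy and implementing effective quality monitoring, remain key open problems in this area.

To achieve high-quality crowdsourced data preparation, researchers are exploring a range of methods, including effective incentive mechanisms, smart contracts for task management, and artificial intelligence techniques that automatically assess and optimize the quality of completed tasks. Protecting user privacy and data security, and handling data sovereignty and compliance, are also important directions for future research.

"人在回路的数据准备技术研究进展" aims to break through the bottlenecks of traditional data preparation by combining interactive user experience with crowdsourced computing power, improving the efficiency and quality of data analysis. Technical progress also raises new ethical, legal, and social questions, which pose new topics for future theoretical research and practical application. Future work is expected to bring more innovative solutions that make data preparation more intelligent, efficient, and transparent.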
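The quality-monitoring idea for crowdsourced data preparation can be illustrated with a minimal sketch. It assumes the simplest possible scheme, majority voting over redundant worker answers with an agreement threshold; the survey does not say which aggregation method it covers. The function name `aggregate_labels`, the `min_agreement` parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Aggregate crowd answers per task by majority vote.

    votes: dict mapping a task id to the list of answers from workers.
    Returns a dict mapping each task id to (label, agreement, accepted),
    where agreement is the share of workers who chose the winning label
    and accepted is True when that share reaches min_agreement.
    """
    results = {}
    for task_id, answers in votes.items():
        if not answers:
            # No answers collected yet: nothing to accept.
            results[task_id] = (None, 0.0, False)
            continue
        label, count = Counter(answers).most_common(1)[0]
        agreement = count / len(answers)
        results[task_id] = (label, agreement, agreement >= min_agreement)
    return results


# Hypothetical cleaning task: workers normalize a city name per record.
votes = {
    "row-1": ["NYC", "NYC", "New York"],
    "row-2": ["Paris", "Lyon"],
}
result = aggregate_labels(votes)
```

Tasks that fall below the threshold (here `row-2`) could be sent to additional workers, which is one way redundancy trades cost against quality.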
Uploaded 2021-09-22
Human-in-the-Loop Machine Learning lays out methods for humans and machines to work together effectively.

Summary
Most machine learning systems deployed in the world today learn from human feedback. However, most machine learning courses focus almost exclusively on the algorithms, not on the human-computer interaction part of these systems. This can leave a big knowledge gap for data scientists working in real-world machine learning, where they spend more time on data management than on building algorithms. Human-in-the-Loop Machine Learning is a practical guide to optimizing the entire machine learning process, including techniques for annotation, active learning, transfer learning, and using machine learning to optimize every step of the process.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Machine learning applications perform better with human feedback. Keeping the right people in the loop improves the accuracy of models, reduces errors in data, lowers costs, and helps you ship models faster.

About the book
You'll find best practices on selecting sample data for human feedback, quality control for human annotations, and designing annotation interfaces. You'll learn to create training data for labeling, object detection, semantic segmentation, sequence labeling, and more. The book starts with the basics and progresses to advanced techniques like transfer learning and self-supervision within annotation workflows.

What's inside
Identifying the right training and evaluation data
Finding and managing people to annotate data
Selecting annotation quality control strategies
Designing interfaces to improve accuracy and efficiency

About the author
Robert (Munro) Monarch is a data scientist and engineer who has built machine learning data for companies such as