Cognitive Computing in Practice: Datasets and a Logistic Regression Example

Updated 2024-07-15 · PDF, 302 KB

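In the book "CognitiveComputingRecipes.pdf", Appendix A discusses the central role of data in artificial intelligence and machine learning, quoting the widely repeated line "Data is the new oil". The authors, Adnan Masood and Adnan Hashmi, stress that today's researchers and practitioners depend on large volumes of data to advance both research and practice. The appendix gives an overview of popular public datasets and deep learning model repositories, to help readers find and obtain data they can work with.

1. The importance of datasets

With the rise of AI, data has become the key factor driving algorithms and models. It is the basis for training a model, and it is also used to evaluate model performance, tune algorithms, and uncover new insights. The quality, scale, and diversity of the data directly affect a model's accuracy and its ability to generalize.

2. Public datasets and search engines

Google Dataset Search is a relatively new tool that lets users search for datasets spread across many sources, including publisher sites, digital libraries, and personal web pages. It simplifies data discovery and helps locate relevant datasets quickly.

- Google Public Data Explorer: provides public data and forecasts from international organizations such as the World Bank, the OECD, and Eurostat, as well as from academic institutions, covering economic, social, geographic, and other domains.

3. Other dataset resources

Besides Google Dataset Search, there are other commonly used dataset discovery platforms. These may include Kaggle (https://www.kaggle.com), a well-known platform for machine learning competitions and dataset sharing; the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/), which offers many datasets for classification, regression, and clustering problems; and open-source projects on GitHub, many of which ship datasets from real applications.

4. Deep learning model repositories

For deep learning models, GitHub is also an important resource. Frameworks such as TensorFlow (https://github.com/tensorflow/models) and PyTorch (https://github.com/pytorch/vision) have official or community-maintained model code examples and pretrained models that developers can use directly or treat as a reference.

Knowing how to use these public datasets and model repositories is an essential skill for AI practitioners today. Exploring and processing this data helps advance Cognitive Computing and leads to more valuable applications in practice. Keeping track of new data sources and techniques is equally important for staying competitive and innovative.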


Fashion-MNIST

https://github.com/zalandoresearch/fashion-mnist

Fashion-MNIST consists of 60,000 training images and 10,000 test images. It is an MNIST-like fashion product database. The developers believe MNIST has been overused, so they created this as a direct replacement for that dataset. Each image is in greyscale and is associated with a label from ten classes.

Size: 30MB

Number of Records: 70,000 images in ten classes
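
Because the dataset is described as a direct replacement for MNIST, a reader for it can be sketched along the lines of an MNIST reader. The sketch below assumes the files use the same IDX binary layout as MNIST (a big-endian header followed by raw bytes) and are gzip-compressed, for example `train-images-idx3-ubyte.gz` in the repository. The function names are hypothetical, and the class names are taken from the repository README rather than from the text above. The demo buffers at the bottom are synthetic, not real data.

```python
import gzip
import struct

# Class names as listed in the Fashion-MNIST README (assumption: not stated above).
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def read_gz(path):
    """Return the decompressed bytes of a .gz file."""
    with gzip.open(path, "rb") as handle:
        return handle.read()


def parse_idx_images(raw):
    """Parse an IDX3 buffer into (list of flat pixel lists, (rows, cols))."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803: unsigned bytes, three dimensions
        raise ValueError(f"unexpected magic number {magic} for an image file")
    size = rows * cols
    pixels = raw[16:]
    if len(pixels) != count * size:
        raise ValueError("pixel data length does not match the header")
    images = [list(pixels[i * size:(i + 1) * size]) for i in range(count)]
    return images, (rows, cols)


def parse_idx_labels(raw):
    """Parse an IDX1 buffer into a list of integer labels."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:  # 0x00000801: unsigned bytes, one dimension
        raise ValueError(f"unexpected magic number {magic} for a label file")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("label count does not match the header")
    return labels


# Synthetic buffers (two 2x2 "images", two labels) just to exercise the parsers.
demo_images = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
demo_labels = struct.pack(">II", 2049, 2) + bytes([0, 9])
images, shape = parse_idx_images(demo_images)
labels = parse_idx_labels(demo_labels)
```

With the real files downloaded, `parse_idx_images(read_gz(path))` would take the place of the synthetic buffer. Each label is an index into `CLASS_NAMES`.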

IMDB Reviews

http://ai.stanford.edu/~amaas/data/sentiment/

This is a dream dataset for movie lovers. It is meant for binary sentiment classification and has far more data than any previous dataset in this field. Apart from the training and test review examples, there is further unlabeled data for use as well. Raw text and preprocessed bag-of-words formats are also included.

Size: 80MB

Number of Records: 25,000 highly polar movie reviews for training and 25,000 for testing
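
As a rough illustration of working with the raw-text format, the sketch below loads labeled reviews and turns each one into bag-of-words counts. It assumes that each split directory contains `pos` and `neg` subfolders with one `.txt` file per review, which is how the extracted archive is commonly laid out. The function names, the regular-expression tokenizer, and the two demo reviews are all hypothetical. The tokenizer is a simplification and does not reproduce the preprocessed bag-of-words files shipped with the dataset.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def load_reviews(split_dir):
    """Return (text, label) pairs; label 1 = positive, 0 = negative."""
    examples = []
    for folder, label in (("neg", 0), ("pos", 1)):
        for path in sorted(Path(split_dir, folder).glob("*.txt")):
            examples.append((path.read_text(encoding="utf-8"), label))
    return examples


def bag_of_words(text):
    """Lowercase the text and count word tokens (a simplistic tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Tiny stand-in for one split so the sketch runs without the download.
# Both reviews and their file names are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for folder, name, text in (("pos", "0_9.txt", "A great, great film."),
                               ("neg", "0_2.txt", "Dull plot and dull acting.")):
        target = Path(tmp, folder)
        target.mkdir()
        (target / name).write_text(text, encoding="utf-8")
    examples = load_reviews(tmp)

# One (word counts, label) pair per review, ready for a binary classifier.
features = [(bag_of_words(text), label) for text, label in examples]
```

Pointing `load_reviews` at the extracted training or test directory would replace the temporary folder used here.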

Sentiment140

http://help.sentiment140.com/for-students

Sentiment140 is a dataset that can be used for sentiment analysis. A popular dataset, it is a good place to start your NLP journey. Emoticons have been pre-removed from the data. The final dataset has the following six features:

• polarity of the tweet
• ID of the tweet
• date of the tweet
• the query
• username of the tweeter
• text of the tweet

Size: 80MB (Compressed)

Number of Records: 1,600,000 tweets
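
The six features listed above map naturally onto a six-column record. The sketch below assumes the data is a header-less CSV file with the columns in the order given, and that polarity is encoded as 0 (negative), 2 (neutral), or 4 (positive), as the dataset's documentation page describes. The `Tweet` type, the `read_tweets` function, and the two sample rows are hypothetical and made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class Tweet(NamedTuple):
    polarity: int   # assumed encoding: 0 = negative, 2 = neutral, 4 = positive
    tweet_id: str
    date: str
    query: str
    user: str
    text: str


def read_tweets(handle):
    """Yield Tweet records from a header-less, six-column CSV stream."""
    for row in csv.reader(handle):
        if len(row) != 6:
            continue  # skip malformed rows rather than guess at their meaning
        polarity, tweet_id, date, query, user, text = row
        yield Tweet(int(polarity), tweet_id, date, query, user, text)


# Two invented rows in the assumed layout; the real file is far larger.
SAMPLE = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","missed the bus, again"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","great coffee, great day"\n'
)
tweets = list(read_tweets(io.StringIO(SAMPLE)))
positive = [t.text for t in tweets if t.polarity == 4]
```

For the real download, the same function can read from `open(path, newline="", encoding="latin-1")`. The encoding is an assumption and worth checking against the file.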

Appendix A Public Datasets & Deep Learning Model Repositories
