Data Format
Image data
The image data cover 6,313,067 products uploaded by 1,000,517 merchants. Each product has at least five images: the first is the main image, which gives an overview of the product, and the rest depict its functions or characteristics. We use the main image of every product to construct the dataset. Images are divided by size into three groups that represent different quality levels.
Caption data
Caption data are provided by the 1,000,517 merchants. Because of fraud, the text descriptions often do not match the other modalities well. According to the level of fraud, the caption data are split into three types: well-matched, partially-matched and poorly-matched.
Video data
Video data showcase products’ usage and characteristics to customers. The videos in our dataset are recorded at 24 frames per second (FPS). We sample the original frames and keep one frame per second, since adjacent frames are similar and redundant and would add excessive computational cost.
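The sampling step above can be sketched as follows. This is a minimal illustration, not the dataset's actual preprocessing code: the function name `sample_frame_indices` is hypothetical, and keeping the first frame of each second is an assumption, since the text does not say which frame within a second is selected.

```python
def sample_frame_indices(total_frames: int, fps: int = 24) -> list[int]:
    """Return the indices of the frames kept when sampling one frame per second.

    Assumption: the first frame of every one-second window is kept.
    """
    if total_frames < 0 or fps <= 0:
        raise ValueError("total_frames must be >= 0 and fps must be > 0")
    # Stepping by `fps` frames moves forward exactly one second of video.
    return list(range(0, total_frames, fps))


# A 2.5-second clip at 24 FPS has 60 frames, of which three are kept.
kept = sample_frame_indices(60)
```

For a clip of N frames this keeps roughly N / 24 frames, which is where the reduction in computational cost comes from.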
Audio data
Audio data are extracted from the video data. We extract the audio corresponding to all sampled video frames. The audio frames are then transformed into spectrograms using Mel-Frequency Cepstral Coefficients (MFCC), with a frame size of 1,024 and a hop size of 256.
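To illustrate what a frame size of 1,024 and a hop size of 256 mean, the sketch below splits a signal into overlapping frames. It covers only the framing step that precedes an MFCC computation, not the MFCC transform itself. The function name `frame_signal` is hypothetical, and dropping the trailing partial frame (rather than padding it) is an assumption; the text does not specify either.

```python
def frame_signal(samples, frame_size: int = 1024, hop_size: int = 256):
    """Split a 1-D sequence of audio samples into overlapping frames.

    Assumption: a trailing partial frame is dropped rather than padded.
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")
    if len(samples) < frame_size:
        return []
    # Each new frame starts `hop_size` samples after the previous one,
    # so consecutive frames overlap by frame_size - hop_size samples.
    n_frames = 1 + (len(samples) - frame_size) // hop_size
    return [
        samples[i * hop_size : i * hop_size + frame_size]
        for i in range(n_frames)
    ]


# 2,048 samples yield 5 frames of 1,024 samples each.
frames = frame_signal(list(range(2048)))
```

With these settings consecutive frames share 768 of their 1,024 samples, so each sample contributes to up to four frames.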
Tabular data
Tabular data record additional product characteristics such as appearance, purpose and producer. They are indexed by product ID and collected from the whole product database. There are 5,679 properties and more than 24,398,673 unique values.
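One way to picture the tabular data is as a mapping from product ID to (property, value) pairs. The sketch below counts distinct properties and distinct values over such a mapping. The function name `summarize_properties`, the sample IDs and the sample properties are all hypothetical, and counting values globally rather than per property is an assumption about how the figures above were obtained.

```python
def summarize_properties(table: dict) -> tuple[int, int]:
    """Count distinct property names and distinct values in a product table.

    `table` maps a product ID to a dict of property name -> value.
    Assumption: values are counted globally, not per property.
    """
    properties = set()
    values = set()
    for attributes in table.values():
        for name, value in attributes.items():
            properties.add(name)
            values.add(value)
    return len(properties), len(values)


# Hypothetical two-product table, for illustration only.
sample_table = {
    "1001": {"color": "red", "material": "acrylic"},
    "1002": {"color": "blue", "material": "acrylic"},
}
n_properties, n_values = summarize_properties(sample_table)
```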
Data Annotation
Annotation
Annotations of the M5Product dataset are stored in meta_info.
meta_info
meta_info contains basic information about the collected caption, image, video, audio and tabular data.
"meta_info": {
    "item_id": < str > -- "Unique ID, e.g. 12345..."
    "caption": < str > -- "Bubble Matt Blind Box Storage Ladder Transparent Display Dust-proof Doll Hand-made Jasmine Doll Acrylic Box Holder"
    "image": < file > -- .jpg
    "video": < file > -- .mp4
    "audio": < file > -- .mp3
    "tabular": < dict > -- (key, value)
    "label": < str > -- (Annotation)
}
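A record following the schema above could be read into a typed structure as sketched below. This is an illustration under assumptions, not the released toolkit: the `MetaInfo` class and `parse_meta_info` function are hypothetical names, the file fields are assumed to be stored as path strings, and the example paths, tabular entries and label are invented for the demonstration.

```python
from dataclasses import dataclass, fields


@dataclass
class MetaInfo:
    item_id: str
    caption: str
    image: str    # assumed: path to a .jpg file
    video: str    # assumed: path to a .mp4 file
    audio: str    # assumed: path to a .mp3 file
    tabular: dict
    label: str


def parse_meta_info(record: dict) -> MetaInfo:
    """Build a MetaInfo from a record shaped like {"meta_info": {...}}."""
    meta = record["meta_info"]
    names = [f.name for f in fields(MetaInfo)]
    missing = [name for name in names if name not in meta]
    if missing:
        raise KeyError(f"meta_info is missing fields: {missing}")
    return MetaInfo(**{name: meta[name] for name in names})


# Hypothetical record; paths, tabular entries and label are placeholders.
example = {
    "meta_info": {
        "item_id": "12345",
        "caption": "Acrylic Box Holder",
        "image": "images/12345.jpg",
        "video": "videos/12345.mp4",
        "audio": "audios/12345.mp3",
        "tabular": {"material": "acrylic"},
        "label": "storage",
    }
}
parsed = parse_meta_info(example)
```

Checking for missing fields up front gives a single clear error for an incomplete record instead of a failure later in a data loader.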
Data Toolkits (will be released very soon)
M5ProductCoder also provides a reproduction of the M5Product dataloader and experiments on GitHub.