Data Format

Image data

The image data cover 6,313,067 products uploaded by 1,000,517 merchants. Each product has at least five product images: the first is the main image, which gives an overview of the product, and the rest depict its functions or characteristics. We collect the main image of every product to construct the dataset. Images are divided by size into three groups representing different quality levels.

Caption data

Caption data are provided by the 1,000,517 merchants. The text descriptions do not always match the other modalities well because of fraud. According to the fraud level, the caption data can be split into three types: well-matched, partially-matched and poorly-matched.

Video data

Video data are used to showcase products’ usage and characteristics to customers. In our dataset, these videos are recorded at 24 frames per second (FPS). We then sample the original frames, keeping one frame per second, since adjacent frames are similar and redundant and would add excessive computational cost.
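The one-frame-per-second sampling described above can be sketched as follows. This is a minimal illustration, not the dataset's actual preprocessing code: the function name `sample_frame_indices` is hypothetical, and keeping the first frame of each second (rather than, say, the middle one) is an assumption.

```python
def sample_frame_indices(total_frames: int, fps: int = 24) -> list[int]:
    """Return the indices of the frames kept when sampling one frame per second.

    Assumption: the first frame of each one-second window is the one kept.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Stepping by `fps` frames advances exactly one second per kept frame.
    return list(range(0, total_frames, fps))


# A 3-second clip recorded at 24 FPS has 72 frames; three of them are kept.
kept = sample_frame_indices(72)
```

Under this scheme a clip of N seconds contributes roughly N frames instead of 24 × N, which is where the computational saving comes from.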

Audio data

Audio data are extracted from the video data. We extract the audio corresponding to all sampled video frames. The audio frames are then transformed into spectrograms using Mel-Frequency Cepstral Coefficients (MFCC), with a frame size of 1,024 and a hop size of 256.
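To make the frame size and hop size concrete, the sketch below shows only the framing step that precedes an MFCC computation. It is an illustration under assumptions: the function name `frame_signal` is hypothetical, the signal is not padded, and the coefficients themselves would in practice come from an audio library rather than from this code.

```python
def frame_signal(samples: list[float], frame_size: int = 1024, hop_size: int = 256) -> list[list[float]]:
    """Split a 1-D signal into overlapping frames.

    Assumption: no padding, so trailing samples that do not fill a
    complete frame are dropped.
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")
    frames = []
    # Each frame starts hop_size samples after the previous one.
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frames.append(samples[start:start + frame_size])
    return frames


# 2,048 samples yield frames starting at 0, 256, 512, 768 and 1,024.
frames = frame_signal([0.0] * 2048)
```

With a frame size of 1,024 and a hop size of 256, consecutive frames share 768 samples, that is, they overlap by 75%.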

Tabular data

Tabular data are a special kind of database that records additional product characteristics such as appearance, purpose and producer. The tabular data are indexed by product ID and collected from the whole product database. There are 5,679 properties and more than 24,398,673 unique values.
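As a rough illustration of how such statistics relate to the table, the sketch below counts properties and unique values in a small table indexed by product ID. The function name `summarize_properties`, the example IDs and properties, and the definition of "unique values" (distinct values counted per property) are all assumptions, not taken from the dataset.

```python
def summarize_properties(table: dict[str, dict[str, str]]) -> tuple[int, int]:
    """Count distinct properties and distinct values per property.

    `table` maps a product ID to its (key, value) properties.
    Assumption: a value is counted once per property it appears under.
    """
    values_by_property: dict[str, set[str]] = {}
    for properties in table.values():
        for key, value in properties.items():
            values_by_property.setdefault(key, set()).add(value)
    num_properties = len(values_by_property)
    num_unique_values = sum(len(values) for values in values_by_property.values())
    return num_properties, num_unique_values


# Hypothetical two-product table used only for illustration.
example_table = {
    "1001": {"color": "red", "material": "acrylic"},
    "1002": {"color": "blue", "material": "acrylic"},
}
stats = summarize_properties(example_table)
```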

Data Annotation

Annotation

Annotations of the M5Product dataset are stored in meta_info.

meta_info

meta_info contains basic information about the collected caption, image, video, audio and tabular data.

"meta_info": {
    "item_id":    < str >   -- "Unique ID, eg: 12345..."
    "caption":    < str >   -- "Bubble Matt Blind Box Storage Ladder Transparent Display Dust-proof Doll Hand-made Jasmine Doll Acrylic Box Holder"
    "image":      < file >  -- .jpg
    "video":      < file >  -- .mp4
    "audio":      < file >  -- .mp3
    "tabular":    < dict >  -- (key,value)
    "label":      < str >   -- (Annotation)
}
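A record following the meta_info layout above could be checked with a small validator like the one below. This is a sketch under assumptions: the function name `validate_meta_info` is hypothetical, the file fields are assumed to be stored as path strings, and every field is assumed to be required. The official toolkit may represent records differently.

```python
from typing import Any

# Expected Python type for each meta_info field (file fields assumed to be path strings).
REQUIRED_FIELDS = {
    "item_id": str,
    "caption": str,
    "image": str,
    "video": str,
    "audio": str,
    "tabular": dict,
    "label": str,
}

# File extensions listed in the meta_info description.
EXPECTED_EXTENSIONS = {"image": ".jpg", "video": ".mp4", "audio": ".mp3"}


def validate_meta_info(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record looks valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    for field, extension in EXPECTED_EXTENSIONS.items():
        value = record.get(field)
        if isinstance(value, str) and not value.lower().endswith(extension):
            problems.append(f"{field}: expected a {extension} file")
    return problems


# Hypothetical record used only for illustration.
example_record = {
    "item_id": "12345",
    "caption": "Acrylic display box for dolls",
    "image": "12345.jpg",
    "video": "12345.mp4",
    "audio": "12345.mp3",
    "tabular": {"material": "acrylic"},
    "label": "storage box",
}
problems = validate_meta_info(example_record)
```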

Data Toolkits (will be released very soon)

Link for M5Product toolkit

M5ProductCoder also provides a reproduction of the M5Product dataloader and experiments on GitHub.