Compound solubility data

Source

Tetko, I., Tanchuk, V., Kasheva, T., and Villa, A. (2001). Estimation of aqueous solubility of chemical compounds using E-state indices. Journal of Chemical Information and Computer Sciences, 41(6), 1488-1493.

Huuskonen, J. (2000). Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. Journal of Chemical Information and Computer Sciences, 40(3), 773-777.

Value

solubility

a data frame (tibble) with 1,267 rows and 229 columns

Details

Tetko et al. (2001) and Huuskonen (2000) investigated a set of compounds with corresponding experimental solubility values using complex sets of descriptors. They used linear regression and neural network models to estimate the relationship between chemical structure and solubility. For our analyses, we will use 1,267 compounds and a set of more understandable descriptors that fall into one of three groups: 208 binary "fingerprints" that indicate the presence or absence of a particular chemical sub-structure, 16 count descriptors (such as the number of bonds or the number of bromine atoms), and 4 continuous descriptors (such as molecular weight or surface area). Together these make up 228 descriptors.
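The three descriptor groups described above can be checked with a few lines of base R. The following is a minimal sketch on a toy stand-in data frame, not the real data: only the `fp_` prefix is taken from the `str()` output on this page, while `n_bonds`, `mol_weight`, and `outcome` are hypothetical column names chosen for illustration.

```r
# Toy stand-in for the real data: three compounds, five columns.
# Only the `fp_` prefix comes from the str() output on this page;
# the other column names are hypothetical.
toy <- data.frame(
  fp_001     = c(0L, 1L, 1L),      # binary fingerprint
  fp_002     = c(1L, 0L, 0L),      # binary fingerprint
  n_bonds    = c(10L, 25L, 7L),    # hypothetical count descriptor
  mol_weight = c(120.5, 310.2, 86.1),  # hypothetical continuous descriptor
  outcome    = c(-3.2, -5.1, 0.4)  # hypothetical outcome column
)

# Logical mask selecting the fingerprint columns by name prefix.
is_fp <- startsWith(names(toy), "fp_")
n_fp <- sum(is_fp)

# Every fingerprint value should be either 0 or 1.
fp_is_binary <- all(vapply(
  toy[is_fp],
  function(col) all(col %in% c(0L, 1L)),
  logical(1)
))

# Descriptor counts as stated in the Details paragraph.
n_descriptors <- 208 + 16 + 4

print(n_fp)
print(fp_is_binary)
print(n_descriptors)
```

With the real data loaded, replacing `toy` with `solubility` should apply the same checks, assuming all fingerprint columns share the `fp_` prefix. The 228 descriptors are one fewer than the 229 columns reported by `str()`, which would be consistent with one further column, presumably the measured solubility, although this page does not say so explicitly.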

Examples

data(solubility)
str(solubility)
#> tibble [1,267 × 229] (S3: tbl_df/tbl/data.frame)
#>  $ fp_001            : int [1:1267] 0 0 1 0 0 1 0 1 1 1 ...
#>  $ fp_002            : int [1:1267] 1 1 1 0 0 0 1 0 0 1 ...
#>  $ fp_003            : int [1:1267] 0 0 1 1 1 1 0 1 1 1 ...
#>  $ fp_004            : int [1:1267] 0 1 1 0 1 1 1 1 1 1 ...
#>  $ fp_005            : int [1:1267] 1 1 1 0 1 0 1 0 0 1 ...
#>  $ fp_006            : int [1:1267] 0 1 0 0 1 0 0 0 1 1 ...
#>  $ fp_007            : int [1:1267] 0 1 0 1 0 0 0 1 1 1 ...
#>  $ fp_008            : int [1:1267] 1 1 1 0 0 0 1 0 0 0 ...
#>  $ fp_009            : int [1:1267] 0 0 0 0 1 1 1 0 1 0 ...
#>  $ fp_010            : int [1:1267] 0 0 1 0 0 0 0 0 0 0 ...
#>  $ fp_011            : int [1:1267] 0 1 0 0 0 0 0 0 1 0 ...
#>  $ fp_012            : int [1:1267] 0 0 0 0 0 1 0 1 0 0 ...
#>  $ fp_013            : int [1:1267] 0 0 0 0 1 0 1 0 0 0 ...
#>  $ fp_014            : int [1:1267] 0 0 0 0 0 0 1 0 0 0 ...
#>  $ fp_015            : int [1:1267] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ fp_016            : int [1:1267] 0 1 0 0 1 1 0 1 0 0 ...
#>  $ fp_017            : int [1:1267] 0 0 1 1 0 0 0 0 1 1 ...
#>  $ fp_018            : int [1:1267] 0 1 0 0 0 0 0 0 0 0 ...
#>  $ fp_019            : int [1:1267] 1 0 0 0 1 0 1 0 0 0 ...
#>  $ fp_020            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_021            : int [1:1267] 0 0 0 0 0 1 0 0 1 0 ...
#>  $ fp_022            : int [1:1267] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ fp_023            : int [1:1267] 0 0 0 1 0 0 0 0 1 0 ...
#>  $ fp_024            : int [1:1267] 1 0 0 0 1 0 0 0 0 0 ...
#>  $ fp_025            : int [1:1267] 0 0 1 0 0 0 0 0 0 0 ...
#>  $ fp_026            : int [1:1267] 1 0 0 0 0 0 1 0 0 0 ...
#>  $ fp_027            : int [1:1267] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ fp_028            : int [1:1267] 0 1 0 0 0 0 0 0 1 1 ...
#>  $ fp_029            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_030            : int [1:1267] 0 0 0 0 1 0 0 0 0 0 ...
#>  $ fp_031            : int [1:1267] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ fp_032            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_033            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_034            : int [1:1267] 0 0 0 0 1 0 0 0 0 1 ...
#>  $ fp_035            : int [1:1267] 0 0 0 0 0 0 0 0 1 0 ...
#>  $ fp_036            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_037            : int [1:1267] 0 0 0 0 0 0 0 0 1 0 ...
#>  $ fp_038            : int [1:1267] 0 0 1 0 0 0 0 0 0 0 ...
#>  $ fp_039            : int [1:1267] 1 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_040            : int [1:1267] 1 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_041            : int [1:1267] 0 0 0 1 0 0 0 0 1 0 ...
#>  $ fp_042            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_043            : int [1:1267] 0 1 0 0 0 0 0 0 0 0 ...
#>  $ fp_044            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_045            : int [1:1267] 0 0 1 0 0 0 0 0 0 0 ...
#>  $ fp_046            : int [1:1267] 0 1 0 0 0 0 1 0 0 1 ...
#>  $ fp_047            : int [1:1267] 0 1 1 0 0 0 1 0 0 0 ...
#>  $ fp_048            : int [1:1267] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ fp_049            : int [1:1267] 0 0 0 0 0 0 1 0 0 0 ...
#>  $ fp_050            : int [1:1267] 0 0 0 0 0 0 0 1 0 1 ...
#>  $ fp_051            : int [1:1267] 0 1 0 0 0 0 0 0 0 0 ...
#>  $ fp_052            : int [1:1267] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ fp_053            : int [1:1267] 0 0 0 0 0 0 1 0 0 0 ...
#>  $ fp_054            : int [1:1267] 0 0 0 1 0 0 0 0 1 1 ...
#>  $ fp_055            : int [1:1267] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_056            : int [1:1267] 1 0 0 0 0 0 0 0 0 0 ...
#>  $ fp_057            : int [1:1267] 0 0 0 0 0 0 1 0 0 0 ...
#>  $ fp_058            : int [1:1267] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ fp_059            : int [1:1267] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ fp_060            : int [1:1267] 0 1 1 0 0 0 0 1 1 0 ...
#>  $ fp_061            : int [1:1267] 0 0 1 0 0 0 0 1 1 0 ...
#>  $ fp_062            : int [1:1267] 0 0 1 0 0 1 0 1 1 1 ...
#>  $ fp_063            : int [1:1267] 1 1 0 0 1 1 1 0 0 1 ...
#>  $ fp_064            : int [1:1267] 0 1 1 0 1 1 0 1 0 0 ...
#>  $ fp_065            : int [1:1267] 1 1 0 0 1 0 1 0 1 1 ...
#>  $ fp_066            : int [1:1267] 1 0 1 1 1 1 1 1 1 1 ...
#>  $ fp_067            : int [1:1267] 1 1 0 0 1 1 1 0 0 1 ...
#>  $ fp_068            : int [1:1267] 0 1 0 0 1 1 1 0 0 1 ...
#>  $ fp_069            : int [1:1267] 1 0 1 1 1 1 0 1 1 0 ...
#>  $ fp_070            : int [1:1267] 1 1 0 1 0 0 1 0 1 0 ...
#>  $ fp_071            : int [1:1267] 0 0 0 0 0 0 1 0 1 1 ...
#>  $ fp_072            : int [1:1267] 0 1 1 0 0 1 0 1 1 1 ...
#>  $ fp_073            : int [1:1267] 0 1 1 0 0 0 0 0 1 0 ...
#>  $ fp_074            : int [1:1267] 0 1 0 0 0 0 0 0 1 0 ...
#>  $ fp_075            : int [1:1267] 0 1 0 0 1 1 1 0 0 1 ...
#>  $ fp_076            : int [1:1267] 1 1 0 0 0 0 1 0 1 1 ...
#>  $ fp_077            : int [1:1267] 0 1 0 1 0 0 0 1 1 1 ...
#>  $ fp_078            : int [1:1267] 0 1 0 0 0 0 0 0 1 0 ...
#>  $ fp_079            : int [1:1267] 1 1 1 1 1 0 1 0 1 1 ...
#>  $ fp_080            : int [1:1267] 0 1 0 0 1 1 1 1 0 0 ...
#>  $ fp_081            : int [1:1267] 0 0 1 1 0 0 0 1 1 1 ...
#>  $ fp_082            : int [1:1267] 1 1 1 0 1 1 1 0 1 1 ...
#>  $ fp_083            : int [1:1267] 0 0 0 0 1 0 0 0 0 1 ...
#>  $ fp_084            : int [1:1267] 1 1 0 0 1 0 1 0 0 0 ...
#>  $ fp_085            : int [1:1267] 0 1 0 0 0 0 1 0 0 0 ...
#>  $ fp_086            : int [1:1267] 0 0 0 1 1 0 0 1 1 1 ...
#>  $ fp_087            : int [1:1267] 1 1 1 1 1 0 1 0 1 1 ...
#>  $ fp_088            : int [1:1267] 0 1 0 0 0 0 0 1 1 0 ...
#>  $ fp_089            : int [1:1267] 1 1 0 0 0 0 1 0 0 0 ...
#>  $ fp_090            : int [1:1267] 0 1 0 1 0 0 0 1 1 1 ...
#>  $ fp_091            : int [1:1267] 1 1 0 0 1 0 1 0 0 1 ...
#>  $ fp_092            : int [1:1267] 0 0 0 0 1 1 1 0 1 0 ...
#>  $ fp_093            : int [1:1267] 0 1 0 1 0 0 0 1 1 1 ...
#>  $ fp_094            : int [1:1267] 0 0 0 0 1 0 0 1 0 0 ...
#>  $ fp_095            : int [1:1267] 0 0 0 0 0 0 0 0 1 1 ...
#>  $ fp_096            : int [1:1267] 0 0 0 0 0 0 0 0 1 0 ...
#>  $ fp_097            : int [1:1267] 1 1 0 0 0 0 1 0 1 0 ...
#>  $ fp_098            : int [1:1267] 0 0 1 0 0 0 0 1 0 0 ...
#>  $ fp_099            : int [1:1267] 0 0 0 0 0 0 0 0 1 0 ...
#>   [list output truncated]