[Original] Running the ELL demo on a Raspberry Pi 3 fails with: ImportError: build/_darknetReference.so: undefined symbol: cblas_sgemm

OS: Arch Linux ARM
gcc version: 7.1.1 20170516 (GCC)

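At the end of June 2017, Microsoft released ELL (Embedded Learning Library), a machine learning library aimed mainly at embedded systems such as the Raspberry Pi and ARM Cortex-M0.
This post describes how to fix an "undefined symbol: cblas_sgemm" error that I ran into when running the ELL demo program on a Raspberry Pi.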

With the preparation essentially done, running the ELL demo program on the Raspberry Pi may fail with this error:

(py34)[root@alarmpi compiled_darknetReference_pi3]# python compiledDarknetDemo.py
Traceback (most recent call last):
  File "/root/raspberry-pi/ai/ell-related/compiled_darknetReference_pi3/darknetReference.py", line 14, in swig_import_helper
    return importlib.import_module(mname)
  File "/root/.miniconda3/envs/py34/lib/python3.4/importlib/__init__.py", line 109, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 2254, in _gcd_import
  File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
  File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1191, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1161, in _load_backward_compatible
  File "<frozen importlib._bootstrap>", line 539, in _check_name_wrapper
  File "<frozen importlib._bootstrap>", line 1715, in load_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ImportError: build/_darknetReference.so: undefined symbol: cblas_sgemm

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "compiledDarknetDemo.py", line 11, in <module>
    import darknetReference as model
  File "/root/raspberry-pi/ai/ell-related/compiled_darknetReference_pi3/darknetReference.py", line 17, in <module>
    _darknetReference = swig_import_helper()
  File "/root/raspberry-pi/ai/ell-related/compiled_darknetReference_pi3/darknetReference.py", line 16, in swig_import_helper
    return importlib.import_module('_darknetReference')
  File "/root/.miniconda3/envs/py34/lib/python3.4/importlib/__init__.py", line 109, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: build/_darknetReference.so: undefined symbol: cblas_sgemm

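The key is the two ImportError lines. They show that the Python module _darknetReference.so, which we compiled on the Raspberry Pi, cannot find the function cblas_sgemm at run time. That function is supposed to be defined in the BLAS library.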
Source: https://www.codelast.com/
This suggests that the BLAS libraries I installed with pacman -S cblas blas are not usable here.
On Arch Linux ARM, a package search turns up only the following BLAS-related entries:

[root@alarmpi ~]# pacman -Ss blas
extra/blas 3.7.1-1
    Basic Linear Algebra Subprograms
extra/cblas 3.7.1-1
    C interface to BLAS
extra/liblastfm 1.0.9-2
    A Qt4 C++ library for the Last.fm webservices

I tried installing only blas, only cblas, and both together. None of these fixed the problem.
The following is the cmake output from building _darknetReference.so after I had installed both blas and cblas:

[root@alarmpi build]# cmake ..
-- The C compiler identification is GNU 7.1.1
-- The CXX compiler identification is GNU 7.1.1
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Blas libraries: /usr/lib/libblas.so
-- Blas linker flags: 
-- Blas include directories: 
-- Using BLAS include path: /usr/include
-- Using BLAS library: /usr/lib/libblas.so
-- Using BLAS DLLs: 
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.2", minimum required is "3.4") 
-- Found PythonLibs: /usr/lib/libpython3.6m.so (found suitable version "3.6.2", minimum required is "3.4") 
-- Configuring done
-- Generating done
-- Build files have been written to: /root/raspberry-pi/ai/ell-related/compiled_darknetReference_pi3/build
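
Note the lines starting with "-- Blas" and "-- Using BLAS". The BLAS dependency appears to be found and the build completes successfully, yet the resulting _darknetReference.so does not work.
Oddly, I checked the installed BLAS shared library (.so) and it does contain cblas_sgemm, so I have not figured out why the compiled Python module fails: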

nm -D /usr/lib/libblas.so | grep cblas_sgemm
(the output is not empty, for example "0005f230 T cblas_sgemm")
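
The same check can be scripted from Python, which is also where the import fails. The sketch below is only an illustration: exports_symbol is a hypothetical helper (not part of ELL), and the two paths are the ones that appear in this post. It asks the dynamic loader to resolve the symbol, so it also sees symbols provided by a library's own dependencies, which nm -D on a single file does not.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True if `symbol` can be resolved through the library at `lib_path`.

    Hypothetical helper for illustration. Passing None as `lib_path`
    inspects the current process instead of a file on disk.
    """
    try:
        lib = ctypes.CDLL(lib_path)  # dlopen() under the hood
    except OSError:
        return False  # library is missing or cannot be loaded
    # ctypes looks attributes up with dlsym(); a missing symbol raises
    # AttributeError, which hasattr() turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Paths taken from this post; adjust them for your own system.
    for path in ("/usr/lib/libblas.so", "/usr/lib/openblas/lib/libopenblas.so"):
        print(path, exports_symbol(path, "cblas_sgemm"))
```

A False result for the library that cmake reports as "Using BLAS library" would be consistent with the ImportError above.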

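I also tried installing only blas, without cblas. In that case the following two lines do not appear in the cmake output:
-- Using BLAS include path: /usr/include
-- Using BLAS library: /usr/lib/libblas.so
Instead you get:
-- Blas include directories: 
-- BLAS library not found
This means cmake did not find the BLAS dependency at all, so, unsurprisingly, the resulting _darknetReference.so does not work either.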
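
When comparing several cmake runs, it can help to pull out just the BLAS status lines. The following is a minimal sketch, assuming the cmake output has been saved to a string. blas_report is a hypothetical helper, and the line formats it matches are the ones shown in this post.

```python
import re

# Matches status lines such as "-- Blas libraries: ..." or
# "-- Using BLAS library: ..." as printed in this post.
_STATUS = re.compile(r"--\s*(Using BLAS [^:]+|Blas [^:]+):\s*(.*)$")


def blas_report(cmake_output):
    """Collect the BLAS-related status lines from cmake output into a dict."""
    report = {}
    for line in cmake_output.splitlines():
        match = _STATUS.match(line.strip())
        if match:
            report[match.group(1)] = match.group(2).strip()
        elif "BLAS library not found" in line:
            # cmake printed the failure line instead of a library path.
            report["Using BLAS library"] = None
    return report


if __name__ == "__main__":
    sample = (
        "-- Found Threads: TRUE\n"
        "-- Blas libraries: /usr/lib/libblas.so\n"
        "-- Using BLAS library: /usr/lib/libblas.so\n"
    )
    print(blas_report(sample))
```

A value of None under "Using BLAS library" corresponds to the "BLAS library not found" case described above.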

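After these failures I was puzzled for a while, but some digging led to a working solution: build and install OpenBLAS myself, and slightly adjust the build of the ELL Python module so that the build script finds the right BLAS library.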
Let's get straight to work:

  • Download the OpenBLAS source and build it
git clone https://github.com/xianyi/OpenBLAS
cd OpenBLAS
make

The last part of the output looks like this:

......
make[1]: Leaving directory '/root/resource/OpenBLAS/exports'
 
 OpenBLAS build complete. (BLAS CBLAS) 
 
  OS               ... Linux                                                  
  Architecture     ... arm                                                    
  BINARY           ... 32bit                                                  
  C compiler       ... GCC  (command line : gcc)                              
  Library Name     ... libopenblas_armv7p-r0.3.0.dev.a (Multi threaded; Max num-threads is 4)
 
To install the library, you can run "make PREFIX=/path/to/your/installation install".

The last line tells us that make PREFIX=<path> install will install the compiled OpenBLAS into the given path.

  • Install OpenBLAS into a custom directory

I did not want to clutter the system directories, so I specified an install directory:

[root@alarmpi OpenBLAS]# make PREFIX=/usr/lib/openblas install
make -j 4 -f Makefile.install install          
make[1]: Entering directory '/root/resource/OpenBLAS'                                          
Generating openblas_config.h in /usr/lib/openblas/include                                      
Generating f77blas.h in /usr/lib/openblas/include                                              
Generating cblas.h in /usr/lib/openblas/include                                                
Copying the static library to /usr/lib/openblas/lib                                            
Copying the shared library to /usr/lib/openblas/lib                                            
Generating openblas.pc in /usr/lib/openblas/lib/pkgconfig                                      
Generating OpenBLASConfig.cmake in /usr/lib/openblas/lib/cmake/openblas
Generating OpenBLASConfigVersion.cmake in /usr/lib/openblas/lib/cmake/openblas
Install OK!                                    
make[1]: Leaving directory '/root/resource/OpenBLAS'
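
Before moving on, it may be worth confirming that the install step produced the files the later steps rely on. This is a rough sketch: missing_files is a hypothetical helper, and the file list is an assumption based on the install log above and on the library path cmake reports later in this post.

```python
from pathlib import Path

# Assumed from the install log in this post (relative to PREFIX).
EXPECTED_FILES = (
    "include/cblas.h",
    "include/openblas_config.h",
    "lib/libopenblas.so",
)


def missing_files(prefix, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under `prefix`."""
    root = Path(prefix)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # /usr/lib/openblas is the PREFIX used in this post.
    print(missing_files("/usr/lib/openblas"))
```

An empty list means all three files are in place.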

  • Preparation before rebuilding _darknetReference.so
(1) Uninstall the previously installed blas and cblas

pacman -R blas cblas

(2) Add the OpenBLAS lib path to LD_LIBRARY_PATH

[root@alarmpi build]# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openblas/lib

(3) Edit the file compiled_darknetReference_pi3/OpenBLASSetup.cmake
This file defines how the OpenBLAS header files and the .so file are located, so I added the path /usr/lib/openblas/include/ to the BLAS include search paths:

set(BLAS_INCLUDE_SEARCH_PATHS
    /System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Versions/Current/Headers/
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Headers/
    /usr/include
    /usr/local/include
    /usr/lib/openblas/include
)
The last entry, /usr/lib/openblas/include, is the one I added.
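
For step (2), keep in mind that export only affects the current shell session, so the variable has to be set again in every new shell that builds or runs the demo. A small sketch for checking this from Python follows. in_library_path is a hypothetical helper, and the directory is the one used in this post.

```python
import os


def in_library_path(directory, env=None):
    """Return True if `directory` is listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    # Entries are colon-separated; skip empty ones left by a leading
    # or trailing colon.
    entries = [e for e in env.get("LD_LIBRARY_PATH", "").split(":") if e]
    wanted = os.path.normpath(directory)
    return any(os.path.normpath(e) == wanted for e in entries)


if __name__ == "__main__":
    print(in_library_path("/usr/lib/openblas/lib"))
```

If this prints False in the shell where python compiledDarknetDemo.py is run, the loader will not search the custom OpenBLAS directory via LD_LIBRARY_PATH.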

  • Rebuild _darknetReference.so

Most of the cmake output is the same as before. The lines that differ are:

-- Blas libraries: /usr/lib/openblas/lib/libopenblas.so
-- Blas linker flags:
-- Blas include directories:
-- Using BLAS include path: /usr/lib/openblas/include
-- Using BLAS library: /usr/lib/openblas/lib/libopenblas.so
Every BLAS path found now points to my OpenBLAS installation, so the changes above did take effect.


Finally, I ran python compiledDarknetDemo.py in the miniconda environment, and the compiled Python module now works. Problem solved!
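
As an extra sanity check that is independent of ELL, cblas_sgemm can be called directly through ctypes. This is a minimal sketch under stated assumptions: sgemm_2x2 is a hypothetical helper, the library path is the one installed in this post, and the argument layout follows the standard CBLAS prototype of cblas_sgemm.

```python
import ctypes

CBLAS_ROW_MAJOR = 101  # CblasRowMajor in cblas.h
CBLAS_NO_TRANS = 111   # CblasNoTrans in cblas.h


def sgemm_2x2(lib_path):
    """Compute [[1, 2], [3, 4]] x [[5, 6], [7, 8]] with cblas_sgemm.

    Returns the result in row-major order, or None when the library
    cannot be loaded or does not provide cblas_sgemm.
    """
    try:
        sgemm = ctypes.CDLL(lib_path).cblas_sgemm
    except (OSError, AttributeError):
        return None
    fptr = ctypes.POINTER(ctypes.c_float)
    sgemm.restype = None
    # layout, transA, transB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc
    sgemm.argtypes = [ctypes.c_int] * 6 + [
        ctypes.c_float, fptr, ctypes.c_int, fptr, ctypes.c_int,
        ctypes.c_float, fptr, ctypes.c_int,
    ]
    matrix = ctypes.c_float * 4
    a = matrix(1, 2, 3, 4)
    b = matrix(5, 6, 7, 8)
    c = matrix()  # zero-initialised output buffer
    sgemm(CBLAS_ROW_MAJOR, CBLAS_NO_TRANS, CBLAS_NO_TRANS,
          2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2)
    return list(c)


if __name__ == "__main__":
    print(sgemm_2x2("/usr/lib/openblas/lib/libopenblas.so"))
```

With a working library the product should come out as 19, 22, 43, 50. A result of None points back to the same missing-symbol problem described at the top of this post.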
Copyright notice
Reproduction must credit the source: codelast.com