Efficient parallel all-electron 4-component Dirac-Kohn-Sham program using a distributed matrix approach. II