Creating a new Russian CAE system

May 9, 2006

Time spent on the solution (matrix assembly, solving the linear system (SLAE), computing the results (stresses, strains)) = 38 minutes 53 seconds (with all default settings). You can "order" solver statistics to see how long the linear system solve takes on its own. If anyone wants to compare, here is a script that can be used for the best solution time (i.e. with this finite element, solver, ANSYS version, and hardware you are unlikely to go any faster; note that this is not the script I used for the run above).

!****************************

fini

/cle

/config,noeldb,1 ! no results into database

/nerr,0 ! no bullshit in the output

/prep7

shpp,off ! Deactivates element shape checking during meshing

etcontrol,off,off ! no bullshit in the output

et,1,185 ! NB: 2x2x2 integration

ex,1,1

nuxy,0.3 ! to make analyser happy

block,,100,,100,,100 ! volume 100x100x100

da,2,all ! constrain one surface

lesize, 5,,, 40 ! number of elements per side

lesize, 6,,, 40

lesize,10,,, 40

mshape,0,3D ! use 3d bricks

mshkey,1 ! mapped meshing

vmesh,1

fk,1,fy,-200 ! load two nodes

fk,3,fy,-200

sbctran ! transfer solid model loads onto the mesh

modmsh,detach

vdele,1,,,1 ! delete solid model

/solu

solcontrol,off,,nopl

bcsoption,,maxi,,-1,,performance ! use optimal solver settings + order performance stats

outres,,none ! no results to write

eqslv,sparse

solve,,,,,nocheck ! no element shape checking before solution

/exit,nosave

!****************************

Don't forget to reduce the -db space before launching, so that more memory goes to the solver. You can keep reducing it until a non-empty .page file appears before the solution starts.

May 9, 2006

I have fully commented the input I used. I think your friends from Belarus who have ANSYS will confirm that exactly the right problem was solved. What makes you think the linear system was not reduced to triangular form? Where does it say that? ANSYS 9.0 solves this same problem on my machine without any trouble.
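As a quick sanity check on the problem size, the element and equation counts can be reproduced by hand. This is only a sketch: it assumes the mapped mesh has 40 elements along every edge and that `da,2,all` fixes all three translational DOFs on every node of one face. The helper name `mapped_brick_counts` is made up for this illustration.

```python
def mapped_brick_counts(elements_per_side: int, dofs_per_node: int = 3) -> dict:
    """Count elements, nodes and free DOFs for a cube meshed with 8-node bricks."""
    nodes_per_side = elements_per_side + 1
    elements = elements_per_side ** 3
    nodes = nodes_per_side ** 3
    # Assumption: one whole face is fully constrained (all DOFs of its nodes).
    fixed_nodes = nodes_per_side ** 2
    equations = (nodes - fixed_nodes) * dofs_per_node
    return {"elements": elements, "nodes": nodes, "equations": equations}


counts = mapped_brick_counts(40)
print(counts)
```

Under these assumptions the sketch gives 64000 elements and 201720 equations, the same figures as the SOLID185 count and the "Number of equations" line in the solver statistics posted further down the thread.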

May 9, 2006

I ran the input on a 2003-vintage P4 2.8 GHz with 1 GB of RAM under Windows XP (SP2).

Here are the most detailed statistics:

!*********************************************************

S O L U T I O N O P T I O N S

PROBLEM DIMENSIONALITY. . . . . . . . . . . . .3-D

DEGREES OF FREEDOM. . . . . . UX UY UZ

ANALYSIS TYPE . . . . . . . . . . . . . . . . .STATIC (STEADY-STATE)

EQUATION SOLVER OPTION. . . . . . . . . . . . .SPARSE

GLOBALLY ASSEMBLED MATRIX . . . . . . . . . . .SYMMETRIC

*** NOTE *** CP = 1.484 TIME= 19:10:09

Present time 0 is less than or equal to the previous time.

Time will default to 1.

*** NOTE *** CP = 1.531 TIME= 19:10:09

The conditions for direct assembly have been met. No .emat or .erot

files will be produced.

L O A D S T E P O P T I O N S

LOAD STEP NUMBER. . . . . . . . . . . . . . . . 1

TIME AT END OF THE LOAD STEP. . . . . . . . . . 1.0000

NUMBER OF SUBSTEPS. . . . . . . . . . . . . . . 1

STEP CHANGE BOUNDARY CONDITIONS . . . . . . . . NO

PRESSURE LOAD STIFFNESS . . . . . . . . . . . .NEVER USED

PRINT OUTPUT CONTROLS . . . . . . . . . . . . .NO PRINTOUT

DATABASE OUTPUT CONTROLS

ITEM FREQUENCY COMPONENT

ALL NONE

NUMBER OF PROCESSORS USED = 1

ELEMENT FORMULATION CP TIME = 11.766

ELEMENT FORMULATION ELAPSED TIME = 13.020

Range of element maximum matrix coefficients in global coordinates

Maximum= 0.515046296 at element 62270.

Minimum= 0.515046296 at element 1.

*** ELEMENT MATRIX FORMULATION TIMES

TYPE NUMBER ENAME TOTAL CP AVE CP

1 64000 SOLID185 11.719 0.000183

Time at end of element matrix formulation CP= 15.1875.

MultiSolution: Sparse Assembly Option .... Call No. 1

ANSYS largest memory block available 268346288 : 255.91 Mbytes

ANSYS memory in use 142072000 : 135.49 Mbytes

ANSYS max memory used 142899040 : 136.28 Mbytes

End of PcgEnd

ANSYS largest memory block available 385827456 : 367.95 Mbytes

ANSYS memory in use 36468416 : 34.78 Mbytes

ANSYS max memory in use 148688544 : 141.80 Mbytes

Total Time (sec) for Sparse Assembly 1.02 cpu 4.32 wall

SPARSE MATRIX DIRECT SOLVER.

Number of equations = 201720, Maximum wavefront = 41

ncoefL/neqn = 25.1 call number 1

cs_num_sectors= 0 asym= 1

BCS_REORD - Reordering Stats:

METHOD reord fact+solv front mem sym. fact in-core fct OOC fact

(sec) (Millon ops) (D.P.wrds) (D.P. wrds) (D.P. wrds) D.P. words

------ ------- ----------- ---------- ------------ ------------ ------------

METIS 8.29 780784.6 28151256 3076356 318461875 19044241

Using Metis Reordering

Reordering - Adj matrix compression stats

orig neq cmp neq ratio orig nz cmp nz ratio

-------- -------- ----- ------------- ------------- -----

201720 67283 3.0 7673272 830622 9.2

Total Reordering Time (cpu,wall) = 7.906 8.296

cs_num_sectors= 0 asym= 1

METIS 8.34 780784.6 28151256 3076356 313821241 14403607

Using Metis Reordering

Reordering - Adj matrix compression stats

orig neq cmp neq ratio orig nz cmp nz ratio

-------- -------- ----- ------------- ------------- -----

201720 67283 3.0 7673272 830622 9.2

Total Reordering Time (cpu,wall) = 8.000 8.340

Memory available for solver = 364.87 MB

Memory required for in-core = 2307.86 MB

Optimal memory required for out-of-core = 238.26 MB

Minimum memory required for out-of-core = 23.48 MB

*** NOTE *** CP = 37.844 TIME= 19:11:10

Memory available for Sparse Matrix solver= 365 MB is less than that

required for in-core solution= 2308 MB. Proceeding with part in-core

and part out-of-core solution (uses 365 MB memory and achieves similar

CPU performance as in-core).

UP formulation Summary before factorization:

UPformInput= 1

localUPform= 0

UP reordering not used: UPformUsed= 0

No matrix scaling: BCSscl= 0

No pivoting: pvttol= 0.000000000000000E+000 BCStol= 0.000000000000000E+000

===========================

= multifrontal statistics =

===========================

number of equations = 201720

no. of nonzeroes in lower triangle of a = 4851705

number of compressed nodes = 67283

no. of compressed nonzeroes in l. tri. = 830622

amount of workspace currently in use = 1627652

max. amt. of workspace used = 47823988

no. of nonzeroes in the factor l = 246744750.

number of super nodes = 5298

number of compressed subscripts = 1607997

size of stack storage = 52975368

maximum order of a front matrix = 7503

maximum size of a front matrix = 28151256

maximum size of a front trapezoid = 478112

no. of floating point ops for factor = 7.8009D+11

no. of floating point ops for solve = 9.8799D+08

actual no. of nonzeroes in the factor l = 246744750.

actual number of compressed subscripts = 1607997

actual size of stack storage used = 45703167

negative pivot monitoring activated

number of negative pivots encountered = 0.

factorization panel size = 64

factorization update panel size = 32

solution block size = 2

time (cpu & wall) for structure input = 2.234375 2.318437

time (cpu & wall) for ordering = 8.000000 8.337776

time (cpu & wall) for symbolic factor = 0.125000 0.256515

time (cpu & wall) for value input = 1.765625 7.850886

time (cpu & wall) for numeric factor = 363.890625 436.386419

computational rate (mflops) for factor = 2143.735559 1787.602083

condition number estimate = 0.0000D+00

time (cpu & wall) for numeric solve = 14.703125 266.013850

computational rate (mflops) for solve = 67.195756 3.714046

i/o stats: unit file length amount transferred

words mbytes words mbytes

---- ----- ------ ----- ------

20 8209041. 63. Mb 17350931. 132. Mb

25 1607998. 12. Mb 4019995. 31. Mb

9 246744750. 1883. Mb 740234250. 5648. Mb

11 42875391. 327. Mb 85750782. 654. Mb

------- ---------- -------- ---------- --------

Totals: 299437180. 2285. Mb 847355958. 6465. Mb

Sparse Solver Call 1 Memory ( Mb) = 364.9

Sparse Matrix Solver CPU Time (sec) = 401.078

Sparse Matrix Solver ELAPSED Time (sec) = 736.227

*** NOTE *** CP = 418.266 TIME= 19:22:42

The initial memory allocation (-m) has been exceeded.

Supplemental memory allocations are being used.

NUMBER OF PROCESSORS USED = 1

ELEMENT RESULTS CP TIME = 0.328

ELEMENT RESULTS ELAPSED TIME = 0.953

*** ELEMENT RESULT CALCULATION TIMES

TYPE NUMBER ENAME TOTAL CP AVE CP

1 64000 SOLID185 0.000 0.000000

*** NODAL LOAD CALCULATION TIMES

TYPE NUMBER ENAME TOTAL CP AVE CP

1 64000 SOLID185 0.000 0.000000

*** LOAD STEP 1 SUBSTEP 1 COMPLETED. CUM ITER = 1

*** TIME = 1.00000 TIME INC = 1.00000 NEW TRIANG MATRIX

*** ANSYS BINARY FILE STATISTICS

BUFFER SIZE USED= 16384

66.375 MB WRITTEN ON ASSEMBLED MATRIX FILE: file.full

10.625 MB WRITTEN ON RESULTS FILE: file.rst

FINISH SOLUTION PROCESSING

!*********************************************************

Key takeaways:

Sparse Matrix Solver ELAPSED Time (sec) = 736.227

This is the net time for solving the linear system. It is the 32-bit Windows version, but everything is in double precision, of course. Forgive me, ISPA, for it being so "painfully fast".
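The rates printed in the statistics can be cross-checked from the operation counts and timings in the same log. The sketch below only redoes that arithmetic; the variable names are illustrative, and the flop counts are the rounded values from the log, so the last digits differ slightly from the printed rates.

```python
# Values copied from the multifrontal statistics in the log.
factor_flops = 7.8009e11
solve_flops = 9.8799e8
factor_cpu_s, factor_wall_s = 363.890625, 436.386419
solve_cpu_s, solve_wall_s = 14.703125, 266.013850
solver_elapsed_s = 736.227


def mflops(flops: float, seconds: float) -> float:
    """Millions of floating point operations per second."""
    return flops / seconds / 1e6


factor_rate_cpu = mflops(factor_flops, factor_cpu_s)
factor_rate_wall = mflops(factor_flops, factor_wall_s)
solve_rate_cpu = mflops(solve_flops, solve_cpu_s)
solve_rate_wall = mflops(solve_flops, solve_wall_s)
# Share of the solver's elapsed time spent in numeric factor + numeric solve.
numeric_share = (factor_wall_s + solve_wall_s) / solver_elapsed_s

print(f"factor: {factor_rate_cpu:.1f} (cpu) {factor_rate_wall:.1f} (wall) Mflops")
print(f"solve:  {solve_rate_cpu:.1f} (cpu) {solve_rate_wall:.2f} (wall) Mflops")
print(f"numeric factor + solve share of elapsed time: {numeric_share:.1%}")
```

The numeric solve runs at roughly 67 Mflops by CPU time but under 4 Mflops by wall time, which suggests that this phase was dominated by disk I/O rather than arithmetic on this machine.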

Note the following line:

*** TIME = 1.00000 TIME INC = 1.00000 NEW TRIANG MATRIX

I.e. the factorization was performed.

Here is the information about the stiffness matrix:

i/o stats: unit file length amount transferred

words mbytes words mbytes

---- ----- ------ ----- ------

9 246744750. 1883. Mb 740234250. 5648. Mb

Here is the real size: 740234250 words. In this solver the stiffness matrix is handled without zeros (using the Harwell-Boeing format). Applied to matrices, the wavefront is the bandwidth (after renumbering; see the reordering above), but in this solver that definition is unclear, and in any case only the nonzero entries are stored. I will try to find out what it means in this solver.
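For reference, the word counts in that i/o line convert to the printed megabytes if one assumes 8-byte double precision words and 1 Mb = 1024*1024 bytes. Reading the two column pairs by the log's own header ("file length" and "amount transferred"), the sketch below also compares the two figures; the function name is illustrative.

```python
BYTES_PER_WORD = 8  # assumption: double precision words


def words_to_mb(words: int) -> float:
    """Convert a count of 8-byte words to megabytes (1 Mb = 1024*1024 bytes)."""
    return words * BYTES_PER_WORD / (1024 * 1024)


file_length_words = 246_744_750   # unit 9, "file length" column
transferred_words = 740_234_250   # unit 9, "amount transferred" column

file_length_mb = words_to_mb(file_length_words)
transferred_mb = words_to_mb(transferred_words)
ratio = transferred_words / file_length_words

print(f"file length:        {file_length_mb:.0f} Mb")
print(f"amount transferred: {transferred_mb:.0f} Mb")
print(f"transferred / file length = {ratio:.1f}")
```

The file length equals the "no. of nonzeroes in the factor l" figure in the same log, and the transferred amount is exactly three times that.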

May 9, 2006

Here is some excellent information about the sparse solver straight from ANSYS:

http://ansys.net/ansys/?mycat=tnt_poole1

Some excerpts:

"The sparse matrix is a direct solver. It directly solves for (x), for example, in the static equation [K](x)=(F), similar to the frontal solver. The frontal solver actually triangularizes [K] and then back-substitutes for (x). This is time-consuming and is also a hard drive hog (since the full [K] is factorized). Sparse solvers, on the other hand, take advantage of the fact that [K] is sparse and banded (usually non-zero terms near diagonal) to reduce memory requirements...

...More improvements are in the works with the sparse solver. Eventually the wavefront will not matter at all. Right now the wavefront message appears because we use frontal assembly to feed the sparse solver. We already have alternative assembly paths in ANSYS but we are not ready to turn them on for all types of analyses and boundary conditions..."

I.e. the wavefront in the messages is something internal.

May 9, 2006

There are many ANSYS experts on this forum; I would like to see whether anyone can catch me in a lie... You could at least have waited before taking it upon yourself to expose me...

I have provided exhaustive information and all the input data, so let's wait. And as I already said, the wavefront in the sparse solver statistics is a purely internal quantity, because this solver does not store the matrix in bands.

As proof, in addition to the link above, see the example in the chapter

Structural Guide | Chapter 10. Gasket Joints Simulation | 10.6. Solution Procedure and Result Output

SPARSE MATRIX DIRECT SOLVER.

Number of equations = 24, Maximum wavefront = 0

Having known you for quite a while, I can say that shame and conscience are not your strong suits, so appealing to them is pointless. I do hope, however, that the comrade who fed you this disinformation will blush. As for your tarnished reputation, nothing will help it now.

I will only say that, evidently in order to flatter you, the comrade solved the problem with a different solver, the "frontal solver", which sank into oblivion more than 10 years ago but still works in ANSYS because some methods need it, for example modal analysis that has to account for damping (MODOPT,DAMP), for analyzing, say, flutter (two or more coinciding natural frequencies with a positive imaginary part and a real part above zero). It is indeed another direct solver in ANSYS, one that stores matrices in bands, and it does not work with matrices that do not fit in 8 GB on a PC.

See

Basic Guide | Chapter 3. Solution | 3.1. Selecting a Solver

May 10, 2006

ISPA, why not simply admit defeat (it is already obvious to everyone)??? As it is, you are the one who gets personal first, claiming that the wrong problem was solved, the wrong method was used, the wrong user, and so on... That way you are, first and foremost, painting your own product in a bad light.

As far as I know Artem, he is a very honest and decent person, and by claiming that he cheated somehow in this calculation you insult not only him but all of us!!!

I also suggest ending the speculation about ANSYS and the claim that there is a lot it cannot solve, or do you have more examples???

May 10, 2006

I think ISPA is wrong, because Artem did solve the problem. That is a fact!

And it would not matter whether this problem took 12 minutes or 12 hours to run.

I don't have ANSYS, so I cannot repeat it. But I have no reason to disbelieve Artem. With his 244 posts on the forum he has given far more useful information than others with more than 700. That is also a fact, and easy to check by reading the old posts.

May 10, 2006

I do have more examples that ANSYS under WIN32 cannot solve. But we will not discuss them now.

This is turning into a kindergarten....

Something tells me that ISPA's reluctance to share the examples ANSYS cannot solve comes from a fear that they will be solved here and the black PR will fail!!!!

May 10, 2006

I ran Artem's test (the script in post No. 644).

Sparse solver: curEqn= 201720 totEqn= 201720 Job CP sec= 427.515

Factor done= 100% Factor Wall sec= 432.4 rate= 1804.2 Mflops;

The problem is solved (credit where due to the script; others can find a lot in it that is useful in practice :). Elegantly written... lightening the model DB, using certain commands to reduce disk access (so as not to write everything out), and so on.) Software used: ANSYS v10.

***

May 10, 2006

ISPA! The technology for solving linear systems by direct methods developed actively in the 1990s; it did not freeze, as yours did, at the moment the ISKRA PC appeared. In the Compendex & Inspec database I found 1453 English-language articles in international journals devoted to exactly this sparse solver. Pay for a subscription and read.

In the meantime, I strongly advise arming yourself with an English-Russian dictionary and absorbing the following information:

http://ansys.net/ansys/?mycat=tnt_poole1

http://ansys.net/ansys/?mycat=tnt_poole2

http://www.ansys.com//assets/tech-papers/n...solver-tech.pdf

http://www.tynecomp.co.uk/Xansys/solver_2002.pdf

All the sources say the sparse solver is a DIRECT solver! In the presentations you can even see how it is all done. If you find a hint that the method is "indirect", shout! Your stubbornness is simply astonishing!

For those who want to try an analogue of the solver that exists in ISPA, replace in the script above

eqslv,sparse with eqslv,frontal

By the way, to my surprise eqslv,frontal solved the cube problem overnight without any trouble. Apparently the limitation of the frontal solver was removed in version 10 with the compiler switch from Compaq Visual Fortran to Intel Fortran.

For those interested: the sparse solver in ANSYS is not "native"; it was bought from Boeing and then ported and debugged for ANSYS. In fact ANSYS has no native solvers; all of them were bought from someone at some point.

I have tested the available sparse solvers; the fastest is WSMP: Watson Sparse Matrix Package (Version 6.04.25)

http://www-users.cs.umn.edu/~agupta/wsmp.html

You can try it free for a few weeks and then buy it. For those who want something completely free there is a quite decent solver

http://www.enseeiht.fr/irit/apo/MUMPS/

It is only slightly slower than the sparse solver in ANSYS, but it speeds up well in MPI mode and is available as full source code. The only problem is that for now it works only in-core. There is a MATLAB interface...

May 10, 2006

First link

http://66.249.93.104/search?sourceid=navcl...at%3Dtnt_poole1

!**************************************************************************

Sparse Solver (Memory Requirements, Performance)

Q: Can someone direct me to more details on the sparse method? I didn't see anymore than a few sentences on it in the ansys 5.6 doc'n.

A: The sparse matrix is a direct solver. It directly solves for (x), for example, in the static equation [K](x)=(F), similar to the frontal solver. The frontal solver actually triangularizes [K] and then back-substitutes for (x). This is time-consuming and is also a hard drive hog (since the full [K] is factorized). Sparse solvers, on the other hand, take advantage of the fact that [K] is sparse and banded (usually non-zero terms near diagonal) to reduce memory requirements.

I've only read two papers on sparse solvers, so I'm not an expert in this area (in fact, I usually have little idea what I write about, but I think I just like hearing myself type). However, as a layman's simplified explanation on this, it's basically trying to store only non-zero terms. The two papers I read do things differently, so I don't know if there's a common algorithm. As one explanation, let's view a [K] matrix of order nxn:

For direct gaussian elimination, we basically need to store an nxn matrix (or, actually, upper triangle for symmetric matrices)

For sparse solver, we store non-zero terms only (say "m"). Oftentimes, because these non-zero terms could be anywhere, the "row" and "column" numbers/locations need to be stored, too. So one ends up with a mx3 matrix, which is much smaller than an nxn matrix.

This is an over-simplified explanation, but I hope you get the basic idea. As to how it actually solves this, it is too much for me to explain, and even I don't like typing *that* much. I'd refer you to some papers, but I don't think they're published (i.e., public), so I'm sorry about that...
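The "m x 3" idea above can be illustrated with a few lines of Python. This is a minimal sketch of triplet (row, column, value) storage only, not the storage scheme of any particular ANSYS solver; the test matrix (a tridiagonal stencil) and all function names are chosen here for illustration.

```python
def tridiagonal(n: int) -> list:
    """Dense n x n matrix with 2 on the diagonal and -1 next to it."""
    dense = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dense[i][i] = 2.0
        if i > 0:
            dense[i][i - 1] = -1.0
        if i < n - 1:
            dense[i][i + 1] = -1.0
    return dense


def to_triplets(dense: list) -> list:
    """Keep only the non-zero terms, each stored as (row, column, value)."""
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != 0.0]


def triplet_matvec(triplets: list, x: list) -> list:
    """Compute y = K x touching only the stored non-zero terms."""
    y = [0.0] * len(x)
    for i, j, v in triplets:
        y[i] += v * x[j]
    return y


n = 100
triplets = to_triplets(tridiagonal(n))
y = triplet_matvec(triplets, [1.0] * n)
print(f"dense storage: {n * n} numbers, triplet storage: {3 * len(triplets)} numbers")
```

For very small matrices the three numbers per entry can cost more than the dense array; the saving appears once the matrix is large and mostly zero, as it is here.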

Sorry if you already know this, but iterative solvers (usually conjugate gradient solvers) solve, as an example, the equation [K](x)=(F) by guessing a solution of (x) and updating it (using a preconditioning matrix -- this, too, is more than I'd like to get into right now). That's why a PCG solver goes through maybe a hundred or more iterations. It does not explicitly solve for (x), but since the convergence is usually tight (1e-6 or 1e-8), the answer is basically the same as you would get from a direct solver.

Posted by Sheldon Imaoka (CSI) on 05.18.2000
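The "guess and update" loop of an iterative solver described in the post above can be sketched as a small preconditioned conjugate gradient routine. This is a textbook version with a simple diagonal (Jacobi) preconditioner, chosen here only for brevity; the posts do not say which preconditioner the ANSYS PCG solver uses, and the test problem and names are made up for the illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def laplacian_matvec(x):
    """y = K x for K = tridiag(-1, 2, -1), without ever storing K."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def pcg(matvec, b, diag, tol=1e-8, max_iter=1000):
    """Solve K x = b iteratively; diag holds the diagonal of K (Jacobi preconditioner)."""
    x = [0.0] * len(b)            # initial guess
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = list(b)                   # residual b - K x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Kp = matvec(p)
        alpha = rz / dot(p, Kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:   # relative residual test
            return x, iteration
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 20
b = laplacian_matvec([1.0] * n)   # right-hand side whose exact solution is all ones
x, iterations = pcg(laplacian_matvec, b, [2.0] * n)
print(iterations, max(abs(xi - 1.0) for xi in x))
```

Note that the matrix is never factorized: the routine only needs products of the matrix with a vector, which is why an iterative solver has no large factor file.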

Sparse solver

You are correct in your observation that memory usage can grow in nonlinear analyses using the sparse solver (or frontal for that matter). Contact elements are one cause of the problem. The good news here is that in version 5.7 we are going to be using supplemental memory allocations in the sparse solver loop. This should mean that ANSYS will stop less often with out of memory errors. However, automatically growing memory can actually use a lot more memory than you need. For example, say there is 250 Mbytes available in the ANSYS memory when the sparse solver is called but 300 Mbytes are required. A supplemental memory allocation of 300 Mbytes would happen automatically but the 250 Mbytes remains unused. The memory has to be in one contiguous block. You can imagine if each call to the sparse solver increases the memory by 50 Mbytes what could happen. Eventually you will run out of space even though after the initial extra 300 Mbytes there are 550 total Mbytes available. The solver can only use contiguous blocks. This is a design issue with the sparse solver and we are working with the folks from Boeing to break up the memory requirements. But this change is a major one for the sparse solver package from Boeing so it won't happen overnight.

The best strategy is probably to allow extra memory at the start of a nonlinear run like this. You will find that in 5.7 the sparse solver will be able to run well in a LOT less memory space than in 5.6, at least in some cases. The 300k DOF size job should benefit from the memory changes in 5.7. We are seeing a 20 percent reduction in CPU/WALL times on SGI systems from I/O improvements and we have reduced the file size requirements by nearly half. You will see this in a running job by the fact the file.LN22 will no longer contain a copy of the large LN09 file.

Hope this helps some. More improvements are in the works with the sparse solver. Eventually the wavefront will not matter at all. Right now the wavefront message appears because we use frontal assembly to feed the sparse solver. We already have alternative assembly paths in ANSYS but we are not ready to turn them on for all types of analyses and boundary conditions.

Posted by Gene Poole (ANSYS, Inc.) on 06.14.2000

PCG Solver

There are differences if you are using the PCG solver. First, it is not a factorization based direct solver. So, even though the matrices change just like in the frontal or sparse solver cases, there should not be the potentially large growth in the matrices that you may see with frontal or direct. The PCG solver assembles the matrices using sparse matrix technology so the wavefront should not enter into the problem at all. Also, all of that which I said about the solver needing large contiguous blocks of memory does not apply to the PCG solver. It may grow memory as in the frontal and sparse solvers but it does so more incrementally so you should not get potentially large unused blocks of memory space.

Having said all of that, if you were running out of PCG solver space, or close to the limit, then you could still fail for memory space. When the PCG solver fails then you really are out of memory. This is because the PCG solver will do supplemental memory allocations until the system memory is unavailable. The only thing you can do at this point is to reduce your database space and rerun. This will cause the file.page to be used but that is not necessarily a huge penalty. The other thing which you can do is to make sure you have plenty of physical swap space set aside. I'd suggest 2X your main memory size on PCs. That way the PCG solver can still run if it exceeds your physical memory size. (If you exceed physical memory size for the PCG solver be prepared for about a 10X hit in Wall time. It is ok to exceed physical memory for all of ANSYS but if the PCG solver also exceeds physical memory each PCG iteration goes through disk I/O to the virtual memory space and it will take awhile) The file.PCG provides a good estimate of the amount of memory required for the PCG solver. It is only written the first time the PCG solver is called so it will not reflect the growth in the problem which you were asking about. Still, it is a worthwhile estimate of the PCG solver space.

Posted by Gene Poole (ANSYS, Inc.) on 06.14.2000

Choosing a Solver

My main area of expertise is solver performance so I am very interested in your experience. Perhaps I can also help you some.

The correct solver choice depends on several factors and I did see someone post some advice that was quite accurate. The PCG solver will minimize disk space at the expense of memory. PCG is an iterative solver which means there is no matrix factorization as in the frontal or sparse solver options. The file.tri or file.LN09 files are the big files in the frontal and sparse solver runs. In addition there is a temporary scratch file written by the sparse solver that is essentially a copy of the sparse solver workspace plus all of the files used by the sparse solver. This file is file.LN22. It goes away after the program stops so it can be confusing to get out of disk error messages and yet there seems to be a lot of space available. In 5.7 the LN22 file will not be used at all except in some nonlinear runs and we will no longer be saving copies of the huge files in any case. That "feature" was actually put in there external to ANSYS and we have removed it. So the disk space requirements for the sparse solver in ANSYS 5.7 will be less than half of the current requirements.

Now, as to memory usage. The frontal is the least memory but the slowest algorithm and biggest external file - however with the current situation in 5.6 where the sparse solver files get stored essentially twice it may actually take less disk in some cases. The frontal solver also has very good parallel performance. But again, for the size problem you describe it could run for a LONG time and take a huge disk file - several Gbytes would not surprise me. It depends on the maximum wavefront size. If you want an estimate of file size for the frontal solver look in the output file where it tells you the R.M.S. Wavefront size. The file size is pretty close to RMS WF size * num of DOFs * 8 / (1024*1024) Mbytes. The 8 is the number of bytes per double precision word. So if your problem has 300,000 DOFs and the R.M.S wavefront is 5000 that is 1.5 Billion D.P. words, or around 11 Gbytes.
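The file size estimate in the paragraph above is easy to reproduce. The sketch below just evaluates that rule of thumb for the quoted example; the function name is illustrative.

```python
def frontal_file_size_mb(rms_wavefront: float, n_dofs: int, bytes_per_word: int = 8) -> float:
    """Rule-of-thumb size of the frontal solver file in Mbytes."""
    words = rms_wavefront * n_dofs          # double precision words
    return words * bytes_per_word / (1024 * 1024)


size_mb = frontal_file_size_mb(5000, 300_000)
size_gb = size_mb / 1024
print(f"{5000 * 300_000:.2e} words, about {size_mb:.0f} Mbytes ({size_gb:.1f} Gbytes)")
```

This gives 1.5 billion words and roughly 11 Gbytes, matching the figures in the post.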

The sparse solver (eqslv,spar) is also a direct factorization method but a much newer technology. It will run out-of-core but does have a minimum memory requirement on any given job. In 5.6 this solver option will grab all of available memory in the ANSYS heap at the time the solver is called. It prints out some messages about the memory available to the solver and if you run with the following undocumented debug flag set you will get some nice performance stats from the sparse solver:

use eqslv,spar,,-5

The sparse solver can run quite efficiently out-of-core if you have a decent disk setup. It does quite well on SGI Origin machines, including running in parallel if you have set ANSYS up to run in parallel. Other UNIX workstation platforms with large memory and good I/O configuration should also do well. We will be adding more sparse solver hardware optimizations in the future. The performance is also quite good on NT systems because we have linked with a fast math library that is used with that solver. However, most NT workstations are short on memory and many have terrible I/O performance. You really need SCSI disks on them and plenty of room. One way to get the maximum memory for the sparse solver is to cut the db space back. So, if you ran with -m 1000 -db 256 and then reran with -m 1000 -db 56 the solver would have an additional 200 Mbytes available. Of course you will get a file.page that is 200 Mbytes larger, potentially, but that is not much of a performance hit.

If you are interested in more info from me let me know. I would like to know a bit more about what kind of system you are running on and if you could do a run with eqslv,spar,,-5 and send me the output file I can tell a lot about your job performance. If you run the PCG solver the file.PCS is the file that I would like to see for performance data.

Good luck! I hope we can continue to improve your solver options. You will definitely see improvements in 5.7.

Posted by Gene Poole (ANSYS, Inc.) on 06.21.2000

Memory Allocation

There is no performance issue with allowing supplemental memory allocations for your job rather than increasing total scratch memory at the start of your job. However, I would generally recommend asking for sufficient memory up front whenever you have a good idea of the amount you need. The reason is that when supplemental memory allocations occur the amount of memory actually allocated is determined as some fraction of the initial space you started with. So, there can be cases where perhaps you only needed 1 Mbyte more of space but the supplemental memory allocation would get an additional block that would be potentially much more than 1 Mbyte. This is not necessarily a performance hit but you might end up thinking you need a lot more memory to run your job than was actually required.

One of the responses to your email mentioned a paper on ANSYS memory usage at www.csi-ansys.com/tip_of_the_week.htm This is really a well written paper with some very good data. Check it out - it will be very useful.

Hopefully, the word is getting out that in 5.7 the sparse solver will now also use supplemental memory allocation, at least in many cases. This change will work for eqslv,spar as well as the modal analysis runs which use the block Lanczos solver; modopt,lanb. The biggest change besides adding supplemental memory allocation is that we have added some additional logic and functionality to the sparse solver interface so that the solver can now function using significantly less memory. It does this at the expense of more I/O but we have also improved I/O performance and eliminated some unnecessary I/O to compensate. In most cases the sparse solver is faster and uses less memory. If you have a large memory system you will be able to run the sparse solver with no I/O at all for smaller problems (under 70,000 dofs) and with minimal memory for larger jobs, as long as you specify large initial memory settings via the -m command line. The choice between running in-core and out-of-core is automatic and depends on the memory available. The biggest change in the block Lanczos side is that we can now run fairly large Lanczos jobs with much less memory. In one example a 1.4 Million DOF modal run that required over 2000 Mbytes to run in 5.6 will now run with Lanczos solver memory of just 450 Mbytes. The total job runs with -m 850.

The performance hit for using less memory is system dependent but it was minimal on many of the systems we have tested. Don't expect performance miracles on small NT systems with IDE drives.

Posted by Gene Poole (ANSYS, Inc.) on 09.12.2000

!**************************************************************************

Second link

http://66.249.93.104/search?sourceid=navcl...at%3Dtnt_poole2

!**************************************************************************

Sparse vs. PCG Solver

Q: Which solver 'should' be quicker? PCG or Sparse Solver?

I'd like to throw in a few comments relative to the recent thread of questions/comments about solver choices in ANSYS. This is such a politically correct subject that it is probably not interesting enough for xansys. After all, no matter what your answer to this question everyone gets to be correct in some solution space. Isn't that wonderful? We are all right..

That disclaimer aside there are changes in the solver implementations in ANSYS that have been ongoing since 5.7 that may change your correct answer to the question of which is better to use. Many of you will probably change your answers in the future - and that is ok, it's certainly the PC thing to do.

The main changes we have made are improving the sparse solver performance. We have reduced memory requirements, improved single processor performance of the solver and in 5.7.1 added parallel processing to the factorization phase of the solver on HP, IBM, SUN and Dec. Previously only SGI had parallel processing in the sparse solver and NT systems had some parallel processing capability but most people never saw it or used it. Now all ANSYS platforms that support ANSYS parallel processing will run in parallel for the sparse solver and you will see a significant reduction in elapsed time for factorization. How significant depends on how big the model is and how pinched you are for memory. If you drive the memory size down you increase I/O and that effect will mask most of the parallel processing benefit. Also, if you are running a small job where factorization time is small anyway don't expect to see linear speedups. A reasonable expectation is to reduce wall time by .5p where p is the number of processors you use. For example, a 200k dof static analysis run on an HP or SGI workstation on 4 cpus would take half the time that a single processor run would do, perhaps even a bit better than that in the best case.
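The rule of thumb in the paragraph above (a speedup of about 0.5p on p processors) can be written down as a one-line estimate. The sketch below is only that rule restated; the clamp to a speedup of at least 1 for one or two processors is an assumption added here so that the estimate never exceeds the single-processor time, and the function name is illustrative.

```python
def expected_wall_time(single_cpu_time: float, processors: int) -> float:
    """Estimate factorization wall time assuming a speedup of 0.5 * p."""
    # Assumption added for this sketch: never predict a slowdown.
    speedup = max(1.0, 0.5 * processors)
    return single_cpu_time / speedup


four_cpu_time = expected_wall_time(100.0, 4)
print(four_cpu_time)
```

For the 4-CPU example in the post this returns half of the single-processor time.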

In version 6.0 we are now using sparse assembly rather than the frontal assembly path that we previously used for the sparse solver. This means that assembly is faster in almost all cases - sometimes a LOT faster ( particularly for problems with lots of constraint equations ). It also means that you can now run most jobs in the same memory space whether sparse or pcg solver. That is true as well for lanczos runs. We have really cut back memory usage for the lanczos solver in 6.0 compared to previous versions. We have NOT reduced performance to reduce memory, in fact in many cases performance is better.

The comparison with PCG involves a couple of things. First, iterative solvers like the PCG solver do repeated sparse matrix/vector multiplications and sparse solves. Sparse here refers to the sparsity of the assembled stiffness matrix. This matrix is held in memory along with a preconditioner, which is of the same order of size as the stiffness matrix but is stored in single precision to save space. The number of iterations required by the PCG solver is the key. Every time the PCG solver runs, it writes a file.PCS that reports the size of the matrix and the number of iterations required for convergence.
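The loop of matrix/vector multiplications and preconditioner solves can be sketched in a few lines of pure Python. This is a textbook conjugate gradient iteration with a diagonal (Jacobi) preconditioner, used here only as the simplest stand-in; it is not the preconditioner ANSYS uses, and the 50-equation test matrix is hypothetical.

```python
def matvec(rows, x):
    """Multiply a sparse matrix (one {column: value} dict per row) by x."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pcg(rows, b, tol=1e-8, max_iter=1000):
    """Jacobi-preconditioned CG for a symmetric positive definite matrix."""
    n = len(b)
    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    inv_diag = [1.0 / rows[i][i] for i in range(n)]  # the preconditioner
    r = list(b)                                      # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(rows, p)                         # sparse matrix/vector product
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioner "solve"
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical test matrix: 1D Laplacian (2 on the diagonal, -1 beside it).
n = 50
rows = []
for i in range(n):
    row = {i: 2.0}
    if i > 0:
        row[i - 1] = -1.0
    if i < n - 1:
        row[i + 1] = -1.0
    rows.append(row)
b = [1.0] * n
x, iterations = pcg(rows, b)
print("iterations:", iterations)
```

Each pass costs one matrix/vector product and one preconditioner application, so total time scales with the iteration count. That is why the count reported in file.PCS is the number to watch.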

Here is an example from a run on a 300 MHz SGI Origin system. This model is just a small test case with 80,634 dofs. The assembled matrix has 2.9 million coefficients, which is about 36 coefficients per row of the matrix. This run took 214 iterations to achieve a 10**-6 error tolerance (the default is 10**-8 in ANSYS).

Degrees of Freedom: 80634
DOF Constraints: 1208
Elements: 22264
Assembled: 22264
Implicit: 0
Nodes: 26878
Number of Load Cases: 1

Nonzeros in Upper Triangular part of Global Stiffness Matrix : 2919012
Nonzeros in Preconditioner: 1232823
Total Operation Count: 3.79527e+09
Total Iterations In PCG: 214
Average Iterations Per Load Case: 214
Input PCG Error Tolerance: 1e-06
Achieved PCG Error Tolerance: 9.671e-07

DETAILS OF SOLVER CP TIME(secs)    User    System
Assembly                            4.5       0.8
Preconditioner Construction         1.6       0.6
Preconditioner Factoring            0.1       0
Preconditioned CG                  38.6       0.1
******************************************************************************
Total PCG Solver CP Time: User: 62 secs: System: 3.2 secs
******************************************************************************
Estimate of Memory Usage In CG : 39.1707 MB
Estimate of Disk Usage : 40.6556 MB
CG Working Set Size with matrix outofcore : 11.2815 MB
******************************************************************************
Multiply with A MFLOP Rate:106.674 MFlops
Solve With Precond MFLOP Rate:108.793 MFlops
******************************************************************************

The key thing here is the number of iterations is in the few hundred range. Almost anytime that this is true the pcg solver will beat the sparse solver. But many problems take 1,000 or more iterations. I have seen problems from customers that take 2500 - 3500 iterations. These are realistic meshes with lots of features that give rise to large and small elements in the same model and lots of poor aspect elements. In these problems the sparse solver will usually win and sometimes by a great deal.

Just for completeness, and for those with pocket protectors who are still reading, I'll paste in the performance summary from the sparse solver for this same job on the same SGI machine. I'm deleting some of the lines to save space.

===========================
= multifrontal statistics =
===========================
number of equations                       = 79426
no. of nonzeroes in lower triangle of a   = 2373080
amount of workspace currently in use      = 635644
max. amt. of workspace used               = 10330548
no. of nonzeroes in the factor l          = 40308929.
no. of floating point ops for factor      = 3.9392D+10
no. of floating point ops for solve       = 1.6163D+08
time (cpu & wall) for structure input     = 4.100000    4.111955
time (cpu & wall) for ordering            = 13.960000   14.014937
time (cpu & wall) for symbolic factor     = 0.370000    0.359820
time (cpu & wall) for value input         = 4.090000    4.101902
time (cpu & wall) for numeric factor      = 95.730000   95.999392
computational rate (mflops) for factor    = 411.495225  410.340494

i/o statistics:   unit number   length       amount
                  -----------   ------       ------
                  20.           4596494.     10230628.
                  25.           615454.      1538635.
                  9.            40308929.    120926787.

Sparse Matrix Solver CP Time (sec) = 126.990
Sparse Matrix Solver ELAPSED Time (sec) = 127.362

For this job the PCG solver wins. But look at some of the details and you will see why the sparse solver will sometimes win. If you increase the iterations for PCG by a factor of 3 or so, the sparse solver will win. The factored matrix for this job has 40 million coefficients, compared to 2.4 million for the assembled stiffness matrix. In terms of the number of equations this is around 500 coefficients per row. That's a lot more than the 36 coefficients per row that the iterative solver sees. But the sparse solver achieves 400 Mflops; it is 4 times faster than the PCG solver for its computations. This run of the sparse solver could easily fit in memory, but I ran it out-of-core. Running in-core is of course faster, but since the factored matrix gets big in a hurry for large jobs it is not practical to run in-core, and out-of-core performance is not degraded all that much, thanks to a very nice implementation of out-of-core in the Boeing software.
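The "factor of 3 or so" can be checked against the two summaries quoted above. The Python sketch below copies its figures from those logs. It assumes that only the "Preconditioned CG" phase grows with the iteration count, and that it grows linearly; that is a simplification, not something the logs state.

```python
# Figures copied from the PCG and sparse solver summaries quoted above.
pcg_total_s = 62.0            # Total PCG Solver CP Time (user)
pcg_cg_s = 38.6               # "Preconditioned CG" phase (user)
pcg_iterations = 214
sparse_total_s = 126.99       # Sparse Matrix Solver CP Time

stiffness_nonzeros = 2919012  # upper triangle seen by PCG
pcg_dofs = 80634
factor_nonzeros = 40308929    # factor L in the sparse solver
sparse_equations = 79426

per_row_pcg = stiffness_nonzeros / pcg_dofs
per_row_factor = factor_nonzeros / sparse_equations

# Assumption: only the CG phase scales with iterations, and linearly.
seconds_per_iteration = pcg_cg_s / pcg_iterations
fixed_pcg_s = pcg_total_s - pcg_cg_s
break_even_iterations = (sparse_total_s - fixed_pcg_s) / seconds_per_iteration
break_even_factor = break_even_iterations / pcg_iterations

print(f"coefficients per row, PCG matrix: {per_row_pcg:.1f}")
print(f"coefficients per row, factor:     {per_row_factor:.1f}")
print(f"break-even iterations:            {break_even_iterations:.0f}")
print(f"break-even factor:                {break_even_factor:.2f}")
```

Under that assumption the break-even lands at roughly 570 iterations, a bit under three times the 214 iterations of this run, which agrees with the estimate in the text.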

So, to summarize PCG/sparse: look at the number of iterations reported in the file.PCS. If it is high (high hundreds of iterations and up), look at the sparse solver as a faster alternative. If you have done this once in your life, the answer you got then may have changed unless you are still using the same computer, solving the same problem and using the same version of ANSYS. It is worth trying the options once in a while.
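The check is easy to script. Below is a minimal Python sketch that assumes the file.PCS text contains a "Total Iterations In PCG:" line formatted as in the listing quoted earlier. The 700-iteration threshold is a hypothetical reading of "high hundreds", not an ANSYS setting.

```python
import re


def pcg_iterations(pcs_text: str) -> int:
    """Return the value of the 'Total Iterations In PCG' line."""
    match = re.search(r"Total Iterations In PCG:\s*(\d+)", pcs_text)
    if match is None:
        raise ValueError("no 'Total Iterations In PCG' line found")
    return int(match.group(1))


def suggest_solver(iterations: int, threshold: int = 700) -> str:
    # threshold is a hypothetical cut-off for "high hundreds"; tune per model
    return "try sparse" if iterations >= threshold else "keep PCG"


sample = "Total Iterations In PCG: 214\nAverage Iterations Per Load Case: 214\n"
print(suggest_solver(pcg_iterations(sample)))
```

For a real run, the text would come from reading file.PCS in the job's working directory. The suggestion is only a prompt to time both solvers, not a verdict.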

There is another alternative to sparse and PCG: the new AMG solver. This solver is available as part of the parallel processing add-on. It does take more memory than PCG, but not all that much more. It is a better preconditioner for some of the poor aspect ratio jobs. I have seen it beat PCG by a factor of 5. I have also seen it fail altogether. It is a very powerful iterative solver for a large class of problems. It has not been in ANSYS as long as the PCG solver, so do not expect miracles, but it is there because it is a very good alternative for some problems. It is also a better performer on shared memory parallel machines.

It is not our intent to make solver selection so complicated. We want to make this as automatic as possible and as correct every time as possible. But, some input from an experienced user will be worth the effort if you are trying to improve solution time or pushing the limits of your current hardware. We are going to continue to improve solver performance in every release as we continue to advance ANSYS capabilities.

P.S. If you are still reading you are hard core. If you want to measure your performance using the sparse solver just add this command AFTER eqslv,spar or modopt,lanb and before solve: bcsopt,,,,,,-5 - That's 6 commas and a -5.

e.g.

eqslv,spar

bcsopt,,,,,,-5

solve

Posted by Gene Poole (ANSYS, Inc.) on 09.26.2000

!*************************************************************************************

So what happens if I solve the 100x100x100 problem with a direct method? Next we will be solving 200x200x200 and so on, and the computer will hum all night... Thanks...

10 May 2006

Dear ISPA,

I quote the manual, in translation (c):

By default ANSYS uses the sparse direct solver, except for the generation pass of substructuring analyses and for magnetics problems, which use the frontal direct solver.

10 May 2006

"...use the frontal direct solver."

The frontal method can mean the Cholesky method, or Gaussian elimination, or something else. That is where the confusion comes from. The discussion was always about reducing the matrix to triangular form. As I understand it, this problem can be solved in version 10 (it could not be in earlier versions), but it takes not 15 or 20 minutes but a whole night.

10 May 2006

To Artem Kulachenko:

Actually, direct methods for sparse matrices have been developed and used since the late 1970s. In 1984 I was already building these methods into my REZAK-5. Of course, when solving the equations of mathematical physics, direct methods can compete with iterative ones only when the matrix has a complex eigenvalue spectrum, where quasi-Newton iterative methods have an undeniable advantage over the others. In problems with a real eigenvalue spectrum, especially large ones, the advantages of quasi-Newton iterative methods are not obvious.

An interesting link comparing packages, which many may find useful:

http://www.netlib.org/utk/people/JackDongarra/la-sw.html

10 May 2006

From

http://www.ansys.com//assets/tech-papers/n...solver-tech.pdf

especially for ISPA.

Any questions?

To Eugeen

Wide use of direct methods became possible thanks to the progress of computing hardware in the 1990s. In 1995 we were told that a 1 GHz processor would need cooling the size of a stadium, and we worked under DOS with 512 KB; by the end of our studies we were working on PCs with 1 GHz processors and 512 MB of memory. In the late 1980s and early 1990s many people believed that the future of FE lay in p-elements. Where are they now??? I am stating a fact: the peak of development and acceleration of direct methods was the 1990s.

10 May 2006

To Artem Kulachenko:

"Wide use of direct methods became possible thanks to the progress of computing hardware in the 1990s. In 1995 we were told that a 1 GHz processor would need cooling the size of a stadium, and we worked under DOS with 512 KB; by the end of our studies we were working on PCs with 1 GHz processors and 512 MB of memory. In the late 1980s and early 1990s many people believed that the future of FE lay in p-elements. Where are they now??? I am stating a fact: the peak of development and acceleration of direct methods was the 1990s."

No need to get so heated! I am not refuting you, just correcting you slightly. The real incentives to develop direct sparse methods arose precisely under the severe memory limits of the computers of the 1970s and 1980s.

Moreover, all the essential programming techniques for sparse methods were developed in that period. If you directly compare on tests, for example, the very old package Y12M (see references 1 and 2 below) with the same WSMP, you will see that 25 years of development managed to "squeeze out" a few percent in speed, i.e. no qualitative leap has occurred!

1. Zlatev Z. On Some Pivotal Strategies in Gaussian Elimination by Sparse Technique // SIAM J. Numer. Anal. 17, 1980. P. 18-30.

2. Zlatev Z. et al. Y12M: Solution of Large and Sparse Systems of Linear Algebraic Equations // Lecture Notes in Computer Science, Volume 121, Springer, 1981.

10 May 2006

For Eugeen.

Look at the formulas. They have come up with something. The matrix is somehow cut into pieces.

And also

“Sparse solvers, on the other hand, take advantage of the fact that [K] is sparse and banded (usually non-zero terms near diagonal) to reduce memory requirements.”

All this gives food for thought.

10 May 2006

These are the so-called block-iterative methods, a technique developed back in the 1970s.

By the way, the most efficient algorithm for sparse matrices was invented ages ago: it is... the tridiagonal sweep (the Thomas algorithm)!
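For readers who have not met it, the tridiagonal sweep can be sketched in Python as below. This is the textbook form without pivoting, so it assumes a well-conditioned system (for example a diagonally dominant one); the function name and the 3x3 test system are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used
    as matrix entries. All four lists have the same length n."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward sweep
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Hypothetical system: 2 on the diagonal, -1 beside it, exact solution [1, 1, 1].
solution = thomas([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0],
                  [1.0, 0.0, 1.0])
print(solution)
```

The work and the storage both grow linearly with the number of equations, which is what makes the method so hard to beat for this one matrix structure.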

10 May 2006

Well, it did not hum all night with the sparse solver... only 13 minutes. With the good old frontal solver it did. Honestly!

"...with the same WSMP, you will see that 25 years of development managed to "squeeze out" a few percent in speed, i.e. no qualitative leap has occurred!"

This is interesting information; I would like to see the source. In my experience with the sparse solver in ANSYS, they sped it up considerably from version to version, above all by increasing the amount of work done within a given amount of RAM. Since version 6.0 that is a factor of 2.5 on the same hardware. So it is not all about pure algorithms... We did a lab exercise on speeding up code (solving Maxwell's equation). Without any major changes to the algorithm, we achieved a sixfold speedup by using the hardware sensibly and optimizing the code!

10 May 2006

"In my experience with the sparse solver in ANSYS, they sped it up considerably from version to version, above all by increasing the amount of work done within a given amount of RAM. Since version 6.0 that is a factor of 2.5 on the same hardware. So it is not all about pure algorithms... We did a lab exercise on speeding up code (solving Maxwell's equation). Without any major changes to the algorithm, we achieved a sixfold speedup by using the hardware sensibly and optimizing the code!"

I was not talking about ANSYS; I was talking about the "pure" algorithm for solving sparse systems of equations.

Of course, comparing sparse packages on different types of sparse systems of equations gives an idea of how efficient each one is, but the comparison has to be done correctly.

For example, the algorithms must be implemented in the same language standard (Fortran or C++, say), compiled with the same compiler in the same environment, and run on identical hardware.

Somewhere I came across fairly representative tests comparing different packages; I only remember that none of them showed "outstanding" results in the comparison! I will try to track down the source.

"Wide use of direct methods became possible thanks to the progress of computing hardware in the 1990s. In 1995 we were told that a 1 GHz processor would need cooling the size of a stadium, and we worked under DOS with 512 KB; by the end of our studies we were working on PCs with 1 GHz processors and 512 MB of memory."

The CRAY-1 supercomputer (produced in the early 1980s) had the size and shape of a small sofa with a high back, nothing like a stadium. CONVEX (a vector-pipeline machine) and CDC were only slightly behind it. But it was precisely on these machines that serious problems were solved, and in many countries still are.
